In an event-driven microservices architecture, the backends don't call each other directly and wait for responses. Instead, a service publishes the change that happened inside it to an Event Bus, which broadcasts it so the other services can look at the change and react if it concerns them, or simply ignore it. A service that reacts applies the change on its side and can in turn push its own event onto the Event Bus, producing a chain of events

  • Rather than sending events directly between services, we use an event bus, which acts like a Publisher-Subscriber model. The event bus can be any queuing system, which keeps the data safe (lossless): even if a consumer is down, the messages can be replayed once it is back up
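A minimal in-memory sketch of this publish/subscribe flow (the bus, event names, and handlers are all hypothetical; a real system would use a durable broker such as Kafka or RabbitMQ):

```python
# Minimal in-memory event bus sketch (illustrative names only; a real system
# would use a durable broker such as Kafka or RabbitMQ).
from collections import defaultdict

class EventBus:
    def __init__(self):
        self.subscribers = defaultdict(list)   # event type -> list of handlers

    def subscribe(self, event_type, handler):
        self.subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Broadcast: every interested service reacts, the rest simply ignore it.
        for handler in self.subscribers.get(event_type, []):
            handler(payload)

bus = EventBus()

def inventory_on_order_placed(event):
    print("inventory: reserving items for order", event["order_id"])
    # Reacting services can publish follow-up events, forming a chain of events.
    bus.publish("ItemsReserved", {"order_id": event["order_id"]})

def notification_on_items_reserved(event):
    print("notification: order", event["order_id"], "is being prepared")

bus.subscribe("OrderPlaced", inventory_on_order_placed)
bus.subscribe("ItemsReserved", notification_on_items_reserved)
bus.publish("OrderPlaced", {"order_id": 42})
```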

Q Suppose we're playing Counter-Strike and you shoot Person 1 at time t. How does the server validate whether it was a headshot or not?
A Using REST APIs/gRPC - we would send your location, Person 1's location, and the bullet parameters to the server for validation. But by the time the request is processed and the server starts validating, Person 1 may already have moved from that location, so we can't reliably determine whether it was a headshot or not
With Event Driven Architecture - we can replay the events back to time t and validate whether it was a headshot or not
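As a rough illustration, assuming movement events are stored with timestamps, rewinding to time t could look like this (the event shape and hit check are simplified stand-ins, not a real game-server implementation):

```python
# Sketch: validate a shot at time t by rewinding the target's position from
# timestamped movement events (hypothetical event shape, simple step model).
from bisect import bisect_right

position_events = [      # (timestamp, x, y) appended as Person 1 moves
    (0.0, 0.0, 0.0),
    (1.2, 5.0, 0.0),
    (2.7, 9.0, 3.0),
]

def position_at(t):
    """Latest recorded position at or before time t."""
    timestamps = [ts for ts, _, _ in position_events]
    return position_events[bisect_right(timestamps, t) - 1][1:]

def is_hit(shot_time, aim_point, tolerance=0.5):
    """Replay to shot_time and compare the aim point with the rewound position."""
    x, y = position_at(shot_time)
    return abs(aim_point[0] - x) <= tolerance and abs(aim_point[1] - y) <= tolerance

print(is_hit(1.5, (5.0, 0.0)))   # validated against where Person 1 was at t=1.5
```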

  • In Event Driven Architecture each service stores its own data plus a copy of the data coming from the other services it cares about (the services whose changes it reacts to). This makes the architecture highly available: if one service goes down, the other services still have their own persistent storage. On the other hand, consistency suffers, because the same state has to be maintained across the databases of all the services
  • After updating its own database, the service also appends the event to an Event Log (a sketch of this flow follows after this list)
    • Event logs are centralized, append-only storage systems that run as separate infrastructure components outside individual services (like Apache Kafka clusters or EventStoreDB, used here like databases rather than queues)
    • They store immutable events in ordered sequences with configurable retention periods and distributed replication for durability
    • Services write events to these logs after updating their local databases, enabling cross-service event distribution, historical replay, and serving as the source of truth for what happened in the system
    • Unlike service databases that store current mutable state, event logs preserve the complete history of events with stronger persistence guarantees through disk-based storage and fault-tolerant replication across multiple nodes
  • Event Driven Architecture acts like a time machine for our services, since we can easily replay the events back
  • It also makes codebase updates easy: to replace a service, you just replay the events to replicate the data from the old service into the new one and then send it the updates. But this does not work for every service; when a service interacts with time-dependent external systems (e.g., sending emails), replaying historic events during an upgrade can yield different responses, causing unintended behavior changes on replacement
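A rough sketch of the flow referenced above: a hypothetical Orders service keeps a local copy of customer data it cares about and appends its own event to a log after updating its database (the names, SQLite store, and in-memory log are stand-ins; it also glosses over the dual-write concern that the transactional outbox pattern, covered later, solves):

```python
# Sketch: a service updates its local copy of upstream data, then appends its
# own event to a shared event log (in-memory list standing in for Kafka/EventStoreDB).
import json, sqlite3, time

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers_copy (id TEXT PRIMARY KEY, email TEXT)")
event_log = []   # stand-in for an append-only, replicated log

def on_customer_email_changed(event):
    # 1. Update this service's own persistent copy of the upstream data.
    db.execute("INSERT OR REPLACE INTO customers_copy VALUES (?, ?)",
               (event["customer_id"], event["new_email"]))
    db.commit()
    # 2. Append the fact that this service reacted to the event log.
    event_log.append(json.dumps({
        "type": "OrderContactUpdated",
        "customer_id": event["customer_id"],
        "at": time.time(),
    }))

on_customer_email_changed({"customer_id": "c1", "new_email": "a@example.com"})
print(db.execute("SELECT * FROM customers_copy").fetchall(), event_log)
```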

If you want to reconstruct the state at a particular point in time from events, there are three ways

  1. Replay - replay the events from the start up to the desired timestamp; this gets really expensive when there are a lot of events
  2. Storing Diffs - rather than keeping every full event, keep the first one and store only the differences between consecutive events
  3. Undo - start from the current state and undo the events backwards. It looks attractive, but certain events can't be undone: sending an email, for example, depends on the external email system rather than on your application or database

Note: The Replay and Storing-Diff approaches stay practical if we squash the events every day into a snapshot, so the number of events to loop over remains small
Replaying and reconstructing the events to the state at a particular point in time is called Hydration
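A minimal sketch of hydration with a squashed snapshot, assuming a simple balance-delta event shape (all names and numbers are illustrative):

```python
# Sketch: hydration = fold events up to a timestamp to rebuild state.
# A periodic snapshot ("squash") keeps the replay loop short.

events = [  # (timestamp, balance delta) appended over time, never mutated
    (1, +100), (2, -30), (3, +50), (4, -20), (5, +10),
]

snapshot = {"as_of": 3, "balance": 120}   # squashed state up to t=3

def hydrate(up_to, snapshot=None):
    """Rebuild the balance at time `up_to` by replaying events."""
    balance = snapshot["balance"] if snapshot else 0
    start = snapshot["as_of"] if snapshot else 0
    for ts, delta in events:
        if start < ts <= up_to:
            balance += delta
    return balance

print(hydrate(4))            # full replay from the start
print(hydrate(4, snapshot))  # replay only the events after the snapshot
```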

Q But performing Hydration every time we need the data is expensive. How do we handle that?
A We can cache the hydrated state in a state store like Postgres and read from it when needed; if the user asks for verification, we can reconcile/recompute from the whole event history
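A small sketch of that idea, with a dict standing in for the Postgres/Redis state store and a hypothetical account event history:

```python
# Sketch: serve reads from a cached state store and only rehydrate from the
# full event history when the caller asks for verification.

account_events = {"acct-1": [+100, -30, +50]}   # illustrative event history
state_cache = {}                                 # account_id -> cached balance

def hydrate(account_id):
    return sum(account_events.get(account_id, []))

def get_balance(account_id, verify=False):
    if verify or account_id not in state_cache:
        state_cache[account_id] = hydrate(account_id)   # reconcile/recompute
    return state_cache[account_id]

print(get_balance("acct-1"))               # computed once, then served from cache
print(get_balance("acct-1", verify=True))  # forces a full recompute
```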

Achieving Consistency in Event Driven Architecture

Q Consider a bank system built on Event-Driven Architecture. Person A has 1000 in total and requests a transfer of 950 to Person B, on which the bank charges a 50 commission, so the full 1000 (the whole amount they have) will be consumed. The commission service processes first and deducts its 50; the transfer service then deducts the 950 for the actual transfer but crashes and becomes unavailable before publishing its event, leaving Person A's account at 0 while the other services' databases still show 950. Now Person A makes a new transfer request to send 800; the commission service, working from its stale copy, sees 950, calculates that this covers the required 800 + 50 commission, and approves the second transaction. There is no consistency between the databases of the individual services. How can we achieve consistency between services?
A Implement a Saga orchestration that treats the entire transfer — debiting the sender, deducting commission, and crediting the recipient — as a coordinated workflow with compensating actions on failure. Use a transactional outbox in each service and delay balance updates in the read model until the saga reaches a “completed” state

  • Model the Transfer as a Saga
    • Step 1: Debit Account A. If Step 1 fails, abort the saga
    • Step 2: Deduct Commission. If Step 2 fails after Step 1 succeeded, run CreditAccount(A, 1000) to undo the debit
    • Step 3: Credit Account B. If Step 3 fails, run DebitAccount(CommissionAccount, 50) and CreditAccount(A, 1000), i.e., the compensations in reverse order
  • Orchestrator Coordination
    • Implement a Saga orchestrator component that receives a TransferRequested(A → B, 1000, commission=50) event, issues each command in sequence and awaits success/failure callbacks, tracks state in a durable Saga log (e.g., a database table) for crash recovery, and retries failed steps, invoking compensations if retries are exhausted or downstream services remain unavailable. This central coordinator prevents premature balance updates and guarantees that a transfer is either fully applied or fully rolled back (a sketch follows after this list).
  • Transactional Outbox for Reliable Event Publication
    • Each microservice (Debit, Commission, Credit) should perform its local database update and write an “outbox” entry in the same transaction. A separate poller reliably publishes outbox events to the message broker after commit, ensuring no events are lost. This pattern solves the dual-write problem; combined with idempotent consumers (next bullet), it gives effectively exactly-once processing across service boundaries.
  • Idempotency and Retries
    • Ensure each step’s API is idempotent by including a unique SagaID and CommandID so that repeated retries do not cause duplicate debits or credits.
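A compressed sketch of such an orchestrator, with the services stubbed as in-process functions, an in-memory saga log, and simplified accounting (A is debited for amount plus commission in the first step); a real orchestrator would persist the saga log, issue commands through the broker, and retry before compensating:

```python
# Sketch: saga orchestrator for the transfer, compensating completed steps in
# reverse order on failure. Everything is in-process for illustration only.
import uuid

accounts = {"A": 1000, "B": 0, "Commission": 0}
saga_log = {}   # saga_id -> completed step names (a durable table in reality)

def debit(acct, amount):
    if accounts[acct] < amount:
        raise RuntimeError(f"insufficient funds in {acct}")
    accounts[acct] -= amount

def credit(acct, amount):
    accounts[acct] += amount

def run_transfer_saga(amount=950, commission=50):
    saga_id = str(uuid.uuid4())
    # Each step pairs an action with its compensation.
    steps = [
        ("debit_sender",      lambda: debit("A", amount + commission),  lambda: credit("A", amount + commission)),
        ("deduct_commission", lambda: credit("Commission", commission), lambda: debit("Commission", commission)),
        ("credit_recipient",  lambda: credit("B", amount),              lambda: debit("B", amount)),
    ]
    done = saga_log.setdefault(saga_id, [])
    try:
        for name, action, _ in steps:
            action()                      # commands are idempotent, keyed by saga_id + step
            done.append(name)             # checkpoint for crash recovery
        return "completed"                # only now may read models show the new balances
    except Exception:
        for _, _, compensate in reversed(steps[:len(done)]):
            compensate()                  # roll back completed steps in reverse order
        return "rolled_back"

print(run_transfer_saga(), accounts)
```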

Solving Competing Consumers problem in Event Driven Architecture

Q Consider a video processing system built on Event Driven Architecture with multiple workers. How will you ensure that a video is processed by only one worker, or more importantly, that each video is processed exactly once across the multiple workers?
A Combine the Competing Consumers pattern with message visibility timeouts (or consumer acknowledgments), partitioning/grouping by video ID, and idempotent processing to guarantee that each video is processed by exactly one worker, even in the face of failures.

  • Competing Consumers with Acknowledgments
    • Use a single queue/topic that all workers subscribe to as a consumer group, so each video message is delivered to only one worker instance
    • Each worker should pull a message, process the video, acknowledge successful processing (removing it from the queue), and nack/requeue on failure (so another worker retries)
    • Once a worker receives a message, the message becomes invisible to other workers until the worker acks (deletes) it or the visibility timeout expires
  • Partitioning or Message Grouping by Video ID
    • Using Kafka consumer groups, we can partition the topic by video_id so that all messages for a given video go to the same partition, and thus to the same consumer instance at a time
    • Using SQS FIFO + Message Group ID, we can assign each video_id as the Message Group ID, ensuring strict ordering and single-consumer handling per group
  • Idempotent Processing and Deduplication
    • Include a unique video_id and event_id in each message. Workers should track processed event_ids in a local or shared store (e.g., Redis or a deduplication table) so that if the same message is redelivered, due to a visibility timeout or retry, it is not reprocessed, preserving exactly-once semantics (a sketch of such a worker follows below)
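A compressed sketch of that worker loop, with an in-memory queue and dedup set standing in for SQS/Kafka and Redis (names and message shape are illustrative):

```python
# Sketch: a competing-consumer worker that acks on success, nacks on failure,
# and deduplicates by event_id so redeliveries are not reprocessed.
import queue

tasks = queue.Queue()
processed_event_ids = set()   # would be Redis or a dedup table in practice

def process_video(video_id):
    print("transcoding", video_id)

def worker_loop():
    while not tasks.empty():
        msg = tasks.get()                       # message is now invisible to other workers
        try:
            if msg["event_id"] in processed_event_ids:
                tasks.task_done()               # duplicate redelivery: ack and skip
                continue
            process_video(msg["video_id"])
            processed_event_ids.add(msg["event_id"])
            tasks.task_done()                   # ack: remove from the queue
        except Exception:
            tasks.put(msg)                      # nack: requeue so another worker retries
            tasks.task_done()

tasks.put({"event_id": "e1", "video_id": "v42"})
tasks.put({"event_id": "e1", "video_id": "v42"})   # duplicate delivery
worker_loop()
```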

Event Sourcing + CQRS Implementation

Event Sourcing - Event Sourcing stores all changes to application state as a sequence of events rather than just the current state. Instead of updating records in-place, you append events that describe what happened
CQRS (Command Query Responsibility Segregation) - CQRS separates read and write operations into different models. Commands change system state, while queries return data without side effects. This pairs excellently with Event Sourcing, where commands generate events and queries use read models built from those events
Event Sourcing + CQRS Implementation
When combining Event Sourcing with CQRS, commands generate events that are stored in the event store. Read models are built by projecting these events

Command → Aggregate → Events → Event Store

Events → Read Model Projections → Query Results

This provides strong consistency on the write side and optimized reads on the query side, which are eventually consistent
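A minimal sketch of the combined flow, with hypothetical deposit/withdraw commands, an in-memory event store, and a projection that rebuilds the read model:

```python
# Sketch: commands append events to an event store (write side); a projection
# folds those events into a read model (query side). Names are illustrative.
from dataclasses import dataclass, field

@dataclass
class EventStore:
    events: list = field(default_factory=list)   # append-only

    def append(self, event):
        self.events.append(event)

store = EventStore()
read_model = {}   # account_id -> current balance, rebuilt from events

# Command side: validate against the aggregate's event history, then append.
def handle_deposit(account_id, amount):
    store.append({"type": "Deposited", "account": account_id, "amount": amount})

def handle_withdraw(account_id, amount):
    balance = sum(e["amount"] if e["type"] == "Deposited" else -e["amount"]
                  for e in store.events if e["account"] == account_id)
    if balance < amount:
        raise ValueError("insufficient funds")   # strong consistency on writes
    store.append({"type": "Withdrew", "account": account_id, "amount": amount})

# Query side: projection rebuilds the read model from events (eventually consistent).
def project():
    read_model.clear()
    for e in store.events:
        delta = e["amount"] if e["type"] == "Deposited" else -e["amount"]
        read_model[e["account"]] = read_model.get(e["account"], 0) + delta

handle_deposit("acct-1", 100)
handle_withdraw("acct-1", 30)
project()
print(read_model)   # {'acct-1': 70}
```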