How Sagas Keep Distributed Systems Consistent

In traditional, monolithic systems, business operations are typically wrapped in database transactions. A transaction guarantees atomicity: either all operations succeed, or none of them do. This model works perfectly when everything happens inside a single database engine that can coordinate inserts, updates, and deletes in one place.

However, as systems grow and become distributed across multiple services, databases, clouds, and APIs, traditional transactions begin to break down.
A transaction that has to span several of these resources is a so-called large (or long-running) transaction, and it introduces three major problems:

1. Performance

Coordinating locks across multiple resources slows down both reads and writes, because every participant holds its locks until the slowest one has finished. Scaling becomes difficult.

2. Availability

In distributed environments, a single slow component can hold the entire transaction hostage, leading to timeouts or blocking other operations.

3. Feasibility

Modern microservice architectures often rely on message queues, event streams, and separate databases. Coordinating a distributed two-phase commit (2PC) across all of them is either too expensive or simply impossible, because many of these components cannot take part in one.

Because of this, the idea of “one transaction to rule them all” no longer fits.


Sagas—A Practical Alternative to Distributed Transactions

When traditional long-running transactions fail, the Saga pattern steps in.

A saga decomposes a business process into a sequence of smaller, local operations.
Each operation performs a single step and commits immediately—no locks, no distributed transaction.

If one of the steps later fails, the saga triggers compensating actions to undo or neutralize the effects of previous steps.

Example:

  1. Reserve an item → compensate by releasing the reservation
  2. Charge a credit card → compensate by issuing a refund
  3. Create a shipment → compensate by canceling the shipment

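Sagas trade the isolation of a single transaction for eventual consistency: intermediate states are visible to the rest of the system until the saga either completes or finishes compensating. In return, they fit the realities of distributed environments.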
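
The reserve / charge / ship example can be sketched without any framework. The code below is a minimal illustration, not a production implementation: SagaStep, SagaRunner, and OrderSagaExample are hypothetical names invented for this sketch, and the step bodies only append to a log where a real system would call the inventory, payment, and shipping services.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// One saga step: a local action plus the compensation that undoes it.
public sealed record SagaStep(string Name, Func<Task> Execute, Func<Task> Compensate);

public static class SagaRunner
{
    // Runs the steps in order. If a step throws, the compensations of the steps
    // that already completed are run in reverse order.
    // Returns true when every step succeeded, false when the saga was rolled back.
    public static async Task<bool> RunAsync(IReadOnlyList<SagaStep> steps)
    {
        var completed = new Stack<SagaStep>();
        foreach (var step in steps)
        {
            try
            {
                await step.Execute();
                completed.Push(step);
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Step '{step.Name}' failed: {ex.Message}");
                while (completed.Count > 0)
                {
                    await completed.Pop().Compensate();
                }
                return false;
            }
        }
        return true;
    }
}

public static class OrderSagaExample
{
    // Runs the three steps with a shipment step that always fails (an assumption
    // made so the rollback path is visible) and returns the resulting log.
    public static async Task<List<string>> RunWithFailingShipmentAsync()
    {
        var log = new List<string>();
        var steps = new List<SagaStep>
        {
            new SagaStep("ReserveItem",
                () => { log.Add("item reserved"); return Task.CompletedTask; },
                () => { log.Add("reservation released"); return Task.CompletedTask; }),
            new SagaStep("ChargeCard",
                () => { log.Add("card charged"); return Task.CompletedTask; },
                () => { log.Add("refund issued"); return Task.CompletedTask; }),
            new SagaStep("CreateShipment",
                () => throw new InvalidOperationException("carrier unavailable"),
                () => { log.Add("shipment cancelled"); return Task.CompletedTask; }),
        };

        await SagaRunner.RunAsync(steps);
        return log;
    }
}
```

Calling OrderSagaExample.RunWithFailingShipmentAsync() yields the log "item reserved", "card charged", "refund issued", "reservation released". The shipment's own compensation never runs because that step never completed, and the earlier steps are undone newest first. The sketch also assumes compensations themselves succeed; a real saga has to retry or escalate when they do not.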

There are two main saga styles:

  • Orchestration — a central coordinator directs the steps (common in .NET frameworks)
  • Choreography — each service reacts to events without a central controller

Both approaches remove the need for heavyweight distributed transactions while keeping processes reliable and flexible.
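
To make the choreography style concrete, here is a small sketch in which no component knows the whole process: each service only subscribes to the events it cares about and publishes its own. Everything in it is an assumption for illustration, including the InMemoryBus class (a synchronous stand-in for a real broker), the event names, and the ChoreographyExample wiring.

```csharp
using System;
using System.Collections.Generic;

// Events exchanged between services (hypothetical names for this sketch).
public record ItemReserved(string OrderId);
public record PaymentCharged(string OrderId);
public record PaymentDeclined(string OrderId);

// A tiny synchronous in-memory bus standing in for a real message broker.
public sealed class InMemoryBus
{
    private readonly Dictionary<Type, List<Action<object>>> _handlers = new();

    public List<string> Log { get; } = new();

    public void Subscribe<T>(Action<T> handler)
    {
        if (!_handlers.TryGetValue(typeof(T), out var list))
        {
            list = new List<Action<object>>();
            _handlers[typeof(T)] = list;
        }
        list.Add(message => handler((T)message));
    }

    public void Publish<T>(T message) where T : notnull
    {
        if (_handlers.TryGetValue(typeof(T), out var list))
        {
            foreach (var handler in list)
            {
                handler(message);
            }
        }
    }
}

public static class ChoreographyExample
{
    public static List<string> Run(bool cardIsValid)
    {
        var bus = new InMemoryBus();

        // Payment service: reacts to a reservation and announces its own outcome.
        bus.Subscribe<ItemReserved>(e =>
        {
            if (cardIsValid)
            {
                bus.Log.Add("payment: charged");
                bus.Publish(new PaymentCharged(e.OrderId));
            }
            else
            {
                bus.Log.Add("payment: declined");
                bus.Publish(new PaymentDeclined(e.OrderId));
            }
        });

        // Shipping service: reacts to a successful payment.
        bus.Subscribe<PaymentCharged>(_ => bus.Log.Add("shipping: shipment created"));

        // Inventory service: compensates its own earlier step when payment fails.
        bus.Subscribe<PaymentDeclined>(_ => bus.Log.Add("inventory: reservation released"));

        // Inventory service starts the process.
        bus.Log.Add("inventory: item reserved");
        bus.Publish(new ItemReserved("order-1"));
        return bus.Log;
    }
}
```

ChoreographyExample.Run(false) produces "inventory: item reserved", "payment: declined", "inventory: reservation released". The compensation is triggered purely by an event, with no coordinator involved. The price is that the overall flow is spread across subscriptions, which is the main reason orchestration is often preferred once a process has more than a few steps.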

Frameworks and Platforms That Support Sagas

You don’t need to implement sagas from scratch. Many ecosystems provide built-in saga patterns or “process managers” to help developers handle long-running business workflows:

MassTransit (.NET)

An open-source messaging framework with first-class saga support, state machines, persistence, retries, and fault handling. Very popular in the .NET world.

NServiceBus (.NET)

A mature service bus framework with a strong saga implementation. Commercial, but widely used in enterprise systems.

Axon Framework (Java)

A CQRS and event-sourcing framework with built-in process managers (sagas). Strongly focused on domain-driven design.

Camunda / BPMN Engines (Java/Go/Node.js)

Workflow engines that model sagas as compensating transactions or subprocesses. Ideal for visual process modeling.

Temporal.io (Go/Java/Node/Python/.NET/PHP)

A modern workflow engine used by companies like Uber, Airbnb, and Stripe. Provides extremely reliable orchestration with saga-style compensation built in.

Azure Durable Functions (Microsoft)

A cloud-native orchestration engine for serverless applications. An orchestrator function acts as the saga coordinator, managing retries and compensation.

AWS Step Functions

A visual orchestrator for distributed workflows. State machines define business processes; rollback is modeled explicitly, using Retry and Catch fields that route a failed step to compensating states.

Each of these tools offers a different style—from code-driven orchestration to visual workflows—but the underlying principle remains the same:
break large transactions into smaller, reliable steps and compensate when things go wrong.


Example: Order Processing Saga in MassTransit (.NET)

This saga coordinates the lifecycle of an order: submission, payment, and shipment.

1. Define the Saga State

using MassTransit;
using System;

public class OrderState : SagaStateMachineInstance
{
    public Guid CorrelationId { get; set; }
    public string CurrentState { get; set; }

    public string OrderId { get; set; }
    public bool PaymentReceived { get; set; }
    public bool OrderShipped { get; set; }
}

2. Define Events

// Normal events
public record OrderSubmitted(string OrderId);
public record PaymentReceived(string OrderId);
public record OrderShipped(string OrderId);

// Rollback / compensation / cancellation events
public record PaymentFailed(string OrderId);
public record OrderCancelled(string OrderId);

3. Create the Saga State Machine

using MassTransit;
using System;
using System.Threading.Tasks;

public class OrderStateMachine : MassTransitStateMachine<OrderState>
{
    public State Submitted { get; private set; }
    public State Paid { get; private set; }
    public State Completed { get; private set; }

    public Event<OrderSubmitted> OrderSubmittedEvent { get; private set; }
    public Event<PaymentReceived> PaymentReceivedEvent { get; private set; }
    public Event<OrderShipped> OrderShippedEvent { get; private set; }

    public Event<PaymentFailed> PaymentFailedEvent { get; private set; }
    public Event<OrderCancelled> OrderCancelledEvent { get; private set; }

    public OrderStateMachine()
    {
        InstanceState(x => x.CurrentState);

        // Correlation by OrderId (Guid.Parse throws unless OrderId is a GUID string)
        Event(() => OrderSubmittedEvent, x => x.CorrelateById(context => Guid.Parse(context.Message.OrderId)));
        Event(() => PaymentReceivedEvent, x => x.CorrelateById(context => Guid.Parse(context.Message.OrderId)));
        Event(() => OrderShippedEvent, x => x.CorrelateById(context => Guid.Parse(context.Message.OrderId)));
        Event(() => PaymentFailedEvent, x => x.CorrelateById(context => Guid.Parse(context.Message.OrderId)));
        Event(() => OrderCancelledEvent, x => x.CorrelateById(context => Guid.Parse(context.Message.OrderId)));

        // Initial state: Order Submitted
        Initially(
            When(OrderSubmittedEvent)
                .Then(context =>
                {
                    context.Saga.OrderId = context.Message.OrderId;
                    Console.WriteLine($"Order {context.Saga.OrderId} submitted.");
                })
                .TransitionTo(Submitted)
        );

        // From Submitted
        During(Submitted,
            When(PaymentReceivedEvent)
                .Then(context =>
                {
                    context.Saga.PaymentReceived = true;
                    Console.WriteLine($"Payment received for order {context.Saga.OrderId}.");
                })
                .TransitionTo(Paid),

            When(OrderCancelledEvent)
                .Then(context => Console.WriteLine($"Order {context.Saga.OrderId} cancelled before payment."))
                .Finalize()
        );

        // From Paid
        During(Paid,
            When(OrderShippedEvent)
                .Then(context =>
                {
                    context.Saga.OrderShipped = true;
                    Console.WriteLine($"Order {context.Saga.OrderId} shipped.");
                })
                .TransitionTo(Completed)
                .Finalize(),

            When(PaymentFailedEvent)
                .ThenAsync(async context =>
                {
                    // Compensating action placeholder: a real saga would request a refund or send a notification here
                    context.Saga.PaymentReceived = false;
                    Console.WriteLine($"Payment failed for order {context.Saga.OrderId}. Rolling back to Submitted state.");
                    await Task.CompletedTask;
                })
                .TransitionTo(Submitted),

            When(OrderCancelledEvent)
                .Then(context => Console.WriteLine($"Order {context.Saga.OrderId} cancelled after payment. Refund initiated."))
                .Finalize()
        );

        SetCompletedWhenFinalized();
    }
}

4. Configure MassTransit with Saga

using MassTransit;
using System;

var busControl = Bus.Factory.CreateUsingRabbitMq(cfg =>
{
    cfg.Host("localhost", "/", h => { });

    cfg.ReceiveEndpoint("order_saga_queue", e =>
    {
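        // In-memory repository: saga state is lost on restart, so use a persistent repository outside of demos
        e.StateMachineSaga(new OrderStateMachine(), new InMemorySagaRepository<OrderState>());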
    });
});

await busControl.StartAsync();

Console.WriteLine("Bus started. Press any key to exit...");
Console.ReadKey();

await busControl.StopAsync();

Key Points

  1. Saga tracks state across multiple messages (Submitted → Paid → Completed).
  2. Rollbacks and compensations are handled by explicit events (PaymentFailed, OrderCancelled).
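  3. TransitionTo(Submitted) moves the saga back to an earlier state, while Finalize() ends the saga and SetCompletedWhenFinalized() removes the finished instance from the repository.
  4. Compensating actions (like refunds) belong inside .ThenAsync(...); the sample only logs there, where a real saga would call or message the payment service.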
  5. Correlation ensures messages are routed to the correct saga instance.
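
The transitions behind points 1 and 3 can also be written out as a plain function, which is handy for checking the flow in a unit test without a message bus. This is only a sketch that mirrors the Initially(...) and During(...) blocks of the state machine: OrderStatus, OrderEvent, and OrderTransitions are hypothetical names and not part of MassTransit. It throws for combinations the state machine does not declare, whereas what MassTransit itself does in that case depends on how unhandled events are configured.

```csharp
using System;

public enum OrderStatus { Initial, Submitted, Paid, Final }

public enum OrderEvent { OrderSubmitted, PaymentReceived, OrderShipped, PaymentFailed, OrderCancelled }

public static class OrderTransitions
{
    // Returns the state the saga ends up in after handling the event.
    // Shipping passes through Completed and is finalized immediately,
    // so it is represented here as Final.
    public static OrderStatus Next(OrderStatus state, OrderEvent evt) => (state, evt) switch
    {
        (OrderStatus.Initial, OrderEvent.OrderSubmitted) => OrderStatus.Submitted,
        (OrderStatus.Submitted, OrderEvent.PaymentReceived) => OrderStatus.Paid,
        (OrderStatus.Submitted, OrderEvent.OrderCancelled) => OrderStatus.Final,
        (OrderStatus.Paid, OrderEvent.OrderShipped) => OrderStatus.Final,
        (OrderStatus.Paid, OrderEvent.PaymentFailed) => OrderStatus.Submitted,
        (OrderStatus.Paid, OrderEvent.OrderCancelled) => OrderStatus.Final,
        _ => throw new InvalidOperationException($"Event {evt} is not handled in state {state}."),
    };
}
```

Written this way, one gap becomes easy to spot: PaymentFailed is only declared for the Paid state, so a payment that fails while the order is still Submitted has no transition.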

Conclusion

As systems become more distributed, scalable, and event-driven, the traditional concept of a single transaction that spans everything becomes unrealistic. Sagas embrace this reality by offering a robust model for maintaining business consistency without sacrificing performance or availability.

Whether you prefer explicit orchestration, event-driven choreography, or a workflow engine, modern tooling provides everything you need to build reliable long-running processes.

For business processes that span several services, sagas have become a standard pattern rather than an optional one.
