Circuit Breaker Pattern

The Circuit Breaker pattern helps distributed systems handle failures gracefully and prevents one failing dependency from causing cascading failures. It’s especially useful when calling external services or APIs that might be unreliable or temporarily unavailable. Instead of repeatedly calling a failing service and potentially overwhelming your own system, the circuit breaker monitors the service’s health and intervenes to protect your application.

How the Circuit Breaker Works

The core idea behind the circuit breaker is simple: it acts like an electrical circuit breaker. When too many faults are detected, the circuit “trips” (opens), and further requests fail immediately instead of reaching the failing service. After a timeout, the circuit breaker lets a trial request through to test whether the service has recovered. If the trial succeeds, the circuit closes and requests flow normally again. If it fails, the circuit breaker stays open and waits for the next timeout.

Let’s visualize this with a diagram:

stateDiagram-v2
    [*] --> Closed
    Closed --> Open : Failure Threshold Reached
    Open --> HalfOpen : Timeout Elapsed
    HalfOpen --> Closed : Success
    HalfOpen --> Open : Failure
    Open --> [*] : Circuit Breaker Disabled (Manual Reset)
    Closed --> [*] : Circuit Breaker Disabled (Manual Reset)
    
    state Closed {
      [*] --> NormalOperation
      NormalOperation : Normal Operation
      NormalOperation --> Success : Service Available
      NormalOperation --> Failure : Service Unavailable
    }

    state Open {
        [*] --> ServiceUnavailable
        ServiceUnavailable : Circuit Open
        ServiceUnavailable --> Error : Service Unavailable
    }

    state HalfOpen {
        [*] --> TestingService
        TestingService : Testing Service Availability
        TestingService --> Success : Service Available
        TestingService --> Failure : Service Unavailable
    }

This diagram shows how a system handles service failures:

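States:

  1. Closed: Requests pass through normally while failures are counted
  2. Open: Requests fail immediately without calling the service
  3. HalfOpen: A trial request tests whether the service has recovered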

Key Transitions:

  1. Closed → Open: Too many failures trigger circuit “trip”
  2. Open → HalfOpen: Timeout allows recovery attempt
  3. HalfOpen → Closed: Service restored after successful test
  4. HalfOpen → Open: Service still failing, returns to blocked state

Each state contains internal logic for request handling and its own failure/success behavior. A manual reset option exists to disable the circuit breaker if needed.
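The four transitions listed above can be written down as a small lookup table. The following is only a sketch: the event names (such as failure_threshold_reached) are hypothetical labels chosen to mirror the diagram, and the manual reset path is left out.

```python
from enum import Enum


class State(Enum):
    CLOSED = "Closed"
    OPEN = "Open"
    HALF_OPEN = "HalfOpen"


# Hypothetical event names, chosen to mirror the transition labels in the diagram.
TRANSITIONS = {
    (State.CLOSED, "failure_threshold_reached"): State.OPEN,
    (State.OPEN, "timeout_elapsed"): State.HALF_OPEN,
    (State.HALF_OPEN, "success"): State.CLOSED,
    (State.HALF_OPEN, "failure"): State.OPEN,
}


def next_state(state, event):
    """Return the state after `event`; events with no table entry leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


# Walk through a failed recovery attempt followed by a successful one.
state = State.CLOSED
for event in ["failure_threshold_reached", "timeout_elapsed", "failure",
              "timeout_elapsed", "success"]:
    state = next_state(state, event)
    print(f"{event} -> {state.value}")
```

Keeping the transitions in a table makes it easy to see that a success only matters in the HalfOpen state and that an open circuit ignores everything except the timeout.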

Implementing the Circuit Breaker

The implementation can vary depending on the programming language and framework. Many libraries and frameworks offer ready-made implementations, but understanding the core logic is important. Let’s look at a simplified example in Python:

import time

class CircuitBreaker:
    def __init__(self, failure_threshold=3, recovery_timeout=10):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failure_count = 0
        self.last_failure_time = 0
        self.state = "CLOSED"

    def call(self, func, *args, **kwargs):
        if self.state == "OPEN":
            # Fail fast until the recovery timeout has elapsed.
            if time.time() - self.last_failure_time <= self.recovery_timeout:
                raise Exception("Service Unavailable")
            # Timeout elapsed: let one trial request through.
            self.state = "HALF-OPEN"

        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failure_count += 1
            self.last_failure_time = time.time()
            # A failed trial call, or too many failures while closed, opens the circuit.
            if self.state == "HALF-OPEN" or self.failure_count >= self.failure_threshold:
                self.state = "OPEN"
            raise

        # Any successful call closes the circuit and resets the failure count.
        self.state = "CLOSED"
        self.failure_count = 0
        return result


def external_service():
    # Simulate a flaky service
    if time.time() % 2 < 1:
        raise Exception("Service Unavailable")
    return "Success"

breaker = CircuitBreaker()

try:
    result = breaker.call(external_service)
    print(f"Result: {result}")
except Exception as e:
    print(f"Error: {e}")

This example showcases a basic implementation with CLOSED, OPEN, and HALF-OPEN states. Production implementations would typically include metrics tracking, configurable parameters, and more complex failure handling.
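To make the metrics and configurability points concrete, here is one possible variant. It is a sketch, not a reference implementation: the names InstrumentedBreaker, CircuitOpenError, clock, and metrics are all hypothetical. The clock parameter is an assumption added so the demo can advance time by hand instead of sleeping, and the counters stand in for a real metrics system.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without invoking the service."""


class InstrumentedBreaker:
    def __init__(self, failure_threshold=3, recovery_timeout=10, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock  # injectable so a demo or test can control time
        self.state = "CLOSED"
        self.failure_count = 0
        self.opened_at = 0.0
        # Plain counters standing in for a real metrics backend.
        self.metrics = {"success": 0, "failure": 0, "rejected": 0}

    def call(self, func, *args, **kwargs):
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.recovery_timeout:
                self.metrics["rejected"] += 1
                raise CircuitOpenError("circuit is open")
            self.state = "HALF-OPEN"  # timeout elapsed: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.metrics["failure"] += 1
            self.failure_count += 1
            if self.state == "HALF-OPEN" or self.failure_count >= self.failure_threshold:
                self.state = "OPEN"
                self.opened_at = self.clock()
            raise
        self.metrics["success"] += 1
        self.state = "CLOSED"
        self.failure_count = 0
        return result


# Usage with a fake clock, so the demo does not wait on real time.
now = [0.0]
breaker = InstrumentedBreaker(failure_threshold=2, recovery_timeout=10, clock=lambda: now[0])


def always_fails():
    raise RuntimeError("boom")


for _ in range(3):
    try:
        breaker.call(always_fails)
    except Exception as exc:
        print(type(exc).__name__, breaker.state)

now[0] = 11.0  # advance past the recovery timeout
print(breaker.call(lambda: "ok"), breaker.state)
print(breaker.metrics)
```

Raising a dedicated exception type for rejected calls lets callers tell "the breaker refused to try" apart from "the service was called and failed", which is useful when choosing a fallback.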

Advanced Considerations