The other day, I was discussing system architecture with some colleagues. The topic of rate limiting came up, and a rather contentious point was raised: Is rate limiting an anti-pattern for internal services? 

One of my colleagues argued that if internal services require rate limiting, it’s a sign of underlying architectural issues. Well-designed services, he contended, should scale automatically without needing such restrictions. 

My initial reaction was strong disagreement. I’ve implemented rate limiting and seen firsthand how beneficial it can be in preventing cascading failures and protecting against unexpected behavior. 

After reflecting on this a bit, I realized there’s nuance to the debate. It isn’t a black-and-white situation. Let’s explore both sides of the argument and work toward a more balanced perspective on when and how to apply rate limiting to internal services.

What is Rate Limiting?

Rate limiting is a technique used to control the rate of requests sent to a service or API. It prevents a single client or service from overwhelming a target service with too many requests in a given time window. This is important for maintaining system stability, preventing abuse, and ensuring fair resource allocation.

Here’s a simple example of rate limiting in Go using a token bucket algorithm:

package main

import (
        "fmt"
        "sync"
        "time"
)

type TokenBucket struct {
        capacity    int
        tokens      int
        refillRate  int // Tokens per second
        lastRefill  time.Time
        mutex       sync.Mutex
}

func NewTokenBucket(capacity, refillRate int) *TokenBucket {
        return &TokenBucket{
                capacity:    capacity,
                tokens:      capacity,
                refillRate:  refillRate,
                lastRefill:  time.Now(),
        }
}

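func (tb *TokenBucket) Allow() bool {
        tb.mutex.Lock()
        defer tb.mutex.Unlock()

        // Add only whole tokens, and advance lastRefill by exactly the time
        // those tokens represent. Resetting lastRefill on every call would
        // discard fractional progress, so a caller polling faster than the
        // refill interval would never see a new token.
        elapsed := time.Since(tb.lastRefill).Seconds()
        newTokens := int(elapsed * float64(tb.refillRate))
        if newTokens > 0 {
                tb.tokens = min(tb.capacity, tb.tokens+newTokens)
                tb.lastRefill = tb.lastRefill.Add(time.Duration(newTokens) * time.Second / time.Duration(tb.refillRate))
        }

        if tb.tokens > 0 {
                tb.tokens--
                return true
        }
        return false
}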

func min(a, b int) int {
        if a < b {
                return a
        }
        return b
}

func main() {
        bucket := NewTokenBucket(5, 1) // Capacity of 5, refill rate of 1 token/second
        for i := 0; i < 10; i++ {
                if bucket.Allow() {
                        fmt.Println("Request allowed")
                } else {
                        fmt.Println("Request limited")
                }
                time.Sleep(500 * time.Millisecond)
        }
}

Here’s an example of a rate limiting configuration in Nginx that limits each IP address to 5 requests per second while allowing a burst of up to 10 requests:

http {
    limit_req_zone $binary_remote_addr zone=mylimit:10m rate=5r/s; # 5 requests per second per IP; state kept in a 10 MB shared memory zone

    server {
        location /api/ {
            limit_req zone=mylimit burst=10 nodelay; # Serve up to 10 excess requests immediately; reject anything beyond the burst
            # ... your other configurations
        }
    }
}

The Case Against Rate Limiting for Internal Services

The practice of rate limiting, while often employed to control the flow of incoming requests and protect services from overload, is not universally accepted, particularly when applied to internal services within a system. The central argument against it is that it acts as a band-aid solution, masking underlying architectural issues that should be addressed directly.

Let’s explore some of the key problems that rate limiting might be concealing:

  • Inefficient Algorithms and Data Structures: If a service requires rate limiting to function, it could indicate that the algorithms it employs are computationally expensive or that the data structures it uses are not optimized for efficient access. This can lead to unnecessary delays and performance bottlenecks.
  • Lack of Caching: Caching frequently accessed data can significantly improve the performance of a service. If a service is constantly bombarded with requests for the same data, it suggests that proper caching mechanisms are not in place, leading to unnecessary load on backend systems.
  • Single Points of Failure: A service that relies on rate limiting to stay afloat might be susceptible to bottlenecks due to the presence of single points of failure. If a critical component of the service fails, the entire system can be brought down, impacting availability and performance.
  • Lack of Asynchronous Processing: Synchronous calls can block threads and limit the throughput of a service. If a service is overwhelmed by synchronous requests, it suggests that asynchronous processing patterns are not being utilized effectively, leading to performance degradation.

The case against rate-limiting internal services advocates for a more holistic approach to system design. Investing in proper architecture, scaling, and performance optimization is considered a more sustainable and effective long-term strategy than relying on rate limiting as a crutch. By addressing the root causes of performance bottlenecks, systems can be designed to handle increased load and deliver optimal performance without resorting to artificial restrictions.

How Can Rate Limiting Be Beneficial for Internal Services?

While the practice of rate-limiting internal services might seem counterintuitive at first, it offers a range of advantages that can enhance the overall stability, security, and efficiency of a system:

  • Cascading Failure Prevention: In a complex system of internal services, a sudden surge of requests from one service can easily overwhelm another, potentially triggering a domino effect of failures. Rate limiting acts as a protective barrier, absorbing the shock and preventing the propagation of issues throughout the system.
  • Mitigation of Bugs and Unexpected Behavior: Software bugs are an inevitable reality, and when they occur, they can lead to unforeseen spikes in requests that can cripple services. Rate limiting serves as a safety net, minimizing the impact of such anomalies and providing a buffer for recovery.
  • Resource Prioritization: Under heavy load, ensuring that critical services receive the resources they need to function optimally is paramount. Rate limiting enables resource allocation prioritization, guaranteeing that essential services maintain adequate performance levels even during periods of high demand.
  • Cost Control: Many services interact with external APIs that operate on a usage-based pricing model. Unchecked requests to these APIs can result in unexpected and potentially exorbitant costs. Rate limiting provides a mechanism for controlling and predicting these costs, preventing unwelcome financial surprises.
  • Internal Denial of Service (DoS) Protection: While less frequent than external attacks, internal denial-of-service attacks can occur, whether intentionally or accidentally. Rate limiting adds a layer of defense against such attacks, safeguarding the system from malicious or compromised internal actors.
  • Testing and Deployment: During the critical phases of testing and deployment, especially when employing strategies like phased rollouts or canary deployments, rate limiting is invaluable. It allows for controlled and gradual increases in traffic, enabling close monitoring of performance and the identification of potential issues before they impact the entire user base.

The implementation of rate limiting for internal services, while sometimes debated, provides a multitude of benefits that contribute to a more robust, secure, and cost-effective system. By carefully considering the specific needs and context of each service, rate limiting can be strategically employed to optimize performance and mitigate risks.

How to Apply Rate Limiting to Internal Services

Ideally, we should follow a balanced approach. Within reason, we should focus on building scalable and resilient services from the start, addressing performance bottlenecks as they appear. However, we shouldn’t dismiss rate limiting for internal services entirely; we should use it strategically.

How can we then decide when to apply rate limiting?

  • Prioritize proper architecture and scaling: Focus on building scalable and resilient services from the outset. Address performance bottlenecks, implement proper caching, and ensure high availability.
  • Consider the criticality of the services: Rate limiting is more important for critical services that are essential for the overall system’s stability.
  • Analyze the risk of cascading failures: If services are tightly coupled and a failure in one could easily bring down others, rate limiting is a good idea.
  • Evaluate the cost of failure: If a service outage would have a significant business impact, rate limiting is a worthwhile investment.
  • Use rate limiting as a safety net, not a primary scaling mechanism: Don’t rely on rate limiting to handle normal traffic spikes. It should be a last line of defense against unexpected events.
  • Implement rate limiting strategically: Consider different types of rate limiting (e.g., token bucket, leaky bucket) and apply them at the appropriate level (e.g., service level, API gateway). Actively monitor the events that trigger rate limiting, review them closely, and improve the system accordingly. These events should be exceptions, not normal system behavior.

Conclusion

The strategic use of rate limiting is important for building robust and resilient internal systems. By controlling the flow of requests, rate limiting prevents overload, protects against internal denial-of-service incidents, ensures fair resource allocation, and improves overall system stability. It also allows for graceful degradation during peak loads, preventing cascading failures and maintaining service availability.

Effective implementation of rate limiting requires careful consideration of various factors, including the choice of algorithm, setting appropriate limits, and regular monitoring and adjustment. It’s important to strike a balance between protecting the system and providing a positive user experience.

When combined with sound architectural principles, rate limiting becomes a powerful tool for building internal systems that can withstand unexpected events and maintain consistent performance. By proactively managing resource consumption and mitigating potential threats, organizations can ensure the reliability and availability of their critical internal infrastructure.

Article reviewed by Rafael Nunes and Xavier Araújo.

Updated with feedback from João Neto.