# Part 4 - Rate Limiting
In the previous tutorial, we built a middleware pipeline with logging, CORS, and header injection. Now we’re going to add one of the most important features of any API gateway: rate limiting.
Without rate limiting, a single client can overwhelm your backend services — whether intentionally (a DDoS attack) or accidentally (a buggy client in a retry loop). Rate limiting puts a cap on how many requests a client can make in a given time window.
By the end of this tutorial, we’ll have a token bucket rate limiter that tracks requests per client IP and returns 429 Too Many Requests when a client exceeds their limit.
## The Token Bucket Algorithm
There are several rate limiting algorithms (fixed window, sliding window, leaky bucket), but we’ll implement a token bucket because it’s simple, effective, and widely used in production systems including AWS API Gateway and Nginx.
Here’s how it works:
- Each client gets a “bucket” that holds tokens (think of them as permission slips)
- The bucket starts full — say, 10 tokens
- Every request costs one token. If there’s a token available, the request proceeds. If the bucket is empty, the request is rejected with a 429
- Tokens refill at a steady rate — say, 2 per second
This naturally allows short bursts of traffic (up to the bucket size) while enforcing a sustained rate limit over time. It’s a nice balance between strict rate limiting and allowing legitimate traffic patterns.
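Before we write the real implementation, it can help to see the bucket arithmetic with no clocks or concurrency involved. This toy simulation (illustrative only; it passes elapsed time in explicitly instead of reading a clock) uses the numbers from the example above: capacity 10, refill rate 2 tokens/second:

```rust
// Toy token bucket simulation: capacity 10, refill 2 tokens/second.
// Elapsed time is an explicit parameter so the math is easy to follow.

const MAX_TOKENS: f64 = 10.0;
const REFILL_RATE: f64 = 2.0; // tokens per second

/// Refill `tokens` after `elapsed_secs`, capped at the bucket capacity.
fn refill(tokens: f64, elapsed_secs: f64) -> f64 {
    (tokens + elapsed_secs * REFILL_RATE).min(MAX_TOKENS)
}

/// Try to take one token. Returns the new count and whether it succeeded.
fn try_consume(tokens: f64) -> (f64, bool) {
    if tokens >= 1.0 {
        (tokens - 1.0, true)
    } else {
        (tokens, false)
    }
}

fn main() {
    // A burst of 12 back-to-back requests: the first 10 pass, then rejection.
    let mut tokens = MAX_TOKENS;
    let mut allowed = 0;
    for _ in 0..12 {
        let (rest, ok) = try_consume(tokens);
        tokens = rest;
        if ok {
            allowed += 1;
        }
    }
    println!("allowed {} of 12 burst requests", allowed); // allowed 10 of 12

    // After 3 quiet seconds, 3 * 2 = 6 tokens are back.
    tokens = refill(tokens, 3.0);
    println!("tokens after 3s: {}", tokens); // tokens after 3s: 6
}
```

The two halves of the algorithm are independent: `refill` only depends on elapsed time, and `try_consume` only depends on the current count, which is what makes the real version below so compact.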
## Implementing the Token Bucket

Let's put the rate limiter in its own module. Create `src/ratelimit.rs`:
```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::Instant;

/// Configuration for the rate limiter
#[derive(Clone)]
pub struct RateLimitConfig {
    /// Maximum tokens (burst capacity)
    pub max_tokens: u32,
    /// Tokens added per second
    pub refill_rate: f64,
}

impl RateLimitConfig {
    pub fn new(max_tokens: u32, refill_rate: f64) -> Self {
        RateLimitConfig {
            max_tokens,
            refill_rate,
        }
    }
}

/// A single client's token bucket
struct TokenBucket {
    tokens: f64,
    last_refill: Instant,
    max_tokens: u32,
    refill_rate: f64,
}

impl TokenBucket {
    fn new(config: &RateLimitConfig) -> Self {
        TokenBucket {
            tokens: config.max_tokens as f64,
            last_refill: Instant::now(),
            max_tokens: config.max_tokens,
            refill_rate: config.refill_rate,
        }
    }

    /// Try to consume a token. Returns true if allowed, false if rate limited.
    fn try_consume(&mut self) -> bool {
        self.refill();
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }

    /// Add tokens based on elapsed time
    fn refill(&mut self) {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_rate)
            .min(self.max_tokens as f64);
        self.last_refill = now;
    }

    /// How many tokens are currently available
    fn available_tokens(&self) -> u32 {
        self.tokens as u32
    }
}
```
A few things to highlight:

**`f64` for tokens** — We use floating point for the token count because refills happen continuously. If the refill rate is 2 tokens/second and 300 ms have passed, we should add 0.6 tokens. Integer math would lose this precision and make the rate limiter feel "choppy."

**`Instant`** — This is Rust's monotonic clock, similar to the monotonic reading in Go's `time.Now()`. It's guaranteed never to go backwards, unlike wall clock time, which can be adjusted by NTP or the user.
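You can see what integer math would cost with a quick calculation (illustrative numbers only): at 2 tokens/second, 300 ms of elapsed time is worth 0.6 of a token, which truncation discards entirely:

```rust
fn main() {
    let refill_rate = 2.0_f64; // tokens per second
    let elapsed_secs = 0.3_f64; // 300 ms between requests

    // Floating point keeps the fractional credit...
    let fractional = elapsed_secs * refill_rate;
    println!("f64 refill: {}", fractional); // f64 refill: 0.6

    // ...while truncating to an integer drops it. After ten such intervals
    // the f64 bucket has earned 6 whole tokens; the integer bucket, zero.
    let truncated = (elapsed_secs * refill_rate) as u32;
    println!("integer refill: {}", truncated); // integer refill: 0
}
```

A client making one request every 300 ms would never earn a single token back under integer refills, even though it is well under the sustained 2/second limit.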
## The Rate Limiter Store
Now we need a store that maps client IPs to their token buckets:
```rust
/// Stores token buckets for all clients
pub struct RateLimiter {
    buckets: Mutex<HashMap<String, TokenBucket>>,
    config: RateLimitConfig,
}

impl RateLimiter {
    pub fn new(config: RateLimitConfig) -> Self {
        RateLimiter {
            buckets: Mutex::new(HashMap::new()),
            config,
        }
    }

    /// Check if a request from the given key should be allowed.
    /// Consumes a token on success.
    pub fn check(&self, key: &str) -> RateLimitResult {
        let mut buckets = self.buckets.lock().unwrap();
        let bucket = buckets
            .entry(key.to_string())
            .or_insert_with(|| TokenBucket::new(&self.config));

        if bucket.try_consume() {
            RateLimitResult::Allowed {
                remaining: bucket.available_tokens(),
                limit: self.config.max_tokens,
            }
        } else {
            RateLimitResult::Limited {
                retry_after_secs: (1.0 / self.config.refill_rate).ceil() as u32,
            }
        }
    }

    /// Report the current token count for a key WITHOUT consuming a token.
    /// Used for informational headers.
    pub fn remaining(&self, key: &str) -> u32 {
        let mut buckets = self.buckets.lock().unwrap();
        match buckets.get_mut(key) {
            Some(bucket) => {
                bucket.refill();
                bucket.available_tokens()
            }
            // A client we haven't seen yet has a full bucket.
            None => self.config.max_tokens,
        }
    }

    /// The configured burst capacity.
    pub fn limit(&self) -> u32 {
        self.config.max_tokens
    }
}

pub enum RateLimitResult {
    Allowed { remaining: u32, limit: u32 },
    Limited { retry_after_secs: u32 },
}
```

**`Mutex<HashMap<...>>`** — This is how we handle concurrent access to the bucket store. A `Mutex` (mutual exclusion lock) ensures only one thread can access the `HashMap` at a time. When you call `.lock()`, it blocks until the lock is available, then gives you mutable access to the contents.

In Go, you'd use `sync.Mutex` with explicit `Lock()`/`Unlock()` calls. In Rust, the `Mutex` wraps the data itself, so you can only reach the data through the lock — the type system makes it impossible to forget to lock.

**`.entry().or_insert_with()`** — This is a convenient pattern for "get the value if it exists, or insert a new one." It avoids doing a separate lookup and insert, which would require two hash operations.

**`RateLimitResult` enum** — Rather than returning a simple boolean, we return an enum with data. The `Allowed` variant tells us how many requests remain. The `Limited` variant tells the client when to retry. This makes it easy to include helpful rate limit headers in the response.

**`remaining()` and `limit()`** — These are read-only accessors for building informational headers. The distinction matters: `check()` consumes a token, so if the middleware called it a second time just to read the count, every request would cost two tokens and the effective limit would be half of what we configured.

## Rate Limit Middleware

Now let's plug our rate limiter into the middleware system we built in Part 3. Create `src/middleware/ratelimit.rs`:

```rust
use super::{Middleware, RequestContext};
use crate::ratelimit::{RateLimitResult, RateLimiter};
use bytes::Bytes;
use http_body_util::{combinators::BoxBody, BodyExt, Full};
use hyper::{Request, Response};

pub struct RateLimitMiddleware {
    limiter: RateLimiter,
}

impl RateLimitMiddleware {
    pub fn new(limiter: RateLimiter) -> Self {
        RateLimitMiddleware { limiter }
    }
}

impl Middleware for RateLimitMiddleware {
    fn on_request(
        &self,
        req: Request<BoxBody<Bytes, hyper::Error>>,
        ctx: &RequestContext,
    ) -> Result<Request<BoxBody<Bytes, hyper::Error>>, Response<BoxBody<Bytes, hyper::Error>>> {
        match self.limiter.check(&ctx.client_ip) {
            RateLimitResult::Allowed { .. } => {
                // Request is allowed — rate limit headers are added in on_response
                Ok(req)
            }
            RateLimitResult::Limited { retry_after_secs } => {
                // Too many requests — short-circuit with 429
                let body = Full::new(Bytes::from(
                    "429 Too Many Requests\nYou have exceeded the rate limit. Please try again later.\n",
                ))
                .map_err(|never| match never {})
                .boxed();

                let response = Response::builder()
                    .status(429)
                    .header("Retry-After", retry_after_secs.to_string())
                    .header("Content-Type", "text/plain")
                    .body(body)
                    .unwrap();

                Err(response)
            }
        }
    }

    fn on_response(
        &self,
        mut resp: Response<BoxBody<Bytes, hyper::Error>>,
        ctx: &RequestContext,
    ) -> Response<BoxBody<Bytes, hyper::Error>> {
        // Add rate limit info headers. Note that we use the read-only
        // remaining() accessor here; calling check() again would consume
        // a second token for every request.
        let headers = resp.headers_mut();
        headers.insert(
            "X-RateLimit-Limit",
            self.limiter.limit().to_string().parse().unwrap(),
        );
        headers.insert(
            "X-RateLimit-Remaining",
            self.limiter.remaining(&ctx.client_ip).to_string().parse().unwrap(),
        );
        resp
    }

    fn name(&self) -> &str {
        "rate-limit"
    }
}
```
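As an aside, the lock-then-`entry()` pattern the store relies on is easy to experiment with outside the gateway. This standalone snippet (a hypothetical per-key counter, not part of the gateway code) shows both pieces together:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// A tiny per-key request counter, just to demo the pattern.
struct Counter {
    counts: Mutex<HashMap<String, u32>>,
}

impl Counter {
    fn new() -> Self {
        Counter {
            counts: Mutex::new(HashMap::new()),
        }
    }

    fn hit(&self, key: &str) -> u32 {
        // lock() gives exclusive, mutable access to the map; the guard
        // releases the lock automatically when it goes out of scope.
        let mut counts = self.counts.lock().unwrap();
        // entry() finds or creates the slot with a single hash lookup.
        let n = counts.entry(key.to_string()).or_insert(0);
        *n += 1;
        *n
    }
}

fn main() {
    let counter = Counter::new();
    assert_eq!(counter.hit("10.0.0.1"), 1);
    assert_eq!(counter.hit("10.0.0.1"), 2);
    assert_eq!(counter.hit("10.0.0.2"), 1); // each key gets its own slot
    println!("ok");
}
```

Note that `hit` takes `&self`, not `&mut self`: the `Mutex` provides the interior mutability, which is exactly why `RateLimiter::check` can be called through a shared reference from many connections at once.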
Don't forget to register the new module. Update `src/middleware/mod.rs`:

```rust
pub mod cors;
pub mod headers;
pub mod logging;
pub mod ratelimit;

// ... rest of the file stays the same
```
And add the `ratelimit` module in `src/main.rs`:

```rust
mod ratelimit;

// ... in main():
use ratelimit::{RateLimitConfig, RateLimiter};
use middleware::ratelimit::RateLimitMiddleware;

// Add rate limiting middleware (10 requests burst, 2 per second refill)
let rate_config = RateLimitConfig::new(10, 2.0);
let rate_limiter = RateLimiter::new(rate_config);
pipeline.add(Box::new(RateLimitMiddleware::new(rate_limiter)));
```
## Testing the Rate Limiter
Start the backend and gateway, then blast it with requests:
```sh
$ for i in $(seq 1 15); do
    echo "Request $i: $(curl -s -o /dev/null -w '%{http_code}' http://localhost:3000/api/users)"
  done
```
You should see something like:
```
Request 1: 200
Request 2: 200
...
Request 10: 200
Request 11: 429
Request 12: 429
...
```
The first 10 requests succeed (the burst capacity), then you get 429 responses. Wait a few seconds and try again — the bucket refills and you can make more requests.
Check the rate limit headers on a successful response:

```sh
$ curl -v http://localhost:3000/api/users 2>&1 | grep X-RateLimit
< X-RateLimit-Limit: 10
< X-RateLimit-Remaining: 8
```
And a rejected response includes a `Retry-After` header:

```sh
$ curl -v http://localhost:3000/api/users 2>&1 | grep Retry-After
< Retry-After: 1
```
## A Note on Production Rate Limiting
Our implementation stores buckets in memory, which means it won't work across multiple gateway instances. In production, you'd typically keep the rate limit state in Redis or a similar shared store. The algorithm stays the same — you'd just swap the in-memory `HashMap` for a Redis client.

Another consideration is what key to rate limit on. We're using the client IP, which is the most common approach, but you might want to rate limit on API keys, user IDs, or a combination, depending on your use case.
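For example, a key function that prefers an API key when the client supplied one and falls back to the client IP might look like this (a sketch; how you extract `api_key` from the request headers is up to your gateway, and the `key:`/`ip:` prefixes are just one way to keep the two namespaces from colliding):

```rust
/// Build the rate limit key for a request: prefer the API key when one
/// was sent, fall back to the client IP otherwise. The prefixes keep an
/// IP that happens to equal someone's API key in a separate namespace.
fn rate_limit_key(api_key: Option<&str>, client_ip: &str) -> String {
    match api_key {
        Some(key) => format!("key:{}", key),
        None => format!("ip:{}", client_ip),
    }
}

fn main() {
    assert_eq!(rate_limit_key(Some("abc123"), "10.0.0.1"), "key:abc123");
    assert_eq!(rate_limit_key(None, "10.0.0.1"), "ip:10.0.0.1");
    println!("ok");
}
```

Because `RateLimiter` keys on plain strings, a change like this touches only the middleware, not the limiter itself.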
## Conclusion

We've implemented a token bucket rate limiter and integrated it into our middleware pipeline. The key Rust concepts we covered are `Mutex` for safe concurrent access, enums with data for expressing rich results, and how well the middleware trait system we built in Part 3 accommodates new functionality.
**Challenge:** Implement a cleanup task that periodically removes stale token buckets from the `HashMap`. If a client hasn't been seen in 10 minutes, their bucket should be removed to keep memory from growing without bound. You'll want `tokio::time::interval` for this.
## Next Part
In Part 5 - Load Balancing & Health Checks, we’ll distribute traffic across multiple instances of the same service and automatically stop routing to unhealthy backends.