# Part 5 - Load Balancing & Health Checks

Elliot Forbes · Mar 20, 2026 · 8 min read

In the previous tutorial, we added rate limiting to protect our backends from abuse. Now we’re going to tackle another critical piece of infrastructure: load balancing.

Up to this point, each route in our gateway maps to a single backend server. But in production, you typically run multiple instances of each service for reliability and throughput. If one instance crashes, the others keep serving traffic. Load balancing distributes requests across these instances.

By the end of this tutorial, we’ll have round-robin load balancing across multiple backend instances, plus background health checks that automatically stop sending traffic to unhealthy backends.

Updating the Route Model

First, we need to update our Route to support multiple backend instances instead of just one. Let’s rework src/router.rs:

src/router.rs
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, RwLock};

#[derive(Debug, Clone)]
pub struct Backend {
    pub host: String,
    pub port: u16,
    pub url: String,
    pub healthy: bool,
}

impl Backend {
    pub fn new(url: &str) -> Self {
        let trimmed = url.trim_start_matches("http://");
        let parts: Vec<&str> = trimmed.split(':').collect();
        let host = parts[0].to_string();
        let port: u16 = parts.get(1).unwrap_or(&"80").parse().unwrap_or(80);
        Backend {
            host,
            port,
            url: url.to_string(),
            healthy: true,
        }
    }

    pub fn address(&self) -> String {
        format!("{}:{}", self.host, self.port)
    }
}

pub struct Route {
    pub path_prefix: String,
    backends: RwLock<Vec<Backend>>,
    current: AtomicUsize,
}

impl Route {
    pub fn new(path_prefix: &str, backend_urls: Vec<&str>) -> Self {
        let backends: Vec<Backend> = backend_urls.iter().map(|url| Backend::new(url)).collect();
        Route {
            path_prefix: path_prefix.to_string(),
            backends: RwLock::new(backends),
            current: AtomicUsize::new(0),
        }
    }

    pub fn matches(&self, path: &str) -> bool {
        path.starts_with(&self.path_prefix)
    }

    /// Get the next healthy backend using round-robin
    pub fn next_backend(&self) -> Option<Backend> {
        let backends = self.backends.read().unwrap();
        let healthy: Vec<&Backend> = backends.iter().filter(|b| b.healthy).collect();

        if healthy.is_empty() {
            return None;
        }

        let idx = self.current.fetch_add(1, Ordering::Relaxed) % healthy.len();
        Some(healthy[idx].clone())
    }

    /// Get all backends (for health checking)
    pub fn get_backends(&self) -> Vec<Backend> {
        self.backends.read().unwrap().clone()
    }

    /// Update the health status of a backend
    pub fn set_backend_health(&self, url: &str, healthy: bool) {
        let mut backends = self.backends.write().unwrap();
        if let Some(backend) = backends.iter_mut().find(|b| b.url == url) {
            if backend.healthy != healthy {
                let status = if healthy { "healthy" } else { "unhealthy" };
                println!("  [health] {} is now {}", url, status);
            }
            backend.healthy = healthy;
        }
    }
}

There are some important new concepts here:

RwLock — A reader-writer lock. Unlike a Mutex which only allows one accessor at a time, an RwLock allows many concurrent readers OR one writer. This is perfect for our use case: requests read the backend list frequently, but health checks write to it infrequently. RwLock gives us better performance under high traffic than a Mutex would.
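The read/write semantics are easy to see in isolation. A standalone, std-only sketch (not part of the gateway code):

```rust
use std::sync::RwLock;

fn main() {
    let backends = RwLock::new(vec!["127.0.0.1:8081", "127.0.0.1:8084"]);

    // Any number of readers can hold the lock at the same time.
    {
        let r1 = backends.read().unwrap();
        let r2 = backends.read().unwrap();
        assert_eq!(r1.len(), r2.len());
    } // both read guards dropped here

    // A writer needs exclusive access; write() blocks until all readers are gone.
    backends.write().unwrap().push("127.0.0.1:8085");
    assert_eq!(backends.read().unwrap().len(), 3);
}
```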

AtomicUsize — An atomic integer for our round-robin counter. Atomics are integers that can be safely modified from multiple threads without a lock. fetch_add(1, Ordering::Relaxed) atomically increments the counter and returns the previous value. Ordering::Relaxed means we don’t need strict memory ordering guarantees — we just need the counter to increment. This is much cheaper than taking a lock for every request.

Round-robin — The simplest load balancing algorithm. We cycle through healthy backends one by one: backend 0, backend 1, backend 2, backend 0, backend 1, … The modulo operator (%) wraps around. It’s fair and predictable.
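The wrap-around behaviour of the counter is easy to verify in isolation. A standalone sketch of the same pattern, outside the gateway:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Return the index of the next backend in rotation.
fn next_index(counter: &AtomicUsize, len: usize) -> usize {
    // fetch_add returns the value *before* the increment.
    counter.fetch_add(1, Ordering::Relaxed) % len
}

fn main() {
    let counter = AtomicUsize::new(0);
    // Six picks across three backends: cycles 0, 1, 2 and wraps around.
    let picks: Vec<usize> = (0..6).map(|_| next_index(&counter, 3)).collect();
    assert_eq!(picks, vec![0, 1, 2, 0, 1, 2]);
}
```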

Updating the Router

We also need to update the Router to work with the new route format:

src/router.rs
pub struct Router {
    routes: Vec<Arc<Route>>,
}

impl Router {
    pub fn new() -> Self {
        Router { routes: Vec::new() }
    }

    pub fn add_route(&mut self, path_prefix: &str, backend_urls: Vec<&str>) {
        // Build the log string first — backend_urls is moved into Route::new below.
        let backend_list = backend_urls.join(", ");
        let route = Arc::new(Route::new(path_prefix, backend_urls));
        println!("  Route added: {} -> [{}]", path_prefix, backend_list);
        self.routes.push(route);
    }

    pub fn find_route(&self, path: &str) -> Option<Arc<Route>> {
        self.routes
            .iter()
            .filter(|route| route.matches(path))
            .max_by_key(|route| route.path_prefix.len())
            .cloned()
    }

    pub fn all_routes(&self) -> &[Arc<Route>] {
        &self.routes
    }
}

Notice we’re now wrapping routes in Arc<Route> so they can be shared between the request handling path and the health check task.

Background Health Checks

Now let’s implement the health check system. This will run in the background and periodically ping each backend to check if it’s alive:

src/health.rs
use crate::router::Route;
use std::sync::Arc;
use std::time::Duration;
use tokio::net::TcpStream;
use tokio::time;

/// Start background health checks for all routes
pub fn start_health_checks(routes: Vec<Arc<Route>>, interval_secs: u64) {
    tokio::task::spawn(async move {
        let mut interval = time::interval(Duration::from_secs(interval_secs));
        println!(
            "  Health checks started (every {}s)",
            interval_secs
        );

        loop {
            interval.tick().await;
            check_all_backends(&routes).await;
        }
    });
}

async fn check_all_backends(routes: &[Arc<Route>]) {
    for route in routes {
        let backends = route.get_backends();
        for backend in &backends {
            let healthy = check_backend(&backend.host, backend.port).await;
            route.set_backend_health(&backend.url, healthy);
        }
    }
}

async fn check_backend(host: &str, port: u16) -> bool {
    let addr = format!("{}:{}", host, port);
    match time::timeout(
        Duration::from_secs(3),
        TcpStream::connect(&addr),
    )
    .await
    {
        Ok(Ok(_)) => true,   // Connected successfully
        Ok(Err(_)) => false,  // Connection refused
        Err(_) => false,      // Timeout
    }
}

tokio::time::interval — Creates a timer that ticks at a regular interval, similar to time.NewTicker in Go. We use this to run health checks every N seconds.

tokio::time::timeout — Wraps a future with a timeout. If the TcpStream::connect doesn’t complete within 3 seconds, we consider the backend unhealthy. The nested Result is why we have Ok(Ok(_)) — the outer Result is from the timeout (did it finish in time?) and the inner is from the connection (did it succeed?).

Our health check is a simple TCP connection check. For a production system, you’d typically hit a dedicated /health endpoint and check for a 200 response, but the TCP check is good enough for our purposes and keeps the code focused.
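If you want to experiment with the HTTP variant, here's a rough blocking sketch using only the standard library — the /health path, the helper names, and the status-line parsing are my own assumptions, and the gateway itself would do this over the async tokio connection instead:

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Returns true if an HTTP status line (e.g. "HTTP/1.1 200 OK") reports a 200.
fn is_ok_status(status_line: &str) -> bool {
    status_line
        .split_whitespace()
        .nth(1)
        .map(|code| code == "200")
        .unwrap_or(false)
}

/// Hit GET /health on the backend and check for a 200 response.
/// Works for IP addresses like the ones in this tutorial (hostnames
/// would need DNS resolution via to_socket_addrs).
fn check_backend_http(host: &str, port: u16) -> bool {
    let addr = format!("{}:{}", host, port);
    let Ok(sock_addr) = addr.parse::<SocketAddr>() else {
        return false;
    };
    let Ok(mut stream) = TcpStream::connect_timeout(&sock_addr, Duration::from_secs(3)) else {
        return false; // connection refused or timed out -> unhealthy
    };
    let request = format!(
        "GET /health HTTP/1.1\r\nHost: {}\r\nConnection: close\r\n\r\n",
        host
    );
    if stream.write_all(request.as_bytes()).is_err() {
        return false;
    }
    let mut response = String::new();
    if stream.read_to_string(&mut response).is_err() {
        return false;
    }
    // Only the first line (the status line) matters for the health verdict.
    response.lines().next().map(is_ok_status).unwrap_or(false)
}
```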

Updating main.rs

Now let’s wire everything together:

src/main.rs
mod health;
mod middleware;
mod proxy;
mod ratelimit;
mod router;

// ... (imports) ...

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Configure routes with multiple backends
    let mut router = Router::new();
    println!("Configuring routes...");
    router.add_route("/api/users", vec![
        "http://127.0.0.1:8081",
        "http://127.0.0.1:8084",
    ]);
    router.add_route("/api/orders", vec![
        "http://127.0.0.1:8082",
        "http://127.0.0.1:8085",
    ]);
    router.add_route("/", vec!["http://127.0.0.1:8080"]);

    // Start background health checks
    println!("Starting health checks...");
    let routes_for_health = router.all_routes().to_vec();
    health::start_health_checks(routes_for_health, 10);

    // ... rest of setup ...
}

And update the request handler to use next_backend():

src/main.rs
async fn handle_request(
    req: Request<hyper::body::Incoming>,
    gateway: &Gateway,
    client_ip: &str,
) -> Result<Response<BoxBody<Bytes, hyper::Error>>, hyper::Error> {
    let ctx = RequestContext {
        client_ip: client_ip.to_string(),
        start_time: std::time::Instant::now(),
    };

    let path = req.uri().path().to_string();

    let (parts, body) = req.into_parts();
    let boxed_req = Request::from_parts(parts, body.boxed());

    let boxed_req = match gateway.pipeline.process_request(boxed_req, &ctx) {
        Ok(req) => req,
        Err(response) => return Ok(gateway.pipeline.process_response(response, &ctx)),
    };

    let response = match gateway.router.find_route(&path) {
        Some(route) => {
            match route.next_backend() {
                Some(backend) => {
                    proxy::forward_to_backend(boxed_req, &backend, &route.path_prefix).await?
                }
                None => {
                    // All backends are down
                    let body = Full::new(Bytes::from("503 Service Unavailable\nAll backends are unhealthy.\n"))
                        .map_err(|never| match never {})
                        .boxed();
                    Response::builder().status(503).body(body).unwrap()
                }
            }
        }
        None => {
            let body = Full::new(Bytes::from("404 Not Found"))
                .map_err(|never| match never {})
                .boxed();
            Response::builder().status(404).body(body).unwrap()
        }
    };

    Ok(gateway.pipeline.process_response(response, &ctx))
}

Now when all backends for a route are down, we return a 503 Service Unavailable instead of trying to connect and getting a 502. This is a better experience for API consumers — a 503 clearly communicates that the service exists but is temporarily unavailable.

Testing Load Balancing

Let’s test with multiple backends for the user service. Open five terminals:

# Terminal 1 & 2 - Two instances of the user service
$ cargo run --example backend -- 8081 user-service-1
$ cargo run --example backend -- 8084 user-service-2

# Terminal 3 - Order service (single instance)
$ cargo run --example backend -- 8082 order-service

# Terminal 4 - Default backend
$ cargo run --example backend -- 8080 default-service

# Terminal 5 - The gateway
$ cargo run

Now make several requests to the user service:

$ for i in $(seq 1 6); do
    curl -s http://localhost:3000/api/users/test
done

You should see the traffic alternating between the two instances:

Response from: user-service-1
Response from: user-service-2
Response from: user-service-1
Response from: user-service-2
Response from: user-service-1
Response from: user-service-2

Testing Health Checks

Now stop user-service-1 (Ctrl+C in its terminal). Wait for a health check cycle (up to 10 seconds), and you’ll see in the gateway output:

  [health] http://127.0.0.1:8081 is now unhealthy

Now all traffic goes to user-service-2:

$ for i in $(seq 1 4); do
    curl -s http://localhost:3000/api/users/test
done
Response from: user-service-2
Response from: user-service-2
Response from: user-service-2
Response from: user-service-2

Start user-service-1 again and wait for the next health check:

  [health] http://127.0.0.1:8081 is now healthy

Traffic is distributed again. This is the same pattern that production load balancers like HAProxy and Nginx use — automatic failover and recovery.

Conclusion

Our API gateway now distributes traffic across multiple backends using round-robin load balancing, and automatically detects and routes around unhealthy backends. We covered RwLock for read-heavy concurrent access, atomic operations for lock-free counters, background tasks with tokio::task::spawn, and timeout-wrapped health checks.

Challenge - Implement a different load balancing strategy: least connections. Instead of round-robin, track how many active requests each backend is handling and send the next request to the backend with the fewest active connections. You’ll need an AtomicUsize counter on each backend.
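If you want a starting point for the challenge, here is a rough std-only sketch of just the selection logic — the struct layout and names here are my own, not from the series, and wiring the increment/decrement into the proxy path is left to you:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Per-backend state. The Arc lets cloned Backend values share one counter,
/// which matters because next_backend() hands out clones.
struct Backend {
    url: String,
    active: Arc<AtomicUsize>,
}

/// Pick the backend with the fewest in-flight requests.
fn least_connections(backends: &[Backend]) -> Option<&Backend> {
    backends
        .iter()
        .min_by_key(|b| b.active.load(Ordering::Relaxed))
}

fn main() {
    let backends = vec![
        Backend {
            url: "http://127.0.0.1:8081".into(),
            active: Arc::new(AtomicUsize::new(2)),
        },
        Backend {
            url: "http://127.0.0.1:8084".into(),
            active: Arc::new(AtomicUsize::new(0)),
        },
    ];

    let chosen = least_connections(&backends).unwrap();
    assert_eq!(chosen.url, "http://127.0.0.1:8084");

    // Increment when dispatching, decrement when the proxied response completes.
    chosen.active.fetch_add(1, Ordering::Relaxed);
    // ... forward the request ...
    chosen.active.fetch_sub(1, Ordering::Relaxed);
}
```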

Next Part

In Part 6 - Configuration & Authentication, we’ll move our gateway configuration into a YAML file and add JWT-based authentication middleware.