Load Balancing
A load balancer sits in front of a pool of servers and decides where each incoming request should go. It is the foundation of horizontal scaling and a key piece of fault tolerance.
Why You Need One
Without a load balancer, clients must know about every server. New deployments break clients, dead servers strand traffic, and unequal load wastes capacity. With one, clients hit a single endpoint and the balancer hides the topology.
Algorithms
Round-robin sends each new request to the next server in a list, wrapping back to the start. It is simple and works well when servers and requests are similar. For example, 10 requests round-robined across 3 servers land as a 4-3-3 split.
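The 4-3-3 split can be reproduced with a minimal sketch (server names and the `round_robin` helper are placeholders, not a real balancer API):

```python
from itertools import cycle

def round_robin(servers, num_requests):
    """Assign num_requests to servers in rotation; return a count per server."""
    counts = dict.fromkeys(servers, 0)
    rotation = cycle(servers)  # endless a, b, c, a, b, c, ...
    for _ in range(num_requests):
        counts[next(rotation)] += 1
    return counts

print(round_robin(["server-a", "server-b", "server-c"], 10))
# -> {'server-a': 4, 'server-b': 3, 'server-c': 3}
```

Server a gets requests 1, 4, 7, and 10; the other two get three each, hence 4-3-3.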
Least-connections sends to whichever server has the fewest active connections. This handles long-lived connections like WebSockets better than round-robin.
Weighted assigns a weight to each server. A server with weight 3 receives three times as many requests as one with weight 1. Useful when servers have different capacity.
Hash-based maps a key like the client IP or user ID to a fixed server. This gives stickiness without storing session data on the balancer. Consistent hashing keeps most assignments stable when servers are added or removed.
Health Checks
A load balancer must avoid sending traffic to dead servers. It runs a health check by hitting each server periodically. If a check fails, the server is taken out of rotation. When checks pass again, traffic resumes. Good health checks test the actual code path, not just whether the process is running.
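The in/out-of-rotation bookkeeping can be sketched as a small tracker. The class name is made up, and the probes themselves (HTTP calls on a timer) are left out; probe results are fed in as booleans so the state logic stands alone.

```python
class HealthTracker:
    """Tracks per-server health from probe results and exposes the live pool.
    A real balancer would run the probes itself on a timer."""

    def __init__(self, servers):
        self.up = {server: True for server in servers}

    def record(self, server, passed):
        # One failed probe takes the server out; one pass brings it back.
        self.up[server] = passed

    def pool(self):
        # Only healthy servers are eligible for new traffic.
        return [server for server, ok in self.up.items() if ok]
```

Usage: call `record` after each probe and draw targets only from `pool()`. The "Try It Yourself" section below asks for the sturdier version that requires N consecutive failures before marking a server down.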
Layer 4 vs Layer 7
A layer 4 balancer routes by IP and port. It is fast and protocol-agnostic. A layer 7 balancer understands HTTP, so it can route based on URL, headers, or cookies, and it can terminate TLS. Most modern setups use layer 7.
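A layer 7 routing decision can be sketched as a function of the parsed request. The pool names, paths, and cookie are invented for illustration; a layer 4 balancer could not make these choices because it never parses the HTTP request.

```python
def route(path, headers):
    """Pick a backend pool from the HTTP path and headers (layer 7 only)."""
    if path.startswith("/api/"):
        return "api-pool"          # URL-based routing
    if "beta=1" in headers.get("Cookie", ""):
        return "beta-pool"         # cookie-based routing, e.g. a canary group
    return "web-pool"              # default pool

print(route("/api/users", {}))                 # -> api-pool
print(route("/home", {"Cookie": "beta=1"}))    # -> beta-pool
```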
Try It Yourself
- Extend the example to a weighted round-robin where one server gets twice as much traffic.
- Add a health check that marks a server as down after N failures.
- Implement least-connections with a counter per server.