I’m starting on some new projects soon, projects I would very much like to be stateless and highly available. Partially because I like to think of myself as a grown-up developer now who wants to code things the way they ought to be; mostly because I’m a lazy sysadmin who would rather not have to scramble just because the cloud decided to rain that day.
From a dev standpoint, this means looking at Docker containers, which are cool. From an admin standpoint, this means looking at load balancers, which are not.
A Load of Balancers
There are plenty of cloud-based load balancing options available if you’re willing to pay, like Amazon’s imaginatively-named Application Load Balancer or DigitalOcean’s minimalistically-titled Load Balancer. But I’m not willing to pay for things which fail to impress. All of these cloud “load balancers” boil down to the same basic structure:
- A pair of cloud servers with DNS-based failover.
- Health checks.
- Some amount of application-level routing.
That sounds exactly like the sort of thing I’d want a crack at cobbling together myself first. Maybe not to the same depth of features as something like Amazon’s offering, but at least something serviceable.
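The routing and health-check pieces are really just Nginx configuration. Here’s a minimal sketch of the kind of thing I have in mind, with entirely made-up backend addresses and paths (the DNS-based failover part would just be a second copy of this box behind its own DNS record):

```nginx
# Hypothetical pools of backend containers.
upstream web_backend {
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}

upstream api_backend {
    server 10.0.0.21:8080;
    server 10.0.0.22:8080;
}

server {
    listen 80;

    # Application-level routing: /api traffic goes to one pool,
    # everything else to the other.
    location /api/ {
        proxy_pass http://api_backend;
    }

    location / {
        proxy_pass http://web_backend;
    }
}
```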
Is “Passive” Good Enough?
There are a number of load balancing solutions out there, HAProxy probably being the most mature one. At first I dismissed Nginx — a technology I’m already using for other purposes — due to its health check limitations. Namely, open-source Nginx only supports passive health checks.
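For reference, “passive” here means Nginx only marks a backend as down after real client requests to it fail; there’s no separate probing. In open-source Nginx that behaviour is tuned per server with max_fails and fail_timeout, roughly like so (addresses are placeholders):

```nginx
upstream app_backend {
    # After 3 failed requests within 30 seconds, take the backend
    # out of rotation for the next 30 seconds (fail_timeout is both
    # the counting window and the cool-down period).
    server 10.0.0.11:8080 max_fails=3 fail_timeout=30s;
    server 10.0.0.12:8080 max_fails=3 fail_timeout=30s;
}
```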
The idea of sending traffic to potentially bad backend servers did not appeal to me. Intuitively, this sounded like dropped traffic and missed requests. It’s passive, after all, and passive has a negative connotation in the world of infrastructure. You don’t want to be passive, you want to be proactive! And besides, Nginx Plus offers an “active” solution, and it costs money, so it must be better.
But I’d never tried it, and something sounding bad is not a good enough reason to avoid it. So I set up a quick test lab and gave it a spin.
Turns out Nginx never missed a beat.
In retrospect, this makes perfect sense. The worst-case scenario is exactly the same whether the load balancer checks passively or actively. No matter how intelligent your load balancer is, it isn’t clairvoyant. There is always a chance of live traffic being passed to a node that’s about to become unresponsive, and any traffic routing solution worth its salt knows that and has some kind of transparent retry logic baked in.
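In Nginx’s case, that retry logic is the proxy_next_upstream directive, which already retries on connection errors and timeouts by default. A sketch of tightening it up a bit; the specific values here are just examples, not recommendations:

```nginx
location / {
    proxy_pass http://app_backend;

    # If a backend errors out, times out, or throws a 502/503,
    # transparently pass the request to the next server in the
    # upstream pool instead of failing the client.
    proxy_next_upstream error timeout http_502 http_503;
    proxy_next_upstream_tries 2;
    proxy_next_upstream_timeout 5s;
}
```

Worth noting that non-idempotent requests (POSTs and friends) aren’t retried unless you explicitly opt in with the non_idempotent flag.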
Really, the only way the above scenario can be avoided at all times is by broadcasting duplicate traffic to multiple backend servers simultaneously and returning the reply from the winner (à la dnsmasq --all-servers). That’s a lot of overhead that would be better used passing new requests!
Now, this isn’t to say that passive health evaluation is good enough for all use cases. There are real network penalties that get paid, penalties that are minimized in this example since it’s all happening over a local network. And while the worst-case scenario for passive and active health checks is identical, active health checks make that scenario (passing live, non-synthetic traffic to a dead node) less likely. You can also get a lot fancier with active health checks by sending known-good requests and expecting positive HTTP response codes. I don’t particularly see that as the load balancer’s job, but that’s a topic for another time.
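For the curious, that fancier approach is roughly what the paid health_check directive in Nginx Plus does: it fires off its own synthetic request on an interval and compares the response against a match block. Going by the documentation (I haven’t run this myself, and the /healthz endpoint is made up):

```nginx
# Nginx Plus only. Probe each backend's /healthz every 5 seconds;
# 3 failures take it out of rotation, 2 passes bring it back.
# (The upstream block also needs a shared-memory zone directive
# for this to work.)
match backend_ok {
    status 200-299;
}

server {
    location / {
        proxy_pass http://app_backend;
        health_check uri=/healthz interval=5 fails=3 passes=2 match=backend_ok;
    }
}
```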
This demo proves that open-source Nginx is more than good enough for my needs. And, more importantly, it serves as a reminder to always lab out your assumptions. Chances are you’ll learn something.