The Secrets of Load-balancing Long-Lived TCP Connections

November 18, 2020
Engineering

How do you deal with load-balancing customer traffic at the border of your infrastructure when you don’t own the network? Following a series of experiments, I implemented a service that leverages our internal Graphite monitoring to dynamically weight HAProxy backend servers based on some load measurement.

Hosted Graphite is a time-series metrics monitoring tool built to monitor applications, infrastructure, networks and systems. Try it out yourself with Hosted Graphite's 14-day free trial: start monitoring your systems and get your data visualised in our Hosted Grafana dashboards.

Scaling problems

In the early days, we relied on Route 53 round-robin DNS to balance data sent to our TCP and UDP endpoints, but the limitations of this approach became more obvious as we scaled. Although DNS is simple and relatively cost-effective for load balancing, it has several drawbacks, such as clients disobeying record TTLs and our having no control over client-side DNS caching. On top of that, for a service like ours with highly non-uniform per-connection data rates, no real balancing of load was happening at all. As we added more and more servers, our DNS records continued to grow and inevitably became difficult to manage, and we started to have regular issues with customers overloading servers with TCP traffic sent over a small number of connections. In such cases, rotating servers out of service to mitigate the impact was a manual process for our SREs and would not resolve the issue quickly enough. We needed a better way to load-balance long-lived TCP connections.

Enter HAProxy

We decided to run HAProxy instances on each of our ingestion nodes to perform intra-cluster TCP load balancing. With this setup, we had a traffic path like so:

[Diagram: Intra-cluster HAProxy load balancing.]

The diagram above shows how intra-cluster HAProxy load balancing works: the ingestion layer receives data over both TCP and UDP and forwards it on over TCP, with each node's HAProxy instance balancing those TCP connections across the cluster.

We began by experimenting with the round-robin and leastconn balancing methods; the latter, which directs each new connection to the node with the fewest active connections, gave the best results. At first we were pretty happy with how this worked, but we knew it would be impossible to truly balance load across the cluster this way, as not all connections are created equal (5 datapoints/s accepted on one connection, 5,000 datapoints/s on another). So, for a short period, we rolled with this limited load balancing and continued to rely on DNS for balancing UDP.
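
For illustration, a leastconn backend in HAProxy configuration might look something like this; the names, addresses, and use of Graphite's plaintext port 2003 are placeholders, not our actual config:

    frontend graphite_tcp_in
        bind *:2003
        mode tcp
        default_backend ingestion

    backend ingestion
        mode tcp
        balance leastconn
        server node-01 10.0.1.1:2003 check weight 100
        server node-02 10.0.1.2:2003 check weight 100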

Building an infrastructure border

At this point, we realised we desperately needed a way to effectively balance UDP traffic across our ingestion layer, which sat at the border of our infrastructure. Client-side DNS caching was hurting us: some customers were sending huge amounts of UDP data to a small subset of the ingestion layer, causing overload and backlogged data that the SRE team was frequently paged to investigate. Further digging into our options shed little light on the problem, until a fellow SRE, who was leading the project, decided to build a new load-balancing layer to act as the infrastructure border. With a border layer of load balancers in place, we could clean out our ever-growing DNS records. It also allowed us to keep load balancing TCP connections with HAProxy, while UDP balancing would be handled by Linux Virtual Server (LVS). The LVS project is a high-performance software load balancer that has existed in the Linux kernel since 2001. Why we chose it as our UDP load-balancing solution is worthy of an entire blog post of its own!

Connecting the dots

We now had a border layer of load balancers that looked something like this:

[Diagram: Border layer of load balancers (LVS and HAProxy).]

This new load-balancing layer opened up another opportunity: revisiting our flawed balancing of TCP connections. As the engineer tasked with achieving TCP traffic convergence, I started by looking at how we could use the tools we already had to balance highly variable traffic rates over long-lived TCP connections. Usefully, HAProxy provides a method for dynamically weighting backend servers via its UDS (Unix Domain Socket) API, which allows new connections to be balanced to nodes based on their weighting. But what should determine the weights? The obvious answer was our extensive service monitoring, which we had easy access to through our own Graphite API.

I implemented a new Python service to run on each load balancer, handling the dynamic weighting of individual servers for a specific HAProxy backend based on the results of some measurement of load. In our case, the “measurement of load” was the overall TCP datapoint rate observed at each node in the ingestion layer. However, this could be anything: loadavg? network bandwidth? disk space? (Sure, why not!)

Leveraging our Graphite render API, we could weight the forwarding of new connections to each node based on the results of a render query something like:

integral(tcp-service.node-x.datapoints.received)

(cool!).
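
To make that concrete, here's a rough sketch of fetching such a measurement from a Graphite render endpoint in Python; the URL, target, and aggregation choice are illustrative, not our production code:

    import requests

    GRAPHITE_URL = "https://graphite.example.com/render"  # hypothetical endpoint
    TARGET = "integral(tcp-service.*.datapoints.received)"  # one series per node

    def fetch_node_loads(minutes=5):
        """Return {series target: latest datapoint} for each ingestion node."""
        resp = requests.get(GRAPHITE_URL, params={
            "target": TARGET,
            "from": "-%dmin" % minutes,
            "format": "json",
        })
        resp.raise_for_status()
        loads = {}
        for series in resp.json():
            # "Leading edge" aggregation: take the most recent non-null value.
            values = [v for v, _ts in series["datapoints"] if v is not None]
            if values:
                # In practice you'd parse the node name out of the target string.
                loads[series["target"]] = values[-1]
        return loads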

How it works

So, how does this all work? The operation loop of a single weighter process is as follows (a minimal sketch of the loop follows the list):

  • On initialisation, using the included UDS wrapper, haproxy_weighter fetches the weight currently set for each server in the configured HAProxy backend.
  • Every rebalance interval, the weighter renders the configured Graphite target to acquire a measurement of the load currently observed on each relevant server.
  • Weights are then calculated for each server by feeding the render results to one of two configurable weighting functions.
  • The calculated weights are then set in HAProxy, again using the UDS API.
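
Putting those steps together, a stripped-down sketch of the rebalance loop might look like this; the socket path, backend name, and interval are assumptions, and fetch_node_loads/limited_proportions are the sketch functions shown above and below:

    import socket
    import time

    HAPROXY_SOCK = "/var/run/haproxy.sock"  # needs 'stats socket ... level admin'
    BACKEND = "ingestion"                   # hypothetical backend name
    REBALANCE_INTERVAL = 60                 # seconds between weight updates

    def uds_command(cmd):
        """Send one command to HAProxy's stats socket and return its reply."""
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.connect(HAPROXY_SOCK)
            sock.sendall(cmd.encode() + b"\n")
            return sock.makefile().read()  # HAProxy closes after replying

    def rebalance_loop():
        while True:
            loads = fetch_node_loads()            # render sketch above
            weights = limited_proportions(loads)  # weighting sketch below
            for node, weight in weights.items():
                # 'set weight <backend>/<server> <int>' is HAProxy's runtime API.
                uds_command("set weight %s/%s %d" % (BACKEND, node, weight))
            time.sleep(REBALANCE_INTERVAL)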

[Screenshots: TCP traffic accepted at the ingestion layer, and connections balanced by HAProxy.]

Each weighting function takes a single datapoint per server, aggregated from the render results using either the leading edge or the average of the whole series, and returns a weight between 1 and 256. We currently have two options: normalised exponential distribution, and limited proportions, which is what we use for balancing TCP connections to our ingestion layer. Limited proportions determines each weight by calculating the inverse proportion of each server's datapoint, while limiting the minimum datapoint value to 1/LIMITING_FACTOR of the average; this prevents low-end outliers from being assigned the maximum weight. An AGGRESSIVENESS_FACTOR constant can be raised or lowered to adjust the spread between weightings.
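
Here's a rough sketch of the limited-proportions idea; the constant values are illustrative, and only the names mirror the description above:

    AGGRESSIVENESS_FACTOR = 1.0  # raise to spread weights further apart
    LIMITING_FACTOR = 4          # floor datapoints at average/LIMITING_FACTOR

    def limited_proportions(loads):
        """Map {node: load} to {node: weight in 1..256}, inversely to load."""
        avg = sum(loads.values()) / len(loads)
        if avg <= 0:
            return {node: 100 for node in loads}  # no signal: weight evenly
        floor = avg / LIMITING_FACTOR
        # Clamp low-end outliers so a nearly idle node can't be handed
        # the maximum weight proportion.
        clamped = {n: max(v, floor) for n, v in loads.items()}
        # Inverse proportion: the less loaded a node, the larger its share.
        inverses = {n: (avg / v) ** AGGRESSIVENESS_FACTOR
                    for n, v in clamped.items()}
        total = sum(inverses.values())
        return {n: max(1, min(256, round(256 * inv / total)))
                for n, inv in inverses.items()}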

[Screenshots: weights set at all load balancers for server-0028, and weights set by haproxy_weighter at one load balancer for each ingestion-layer server.]

Once the haproxy_weighter was deployed, we saw a significant improvement. Our DNS records are smaller and much easier to manage, and we no longer have outliers within our ingestion layer. Where before we would see servers receiving five times the average traffic because of a couple of particularly heavy connections, we now have nicely even per-server traffic rates: TCP traffic convergence was achieved. The system also reacts automatically and quickly to traffic spikes, routing new connections away from overloaded servers and saving our SREs from being paged in the depths of the night.

[Screenshot: TCP traffic convergence achieved.]

Conclusion

HAProxy load balancing is a critical part of your infrastructure, and scaling your proxies is a big challenge, one that's even harder when you're flying blind. Just like we did in this article, you can use Hosted Graphite to monitor your load balancers. Sign up for the Hosted Graphite free trial and start visualising your data on our Hosted Grafana dashboards.


Ciaran Gaffney

SRE at Hosted Graphite.
