Performance Tuning HAProxy

Development

Reading Time: 11 minutes

In a recent article, I covered how to tune the NGINX webserver for a simple static HTML page. In this article, we are going to once again explore those performance-tuning concepts and walk through some basic tuning options for HAProxy.

What is HAProxy

HAProxy is a software load balancer commonly used to distribute TCP-based traffic to multiple backend systems. It provides not only load balancing but also has the ability to detect unresponsive backend systems and reroute incoming traffic.

In a traditional IT infrastructure, load balancing is often performed by expensive hardware devices. In cloud and highly distributed infrastructure environments, there is a need to provide this same type of service while maintaining the elastic nature of cloud infrastructure. This is the type of environment where HAProxy shines, and it does so while maintaining a reputation for being extremely efficient out of the box.

Much like NGINX, HAProxy has quite a few parameters set for optimal performance out of the box. However, as with most things, we can still tune it for our specific environment to increase performance.

In this article, we are going to install and configure HAProxy to act as a load balancer for two NGINX instances serving a basic static HTML site. Once set up, we are going to take that configuration and tune it to gain even more performance out of HAProxy.

Installing HAProxy

For our purposes, we will be installing HAProxy on an Ubuntu system. The installation of HAProxy is fairly simple on an Ubuntu system. To accomplish this, we will use the Apt package manager; specifically we will be using the apt-get command.

# apt-get install haproxy
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  liblua5.3-0
Suggested packages:
  vim-haproxy haproxy-doc
The following NEW packages will be installed:
  haproxy liblua5.3-0
0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
Need to get 872 kB of archives.
After this operation, 1,997 kB of additional disk space will be used.
Do you want to continue? [Y/n] y

With the above complete, we now have HAProxy installed. The next step is to configure it to load balance across our backend NGINX instances.

Basic HAProxy Config

In order to set up HAProxy to load balance HTTP traffic across two backend systems, we will first need to modify HAProxy’s default configuration file /etc/haproxy/haproxy.cfg.

To get started, we will be setting up a basic frontend service within HAProxy. We will do this by appending the below configuration block.

frontend www
    bind               :80
    mode               http
    default_backend    bencane.com

Before going too far, let’s break down this configuration a bit to understand what exactly we are telling HAProxy to do.

In this section, we are defining a frontend service for HAProxy. This is essentially a frontend listener that will accept incoming traffic. The first parameter we define within this section is the bind parameter. This parameter is used to tell HAProxy what IP and Port to listen on; 0.0.0.0:80 in this case. This means our HAProxy instance will listen for traffic on port 80 and route it through this frontend service named www.

Within this section, we are also defining the type of traffic with the mode parameter. This parameter accepts tcp or http options. Since we will be load balancing HTTP traffic, we will use the http value. The last parameter we are defining is default_backend, which is used to define the backend service HAProxy should load balance to. In this case, we will use a value of bencane.com which will route traffic through our NGINX instances.

backend bencane.com
    mode     http
    balance  roundrobin
    server   nyc2 nyc2.bencane.com:80 check
    server   sfo1 sfo1.bencane.com:80 check

Like the frontend service, we will also need to define our backend service by appending the above configuration block to the same /etc/haproxy/haproxy.cfg file.

In this backend configuration block, we are defining the systems that HAProxy will load balance traffic to. Like the frontend section, this section also contains a mode parameter to define whether these are tcp or http backends. For this example, we will once again use http as our backend systems are a set of NGINX webservers.

In addition to the mode parameter, this section also has a parameter called balance. The balance parameter is used to define the load-balancing algorithm that determines which backend node each request should be sent to. For this initial step, we can simply set this value to roundrobin, which is used to send traffic evenly as it comes in. This setting is pretty common and often the first load balancer that users start with.

The final parameter in the backend service is server, which is used to define the backend system to balance to. In our example, there are two lines that each define a different server. These two servers are the NGINX webservers that we will load balancing traffic to in this example.

The format of the server line is a bit different than the other parameters. This is because node-specific settings can be configured via the server parameter. In the example above, we are defining a label, IP:Port, and whether or not a health check should be used to monitor the backend node.

By specifying check after the web-server’s address, we are defining that HAProxy should perform a health check to determine whether the backend system is responsive or not. If the backend system is not responsive, incoming traffic will not be routed to that backend system.

With the changes above, we now have a basic HAProxy instance configured to load balance an HTTP service. In order for these configurations to take effect however, we will need to restart the HAProxy instance. We can do that with the systemctl command.

# systemctl restart haproxy

Now that our configuration changes are in place, let’s go ahead and get started with establishing our baseline performance of HAProxy.

Baselining Our Performance

In the “Tuning NGINX for Performance” article, I discussed the importance of establishing a performance baseline before making any changes. By establishing a baseline performance before making any changes, we can identify whether or not the changes we make have a beneficial effect.

As in the previous article, we will be using the ApacheBench tool to measure the performance of our HAProxy instance. In this example however, we will be using the flag -c to change the number of concurrent HTTP sessions and the flag -n to specify the number of HTTP requests to make.

# ab -c 2500 -n 5000 -s 90 http://104.131.125.168/
Requests per second:    97.47 [#/sec] (mean)
Time per request:       25649.424 [ms] (mean)
Time per request:       10.260 [ms] (mean, across all concurrent requests)

After running the ab (ApacheBench) tool, we can see that out of the box our HAProxy instance is servicing 97.47 HTTP requests per second. This metric will be our baseline measurement; we will be measuring any changes against this metric.

Setting the Maximum Number of Connections

One of the most common tunable parameters for HAProxy is the maxconn setting. This parameter defines the maximum number of connections the entire HAProxy instance will accept.

When calling the ab command above, I used the -c flag to tell ab to open 2500 concurrent HTTP sessions. By default, the maxconn parameter is set to 2000. This means that a default instance of HAProxy will start queuing HTTP sessions once it hits 2000 concurrent sessions. Since our test is launching 2500 sessions, this means that at any given time at least 500 HTTP sessions are being queued while 2000 are being serviced immediately. This certainly should have an effect on our throughput for HAProxy.

Let’s go ahead and raise this limit by once again editing the /etc/haproxy/haproxy.cfg file.

global
        maxconn         5000

Within the haproxy.cfg file, there is a global section; this section is used to modify “global” parameters for the entire HAProxy instance. By adding the maxconn setting above, we are increasing the maximum number of connections for the entire HAProxy instance to 5000, which should be plenty for our testing. In order for this change to take effect, we must once again restart the HAProxy instance using the systemctl command.

 # systemctl restart haproxy 

With HAProxy restarted, let’s run our test again.

# ab -c 2500 -n 5000 -s 90 http://104.131.125.168/
Requests per second:    749.22 [#/sec] (mean)
Time per request:       3336.786 [ms] (mean)
Time per request:       1.335 [ms] (mean, across all concurrent requests)

In our baseline test, the Requests per second value was 97.47. After adjusting the maxconn parameter, the same test returned a Requests per second of 749.22. This is a huge improvement over our baseline test and just goes to show how important of a parameter the maxconn setting is.

When tuning HAProxy, it is very important to understand your target number of concurrent sessions per instance. By identifying and tuning this value upfront, you can save yourself a lot of troubleshooting with HAProxy performance during peak traffic load.

In this article, we set the maxconn value to 5000; however this is still a fairly low number for a high-traffic environment. As such, I would highly recommend identifying your desired number of concurrent sessions and tuning the maxconn parameter before changing any other parameter when tuning HAProxy.

Multiprocessing and CPU Pinning

Another interesting tunable for HAProxy is the nbproc parameter. By default, HAProxy has a single worker process, which means that all of our HTTP sessions will be load balanced by a single process. With the nbproc parameter, it is possible to create multiple worker processes to help distribute the workload internally.

While additional worker processes might sound good at first, they only tend to provide value when the server itself has more than 1 CPU. It is not uncommon for environments that create multiple worker processes on single CPU systems to see that HAProxy performs worse than it did as a single process instance. The reason for this is because the overhead of managing multiple worker processes provides a diminishing return when the number of workers exceeds the number of CPUs available.

With this in mind, it is recommended that the nbproc parameter should be set to match the number of CPUs available to the system. In order to tune this parameter for our environment, we first need to check how many CPUs are available. We can do this by executing the lshw command.

# lshw -short -class cpu
H/W path      Device  Class      Description
============================================
/0/401                processor  Intel(R) Xeon(R) CPU E5-2630L v2 @ 2.40GHz
/0/402                processor  Intel(R) Xeon(R) CPU E5-2630L v2 @ 2.40GHz

From the output above, it appears that we have 2 available CPUs on our HAProxy server. Let’s go ahead and set the nbproc parameter to 2, which will tell HAProxy to start a second worker process on restart. We can do this by once again editing the global section of the /etc/haproxy/haproxy.cfg file.

global
        maxconn         5000
        nbproc          2
        cpu-map         1 0
        cpu-map         2 1

In the above HAProxy config example, I included another parameter named cpu-map. This parameter is used to pin a specific worker process to the specified CPU using CPU affinity. This allows the processes to better distribute the workload across multiple CPUs.

While this might not sound very critical at first, it is when you consider how Linux determines which CPU a process should use when it requires CPU time.

Understanding CPU Affinity

The Linux kernel internally has a concept called CPU affinity, which is where a process is pinned to a specific CPU for its CPU time. If we use our system above as an example, we have two CPUs (0 and 1), a single threaded HAProxy instance. Without any changes, our single worker process will be pinned to either 0 or 1.

If we were to enable a second worker process without specifying which CPU that process should have an affinity to, that process would default to the same CPU that the first worker was bound to.

The reason for this is due to how Linux handles CPU affinity of child processes. Unless told otherwise, a child process is always bound to the same CPU as the parent process in Linux. The reason for this is to allow processes to leverage the L1 and L2 caches available on the physical CPU. In most cases, this makes an application perform faster.

The downside to this can be seen in our example. If we enable two workers and both worker1 and worker2 were bound to CPU 0, the workers would constantly be competing for the same CPU time. By pinning the worker processes on different CPUs, we are able to better utilize all of our CPU time available to our system and reduce the amount of times our worker processes are waiting for CPU time.

In the configuration above, we are using cpu-map to define CPU affinity by pinning worker1 to CPU 0 and worker2 to CPU 1.

After making these changes, we can restart the HAProxy instance again and retest with the ab tool to see some significant improvements in performance.

# systemctl restart haproxy

With HAProxy restarted, let’s go ahead and rerun our test with the ab command.

# ab -c 2500 -n 5000 -s 90 http://104.131.125.168/
Requests per second:    1185.97 [#/sec] (mean)
Time per request:       2302.093 [ms] (mean)
Time per request:       0.921 [ms] (mean, across all concurrent requests)

In our previous test run, we were able to get a Requests per second of 749.22. With this latest run, after increasing the number of worker processes, we were able to push the Requests per second to 1185.97, a sizable improvement.

Adjusting the Load Balancing Algorithm

The final adjustment we will make is not a traditional tuning parameter, but it still has an importance in the amount of HTTP sessions our HAProxy instance can process. The adjustment is the load balancing algorithm we have specified.

Earlier in this post, we specified the load balancing algorithm of roundrobin in our backend service. In this next step, we will be changing the balance parameter to static-rr by once again editing the /etc/haproxy/haproxy.cfg file.

backend bencane.com
    mode    http
    balance static-rr
    server  nyc2 nyc2.bencane.com:80 check
    server  sfo1 sfo1.bencane.com:80 check

The static-rr algorithm is a round robin algorithm very similar to the roundrobin algorithm, with the exception that it does not support dynamic weighting. This weighting mechanism allows HAProxy to select a preferred backend over others. Since static-rr doesn’t worry about dynamic weighting, it is slightly more efficient than the roundrobin algorithm (approximately 1 percent more efficient).

Let’s go ahead and test the impact of this change by restarting the HAProxy instance again and executing another ab test run.

# systemctl restart haproxy

With the service restarted, let’s go ahead and rerun our test.

# ab -c 2500 -n 5000 -s 90 http://104.131.125.168/
Requests per second:    1460.29 [#/sec] (mean)
Time per request:       1711.993 [ms] (mean)
Time per request:       0.685 [ms] (mean, across all concurrent requests)

In this final test, we were able to increase our Requests per second metric to 1460.29, a sizable difference over the 1185.97 results from the previous run.

Summary

In the beginning of this article, our basic HAProxy instance was only able to service 97 HTTP requests per second. After increasing a maximum number of connections parameter, increasing the number of worker processes, and changing our load balancing algorithm, we were able to push our HAProxy instance to 1460 HTTP requests per second; an improvement of 1405 percent.

Even with such an increase in performance, there are still more tuning parameters available within HAProxy. While this article covered a few basic and unconventional parameters, we have still only scratched the surface of tuning HAProxy. For more tuning options, you can checkout HAProxy’s configuration guide.

Subscribe via Email

Over 60,000 people from companies like Netflix, Apple, Spotify and O'Reilly are reading our articles.
Subscribe to receive a weekly newsletter with articles around Continuous Integration, Docker, and software development best practices.



We promise that we won't spam you. You can unsubscribe any time.

Join the Discussion

Leave us some comments on what you think about this topic or if you like to add something.