Handling Hundreds of Thousands of Concurrent HTTP Connections on AWS

By Ivaylo Vrabchev, Cloud Services Consultant at HeleCloud™


In the era of Cloud computing, we need to be able to design highly available, fault-tolerant, cost-efficient, scalable systems. Some of these systems are heavily loaded with thousands or even millions of requests per second. Most Cloud Load Balancers are designed to handle such loads, but they scale up gradually, in line with traffic. So what would happen if all requests arrived at once, or within a time window of a few seconds? There would be a huge spike, which the standard Cloud Load Balancers would not be able to handle.

Let’s imagine that we are AWS Cloud Architects who have to provide a simple Web server solution to handle more than 100,000 concurrent HTTP connections.

Web Server Solution Design

Graphic 1: Web Server Solution Design

The components that comprise the architecture are:

  1. VPC
  2. DMZ/Private Subnets
  3. Elastic Load Balancer
  4. EC2 Instances
  5. CloudWatch

VPC and Subnets

In this architecture, we will be using a standard VPC configuration with two DMZ and two private subnets spread across two Availability Zones. The DMZ subnets will be used only for publicly facing services, e.g. Load Balancers, Bastion Hosts, etc. The private subnets will host all other services, e.g. front-end and back-end servers.

Load Balancer

We need to select which type of load balancer we are going to use. As described earlier, the standard Application Load Balancer (ALB) will not be able to handle such spikes, because it scales up gradually with traffic. ALBs can be pre-warmed, but this is a manual task that requires assistance from the AWS Support team.

As the leading public Cloud provider, AWS is always there to help. They identified this problem and released the Network Load Balancer (NLB). It is designed to handle tens of millions of requests per second while maintaining high throughput at ultra-low latency. The NLB operates at the connection layer (OSI Layer 4), routing connections to targets based on IP protocol data. That means we are responsible for properly configuring the rest of the stack, up to Layer 7, in order to provide a fully operational HTTP application.
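As a rough sketch, the NLB and its TCP target group could be provisioned with the AWS CLI. The names, subnet IDs, and VPC ID below are placeholders, and the ARNs in the last command would be replaced with the values returned by the first two:

```shell
# Hypothetical names and IDs -- substitute your own. A provisioning sketch,
# not a complete setup.
aws elbv2 create-load-balancer \
    --name web-nlb \
    --type network \
    --subnets subnet-0aaa1111 subnet-0bbb2222

# TCP target group: the NLB forwards raw Layer 4 connections to port 80
aws elbv2 create-target-group \
    --name web-tg \
    --protocol TCP --port 80 \
    --vpc-id vpc-0ccc3333

# The listener ties the two together (use the ARNs returned above)
aws elbv2 create-listener \
    --load-balancer-arn arn:aws:elasticloadbalancing:...:loadbalancer/net/web-nlb/... \
    --protocol TCP --port 80 \
    --default-actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:...:targetgroup/web-tg/...
```

Note that the listener and target group use TCP, not HTTP; HTTP itself is handled by the web servers behind the NLB, which is exactly the Layer 4-to-Layer 7 split described above.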

EC2 Instance

In our case, there are no specific requirements for the underlying operating system, so I highly recommend using Amazon Linux, as it is optimised for the AWS platform. Horizontal scaling is always better than vertical if your goal is a highly available and fault-tolerant system.

A large number of concurrent connections results in high CPU usage. That's why our preference here is a compute-optimised instance type such as the C5, the next generation of the Amazon EC2 Compute Optimized instance family.

Adding a few lines to /etc/sysctl.conf and /etc/security/limits.conf will prepare the OS to handle all these connections:


# /etc/sysctl.conf
net.core.somaxconn = 65536
net.ipv4.tcp_max_tw_buckets = 1440000
net.ipv4.ip_local_port_range = 1024 65000
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_max_syn_backlog = 3240000

# /etc/security/limits.conf (format: domain type item value)
* soft nofile 4096
* hard nofile 100000
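As a quick sanity check on the ip_local_port_range value above: each ephemeral port can hold at most one connection per source IP and destination pair, so the configured range bounds how many outbound connections a single address can sustain. A small shell sketch:

```shell
# The bounds from net.ipv4.ip_local_port_range in /etc/sysctl.conf above
LOW=1024
HIGH=65000

# Ephemeral ports available per source IP / destination pair
echo "ephemeral ports: $((HIGH - LOW + 1))"   # prints "ephemeral ports: 63977"
```

Roughly 64,000 ports per pair is ample here, especially combined with the short tcp_fin_timeout, which returns ports from TIME_WAIT sooner.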

NGINX is known for its high performance, stability, rich feature set, simple configuration, and low resource consumption. It will serve the communication from L4 up to L7 for us, so we need to fine-tune it with the following settings in nginx.conf:


worker_rlimit_nofile 100000;
worker_processes auto;
error_log /var/log/nginx/error.log;

events {
    worker_connections 25000;
    use epoll;
    multi_accept on;
}

# inside the http context:
access_log /var/log/nginx/access.log main buffer=1024k;
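With worker_processes auto, NGINX starts one worker per vCPU, so the per-instance connection ceiling is the worker count multiplied by worker_connections. Assuming the c5.xlarge suggested earlier (4 vCPUs), a back-of-the-envelope check:

```shell
VCPUS=4                  # c5.xlarge; "worker_processes auto" starts 4 workers
WORKER_CONNECTIONS=25000 # from the events block above

# Upper bound on concurrent connections a single instance can hold
echo "per-instance ceiling: $((VCPUS * WORKER_CONNECTIONS))"   # prints "per-instance ceiling: 100000"
```

Two such instances behind the NLB therefore give headroom of roughly 200,000 connections, comfortably above the 100,000 target.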


However, using an NLB instead of an ALB has some shortcomings. The NLB doesn't support access logging, so the full HTTP communication logs are stored on each NGINX instance. In order to keep all logs in one centralised place, we can use CloudWatch Logs and its agent to export all local files into a CloudWatch log group.


The installation and configuration are very simple:

sudo yum install -y awslogs


Once the agent is installed, we have to configure it to export the desired logs by adding the following lines to /etc/awslogs/awslogs.conf:

[/var/log/nginx/error.log]
datetime_format = %b %d %H:%M:%S
file = /var/log/nginx/error.log
buffer_duration = 5000
log_stream_name = {instance_id}
initial_position = start_of_file
log_group_name = /var/log/nginx/error

[/var/log/nginx/access.log]
datetime_format = %b %d %H:%M:%S
file = /var/log/nginx/access.log
buffer_duration = 5000
log_stream_name = {instance_id}
initial_position = start_of_file
log_group_name = /var/log/nginx/access

Since we want the logs delivered to CloudWatch in the same region as the instances, we need to point the agent at that region:

REGION=$(curl -s http://169.254.169.254/latest/dynamic/instance-identity/document | grep region | awk -F\" '{print $4}')
sudo sed -i -e "/region =/ s/= .*/= ${REGION}/" /etc/awslogs/awscli.conf
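To illustrate what the grep/awk pipeline extracts, here is the same parsing run against a hand-written sample of the instance-identity document (the real one is fetched from the metadata endpoint above, and the values here are made up):

```shell
# A hypothetical fragment of the instance-identity document
DOC='{
  "instanceId" : "i-0123456789abcdef0",
  "region" : "eu-west-1"
}'

# grep keeps only the "region" line; awk splits it on double quotes,
# so the fourth field is the value between the second pair of quotes
REGION=$(echo "$DOC" | grep region | awk -F\" '{print $4}')
echo "$REGION"   # prints eu-west-1
```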

Then we need to start the agent:

sudo service awslogs start

After a few seconds, the two new log groups will be created and we can find all the NGINX logs in them.


Now that you know how to fine-tune the configuration and which type of AWS ELB to select, you can bring up a test environment using the following configuration:

1 x Network Load Balancer (NLB)
1 x Target Group with 2 x EC2 instances (c5.xlarge)

When all components are up and running, we are ready to validate whether our configuration meets the initial requirements. That can be done using various open-source load testing tools such as Bees with Machine Guns, Locust, Apache JMeter, etc. There is an interesting article by BlazeMeter comparing different load testing tools.

I will use ApacheBench (ab) for the purpose of this testing because I already have an automated solution for it.

The following command will be executed from 6 nodes in order to perform the load testing.

ab -k -n 3000000 -c 20000 {NLB URL address}

That will run 3,000,000 GET requests, processing up to 20,000 requests concurrently from each individual server, against the Network Load Balancer URL.
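Putting the numbers together, across the six load-generating nodes the run above adds up as follows:

```shell
NODES=6
REQUESTS_PER_NODE=3000000   # ab -n
CONCURRENCY_PER_NODE=20000  # ab -c

echo "total requests:   $((NODES * REQUESTS_PER_NODE))"     # prints 18000000
echo "peak concurrency: $((NODES * CONCURRENCY_PER_NODE))"  # prints 120000
```

A peak of 120,000 simultaneous connections is comfortably above the 100,000 concurrent HTTP connections the solution was designed to handle.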
Once all the requests have been sent, the test is complete. Because of the nature of the tool we are using, we will have per-server results similar to those presented in the table below.

As you can see, there are no failed requests, which proves that our configuration can handle hundreds of thousands of concurrent requests per second.


The AWS Network Load Balancer allows you to design your system architecture at a low, performant networking level while helping you handle millions of requests per second. It is very useful when you have to handle unpredictable spikes in network traffic. AWS allows you to design the system in a few different ways, based on your requirements and the best practices provided by their team.

Please be aware that the Application Load Balancer now supports containers and Lambda functions in order to support serverless architectures, functionality announced at re:Invent 2018. A similar assessment can be done on that front as well. Follow us for further publications.

Please get in touch if we can help with AWS solution designs or any other aspect of the AWS platform.