If you’re benchmarking a web server using tools like siege, you may encounter an issue that could skew your results: the connection pool hanging. To illustrate the point, let’s look at a couple of benchmarks using siege.
The first makes a total of 16,300 requests.
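A sketch of such a run, assuming a local test server; the URL is a placeholder and the -c (concurrent users) and -r (repetitions) values are chosen only so that they multiply out to 16,300:

```
$ siege -b -c 100 -r 163 http://127.0.0.1:8080/
```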
The second makes a total of 16,384 requests.
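Again with illustrative flag values, this time 128 users times 128 reps for 16,384 hits:

```
$ siege -b -c 128 -r 128 http://127.0.0.1:8080/
```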
The second reports the server to be about 50% slower than the first!
If the processes are monitored during the second test, both siege and the server process spin up to full capacity for an extended duration, as one would expect, but at some point they become totally idle for about 15 seconds before kicking back into action to finish the test.
Ephemeral Port Range
To understand what is happening, we have to look at how TCP connections are handled by the operating system. Whenever a connection is made between a client and a server, the client side of the connection is bound to an ephemeral port, one of a set of ports reserved at the high end of the valid port range. Here is how to reveal the ephemeral port range on your system.
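On OS X, for example, the range is exposed as a pair of sysctl values (the output below assumes the stock defaults):

```
$ sysctl net.inet.ip.portrange.first net.inet.ip.portrange.last
net.inet.ip.portrange.first: 49152
net.inet.ip.portrange.last: 65535
```

On Linux, the equivalent is sysctl net.ipv4.ip_local_port_range.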
The total number of ephemeral ports available on OS X is 16,383 (on Linux it is usually 28,232, and it is possible to increase the ephemeral port range on OS X). You might think this should be more than enough to run our benchmarks, since we have only 125 simultaneous connections occurring at any given time. However, when one of these ports is closed it does not immediately become available for a new request.
TCP Connection States
During the lifetime of a connection, each port goes through a series of states: from SYN_SENT when establishing the connection, to ESTABLISHED when communication is actively happening, through a series of closing states, eventually culminating in TIME_WAIT after the port has been closed. In TIME_WAIT the port is held in limbo to ensure any remaining packets are not erroneously delivered to a fresh connection. (Check the current state of ports in use by running netstat -p tcp; a full overview of the states is in the man netstat text.)
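For example, while a benchmark is running, filtering for the TIME_WAIT state might show something like this (the addresses and ports here are illustrative):

```
$ netstat -p tcp | grep TIME_WAIT
tcp4       0      0  10.0.1.2.52313         10.0.1.2.8080          TIME_WAIT
tcp4       0      0  10.0.1.2.52314         10.0.1.2.8080          TIME_WAIT
```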
The duration of the TIME_WAIT state is the Maximum Segment Lifetime, defined in net.inet.tcp.msl. We can check its value with sysctl.
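The value is reported in milliseconds; assuming the stock OS X default:

```
$ sysctl net.inet.tcp.msl
net.inet.tcp.msl: 15000
```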
15 seconds. Bingo! There’s the slowdown that has been skewing our results.
Note that this limitation does not affect real-world requests to a live server, because each TCP connection is identified by the tuple of source IP, source port, destination IP and destination port, so the ephemeral port limit only applies between a single client / server pair.
It’s possible to reconfigure your kernel to allow a lot more requests; see Richard Jones’s A Million User Comet Application with Mochiweb Part I.
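As a quick local workaround on OS X, the same sysctl values can also be tuned at runtime, for example widening the ephemeral range and shortening the MSL (the numbers below are illustrative, and the settings reset on reboot):

```
$ sudo sysctl -w net.inet.ip.portrange.first=32768
$ sudo sysctl -w net.inet.tcp.msl=1000
```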