... wherein I bloviate discursively

Brian Clapper,

Scala and Python: An informal TCP performance benchmark



I’ve been using Python in a large-scale, high-throughput, high-availability network application. Scalability is an issue: We ultimately have to be able to process a large number of requests a second. The JVM seems easier to scale than CPython, at least for what we’re doing; it has real threads, for instance, instead of the crippled threading in CPython, where the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time. But our code base is already large, it’s almost entirely Python, and it makes heavy use of the Twisted Python libraries.

This article describes a small series of benchmarks I ran, to test how many requests per second I could process using deliberately naive servers written in Python and Scala. I chose Scala, rather than Java, because:

  • Like Java, Scala compiles directly to JVM byte code.
  • The Scala language is, in my opinion, a much better language than Java. (Having written Java exclusively for nearly nine years, I have some experience on which to base that assessment.)
  • It’s easier and faster to write concurrent programs using the Scala Actor library than it is to use the java.util.concurrent library.

These test programs are deliberately simple-minded. They accept incoming socket connections and send back canned HTTP results.
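As a rough illustration of what "accept a connection and send back a canned HTTP result" can look like, here is a minimal Python 3 sketch. It is not one of the original test programs: the response body, the names `CannedHandler`, `ThreadedServer`, and `make_server`, and the choice of one thread per connection are all assumptions made for the example. (The tests in this article used Python 2.5, where the module is named `SocketServer`; in Python 3 it is `socketserver`.)

```python
import socketserver

# Hypothetical payload; the original programs' exact response is not shown.
BODY = b"<html><body>Hello</body></html>\n"
CANNED_RESPONSE = b"".join([
    b"HTTP/1.0 200 OK\r\n",
    b"Content-Type: text/html\r\n",
    b"Content-Length: " + str(len(BODY)).encode("ascii") + b"\r\n",
    b"Connection: close\r\n",
    b"\r\n",
    BODY,
])


class CannedHandler(socketserver.BaseRequestHandler):
    """Ignore the request contents and always send the same response."""

    def handle(self):
        # Read whatever request bytes arrive first; they are not parsed.
        self.request.recv(4096)
        self.request.sendall(CANNED_RESPONSE)


class ThreadedServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    # Assumption: one thread per connection, chosen only for simplicity.
    allow_reuse_address = True
    daemon_threads = True


def make_server(host="localhost", port=9999):
    """Build (but do not start) a server bound to host:port."""
    return ThreadedServer((host, port), CannedHandler)
```

Calling `make_server().serve_forever()` would start it on port 9999, the port used in the `ab` command lines in this article.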

The Programs

I wrote six programs:

  • A Scala server that dispatches each incoming connection to a pool of actors.
  • A simple Python TCP server, written using the popular Twisted Python framework’s twisted.internet API.
  • A Twisted-based Python web server written using Twisted’s twisted.web framework, which provides a complete web server framework.
  • A Python TCP server that uses a Python Actor library called dramatis. The dramatis framework is still alpha-quality, so its inclusion here is dubious. However, I included it primarily because it allowed for an easy translation from the Scala Actor-based server to a Python Actor-based server.
  • A Python server that uses the standard Python SocketServer API.
  • A Python server that uses fapws3, a web server framework based on libev.

The Twisted servers both use a special ReactorFinder module that attempts to find the best Twisted reactor module for the platform. For Linux, where I ran my tests, that’s the epoll reactor.

The programs are deliberately unoptimized. I’m sure I could get better performance by profiling and optimizing each one, but my goal was to see which server worked better without tuning. The pre-tuning benchmarks for these servers provide an interesting basis for comparing the platforms.

Of course, I’m sure plenty of people will find fault with this comparison. If you’re one of them, then feel free to comment on this article. (Try to keep it civil.) Or, better yet, do your own comparisons and publish your results.

You can download the programs from the following links:

The Test Environment

I ran these tests on a Dell Vostro 1700 laptop with two 2.2 GHz Intel CPUs and 3 GB of RAM. The laptop runs Ubuntu 8.10 (Intrepid Ibex). I used Python 2.5.2, Scala 2.7.3, and the Sun Java 1.6.0_10-b33 runtime.

For each server, I used the ApacheBench HTTP benchmarking tool, running it as follows:

ab -c 1000 -n 100000 http://localhost:9999/

That is, it issued 100,000 HTTP “GET” requests, 1,000 requests at a time, using the loopback device. I did not specify -k (KeepAlive), because most of our production servers are using HTTP between servers, and the servers making the requests aren’t using KeepAlive. (Besides, most of my test servers don’t honor KeepAlive.)

Before running ab for each test, I ran a few thousand requests through the server. For the Scala server, this “priming of the pump” allows the HotSpot compiler to profile and optimize the running server. For the Python servers, it likely does nothing at all.
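The article does not say which tool issued the warm-up requests. One way to do it is sketched below in Python 3; the function name `warm_up`, its defaults, and the use of `http.client` are assumptions for illustration only. It opens a fresh connection for every request, which matches the no-KeepAlive setup described above.

```python
import http.client


def warm_up(host="localhost", port=9999, count=3000, timeout=5.0):
    """Send `count` sequential GET requests; return how many succeeded."""
    succeeded = 0
    for _ in range(count):
        # A new connection per request, i.e. no KeepAlive.
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
        try:
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body so the request completes
            succeeded += 1
        except (OSError, http.client.HTTPException):
            # Connection refused, timeout, malformed reply, and so on.
            pass
        finally:
            conn.close()
    return succeeded
```

Running `warm_up()` once against a server on port 9999 before starting `ab` would give the JVM a few thousand requests to profile.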

The Results

Here are some of the data from the ab runs, ranked from most requests/second to least.

The percentage columns give the time, in milliseconds, within which that percentage of requests was served.

| Server | Mean requests per second | Time per request (ms) | 50% | 75% | 90% | 100% (longest) |
|---|---|---|---|---|---|---|
| Scala server | 6,220 | 160.8 | 81 | 119 | 152 | 9,087 |
| fapws3-based server | 5,733 | 174.4 | 20 | 20 | 22 | 16,644 |
| Python `SocketServer`-based server | 4,761 | 202.2 | 1 | 1 | 2 | 15,819 |
| Twisted Python TCP server | 3,173 | 315.1 | 39 | 51 | 53 | 22,673 |
| `twisted.web` server | 1,727 | 578.9 | 83 | 93 | 94 | 45,111 |
| Python actor-based server | 1,290 | 776.3 | 543 | 1700 | 841 | 93,648 |

From these results, the JVM seems to be the clear winner. fapws3 is the next-fastest server, which is no surprise, since the largest part of the fapws3 package is written in C and uses epoll and libev. The Twisted-based servers are surprisingly slow in comparison, which is a shame, since that’s what we’re currently using. However, moving to fapws3 should not be too difficult.

Ultimately, I’d like to be using Scala: It’s fast, it’s type-safe, and it runs on the JVM (where HotSpot kicks butt). But given an already large Python code base, fapws3 is looking promising.


More Results

50 Concurrent Requests

On the theory that 1,000 concurrent requests might be introducing contention, I ran the same tests with 50 concurrent requests. That is, I used the following ab command line:

ab -c 50 -n 100000 http://localhost:9999/

The results were interesting:

The percentage columns give the time, in milliseconds, within which that percentage of requests was served.

| Server | Mean requests per second | Time per request (ms) | 50% | 75% | 90% | 100% (longest) |
|---|---|---|---|---|---|---|
| Scala server | 7,258 | 6.9 | 6 | 7 | 8 | 172 |
| fapws3-based server | 6,132 | 8.2 | 8 | 8 | 12 | 39 |
| Python `SocketServer`-based server | 4,762 | 10.5 | 1 | 1 | 2 | 20,997 |
| Twisted Python TCP server | 3,230 | 15.5 | 16 | 51 | 16 | 61 |
| `twisted.web` server | 1,732 | 28.9 | 30 | 30 | 30 | 77 |
| Python actor-based server | 816 | 61.2 | 56 | 84 | 93 | 3,310 |

For the Scala and fapws3 servers, reducing the number of concurrent connections to 50 increased the total requests/second served. For the Python Actor-based server, it reduced the total. The change had a negligible effect on the other servers.
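When comparing the two runs, it helps to know how the “Time per request” column relates to the other figures. ab reports the mean time per request as the concurrency level multiplied by the total time taken, divided by the number of completed requests, which works out to the concurrency level divided by the mean requests per second. The short Python sketch below checks that against the Scala rows; the function name `time_per_request_ms` is made up for this example.

```python
def time_per_request_ms(concurrency, requests_per_second):
    """Mean time per request in ms, as ab derives it from -c and the mean rate."""
    return concurrency / requests_per_second * 1000.0


# Scala server at 1,000 concurrent requests (6,220 requests/second)
print(round(time_per_request_ms(1000, 6220), 1))  # 160.8

# Scala server at 50 concurrent requests (7,258 requests/second)
print(round(time_per_request_ms(50, 7258), 1))    # 6.9
```

This is why the time per request falls so sharply at 50 concurrent requests even though throughput changes only modestly: each request spends far less time waiting behind the others.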