gevent: the Good, the Bad, the Ugly

I’m not going to spend much time describing what gevent is. I think the one-sentence overview from its website does a better job than I could:

gevent is a coroutine-based Python networking library that uses greenlet to provide a high-level synchronous API on top of the libevent event loop.

What follows are my experiences using gevent for an internal project here at Mixpanel. I even whipped up some performance numbers specifically for this post!

The Good

The main draw of gevent is obviously performance, especially when compared with traditional threading solutions. At this point, it’s pretty much common knowledge that past a certain level of concurrency, doing I/O asynchronously vastly outperforms synchronous I/O in separate threads.

What gevent adds is a programming interface that looks very much like traditional threaded programming, but underneath does asynchronous I/O. Even better, it does all of this transparently. You can continue to use normal python modules like urllib2 to make HTTP requests and they’ll use gevent instead of the normal blocking socket operations. There are some caveats, but I’ll get back to those later.
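
To make that concrete, here’s a minimal sketch of what the transparent patching looks like in practice (Python 2-era code to match the post; the URLs are arbitrary):

    from gevent import monkey
    monkey.patch_all()  # swap the blocking socket machinery (and friends) for gevent-aware versions

    import gevent
    import urllib2  # plain stdlib module; its sockets are now cooperative

    def fetch(url):
        return urllib2.urlopen(url).read()

    urls = ['http://www.google.com/', 'http://www.python.org/']
    jobs = [gevent.spawn(fetch, url) for url in urls]
    gevent.joinall(jobs, timeout=10)
    print [len(job.value) for job in jobs if job.successful()]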

For now, here’s the kind of performance improvement you can expect:

[Graph: requests per second and error counts for gevent vs. paste (threaded) at increasing numbers of simultaneous connections]

Thoughts

  • Ignoring everything else, gevent outperforms a threaded solution (in this case paste) by a factor of 4.
  • In the threaded solution, the number of errors rises linearly with the number of concurrent connections (these were all connection timeouts; I probably could have increased the timeout interval, but from a user’s perspective an extremely long wait is just as bad as a failure). gevent showed no errors until the 10,000 simultaneous connection test, so its first failures appear somewhere north of 5,000 simultaneous connections.
  • The actual requests completed per second were remarkably stable in both cases, at least until gevent fell apart in the 10,000 simultaneous connections test. I actually found this somewhat surprising. I initially guessed that requests per second would degrade at least a little bit as concurrency went up.
  • The 10,000 simultaneous connections threaded test failed completely. I probably could have gotten this to work (it seemed like something more ulimit tweaking could have solved), but I was mostly doing the test for fun so I didn’t spend any time on it.
  • If this kind of thing interests you, we’re hiring. (Yeah, I just intermingled content and advertising.)

Methodology

Here’s the python code I used for both tests:
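
(The snippet itself isn’t included in this copy of the post. As a stand-in, here is a minimal, hypothetical gevent “hello world” WSGI server of the sort such a benchmark exercises; the handler and port handling below are illustrative, not the original code.)

    import sys
    from gevent import pywsgi

    def app(environ, start_response):
        # trivial handler so the benchmark measures the server, not the application
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return ['hello world\n']

    if __name__ == '__main__':
        port = int(sys.argv[1]) if len(sys.argv) > 1 else 10001
        # the threaded counterpart would serve the same app with something
        # like paste.httpserver.serve(app, host='0.0.0.0', port=port)
        pywsgi.WSGIServer(('0.0.0.0', port), app).serve_forever()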

For the client, I used Apache bench with the following options:

  • -c NUM: where NUM is the number of simultaneous connections. This matched the number used on the server command line in each test.
  • -n 100000: all tests were over 100,000 requests. In the graph above, errors are not a rate, but rather the actual number of failed requests out of 100,000.
  • -r: continue even if there is a failure.
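
Put together, a typical invocation looked something like ab -r -c 1000 -n 100000 http://127.0.0.1:10001/ (the concurrency level and port shown here are illustrative).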

All tests were done with both client and server running on the same low-end, 512MB Rackspace Cloud VPS. I initially thought I would need some way to limit the threaded solution to one CPU, but it turns out even though there are “four” cores on the VPS, you’re limited to 100% of one core. Not impressed.

Linux tweaks for load testing

I ran into a whole host of issues getting Linux working past ~500 connections per second. Almost all of these are related to all the connections being between the same two IP addresses (127.0.0.1 <-> 127.0.0.1). In other words, you probably wouldn’t see any of these problems in production (except maybe if you’re running behind a single proxy), but you almost certainly will in a test environment.

  • Increase the client port range

    echo -e '1024\t65535' | sudo tee /proc/sys/net/ipv4/ip_local_port_range

    This increases the number of available ports to use for client connections. You’ll run out of ports very quickly without this (they get stuck in TIME_WAIT).

  • Enable TIME_WAIT recycling

    echo 1 | sudo tee /proc/sys/net/ipv4/tcp_tw_recycle

    This also helps with connections stuck in TIME_WAIT, and past a certain number of connections per second it’s basically required, at least if the IP address pair stays the same. There’s another option, tcp_tw_reuse, available as well, but I didn’t need to use it.

  • Disable syncookies

    echo 0 | sudo tee /proc/sys/net/ipv4/tcp_syncookies

    If you see “possible SYN flooding on port 10001. Sending cookies.” in dmesg, you probably need to disable tcp_syncookies. Don’t do this on a production server, but for testing it doesn’t matter, and leaving syncookies enabled can cause connection resets.

  • Disable iptables if you’re using connection tracking

    You’ll quickly fill up the netfilter connection table. Alternatively, you can try increasing /proc/sys/net/netfilter/nf_conntrack_max, but I think it’s easier just to disable the firewall while testing.
  • Raise open file descriptor limits

    At least on Ubuntu, the open files limit for normal users defaults to 4096. So, if you want to test with more than ~4,000 simultaneous connections you need to bump this up. The easiest way is to add a line to /etc/security/limits.conf like “* hard nofile 16384” and then run ulimit -n 16384 before running your tests. (You can also raise the limit from inside the test process itself; see the sketch just after this list.)
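
For what it’s worth, the soft limit can also be raised from inside the test process itself with Python’s resource module. A small sketch (it still relies on the higher hard limit from limits.conf being in place):

    import resource

    # raise the per-process soft limit for open files up to the hard limit
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    print 'open file limit now:', resource.getrlimit(resource.RLIMIT_NOFILE)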

The Bad

It can’t be all good, right? Right. Actually, most of the problems I had with gevent could be solved with better, more thorough documentation, which leads me to:

Documentation

Simply put: it’s not good. I probably read more gevent source code than I did gevent documentation (and it was more useful!). The best documentation is actually in the examples directory in the source tree. If you have a question, look there first — seriously. I also spent more time googling through mailing list archives than I’d like to admit.

Incompatibilities

I’m specifically talking about eventlet here. In retrospect, the incompatibility makes sense, but it can lead to some baffling failures. We had some MongoDB client code that used eventlet; it simply didn’t work from the gevent-based server process I was working on.

Order matters. Ugh.

Daemonize before you import gevent, or at least before you call monkey.patch_all(). I didn’t look into this deeply, but what I gathered from a mailing list post or two is that gevent modifies the socket machinery deep in Python’s internals. When you daemonize, all open file descriptors are closed, so in the child process the socket gets recreated in its unmodified form, which of course doesn’t work right with gevent. gevent should handle this kind of thing, or at least provide a daemonize function that is compatible.
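
Here’s a minimal sketch of the ordering that works, assuming a classic double-fork daemonizer (the daemonization details are generic and mine, not something gevent provides):

    import os
    import sys

    def daemonize():
        # classic double fork: detach from the controlling terminal
        if os.fork() > 0:
            sys.exit(0)
        os.setsid()
        if os.fork() > 0:
            sys.exit(0)
        # (redirecting stdin/stdout/stderr is omitted for brevity)

    daemonize()

    # only now touch gevent, so the patched socket machinery is set up
    # inside the process that will actually serve requests
    from gevent import monkey
    monkey.patch_all()

    import gevent
    # ... build the server and call serve_forever() here ...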

Monkey patching. Sometimes?

So, most operations are patched by executing monkey.patch_all(). I’m not a huge fan of doing this sort of thing, but it is nice that normal python modules continue to function. Bizarrely, though, not everything is patched. I spent a while trying to figure out why signals weren’t working until I found gevent.signal. If you’re going to patch some functions, why not patch them all?
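
For reference, a small sketch of the gevent-specific hook (gevent.signal was the spelling at the time; newer releases expose the same call as gevent.signal_handler):

    import signal
    import sys

    import gevent

    def shutdown():
        print 'caught SIGTERM, shutting down'
        sys.exit(0)

    gevent.signal(signal.SIGTERM, shutdown)  # note: not the stdlib signal.signal
    gevent.sleep(60)  # stand-in for the server's main loop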

The same applies to gevent.queue versus the standard python Queue module. Overall, it needs to be clearer (as in a simple list) when you need to use gevent-specific APIs versus standard modules/classes/functions.
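
And a tiny sketch of the gevent-flavored queue, which blocks only the waiting greenlet rather than the whole process:

    import gevent
    from gevent.queue import Queue, Empty

    tasks = Queue()

    def producer():
        for i in range(5):
            tasks.put(i)
            gevent.sleep(0)  # yield so the consumer gets a turn

    def consumer():
        while True:
            try:
                item = tasks.get(timeout=1)  # blocks this greenlet only
            except Empty:
                return
            print 'consumed', item

    gevent.joinall([gevent.spawn(producer), gevent.spawn(consumer)])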

The Ugly

gevent has no built-in support for multiprocessing. This is much more a deployment issue than anything else, but it does mean that to fully utilize multiple cores, you’re going to need to run multiple daemon processes on multiple ports. Then, most likely, you’re going to need to run something like nginx (at least if you’re serving HTTP requests) to distribute requests among the server processes.
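
A rough sketch of that layout (an illustration, not our actual deployment): a launcher starts one single-process gevent server per core, each on its own port, and nginx upstreams point at those ports. server.py here is a hypothetical single-process server that takes its port as argv[1], like the one sketched in the Methodology section.

    import subprocess

    BASE_PORT = 10001  # illustrative; one port per worker process
    WORKERS = 4        # roughly one per core

    # start N copies of a single-process gevent server, each bound to its own port
    procs = [subprocess.Popen(['python', 'server.py', str(BASE_PORT + i)])
             for i in range(WORKERS)]

    for p in procs:
        p.wait()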

Really, the lack of multiprocessing capability just means another abstraction layer on your server that you might have added anyway for availability.

It’s a bigger issue when using gevent for client load testing. I ended up implementing a multiprocess load client that used shared memory to aggregate and print statistics. It was a lot more work than it should have been. (If anyone’s doing something similar, ping me and I can send the shell of the client program.)

Last Words

If you’ve gotten this far, you noticed that I spent two full sections on negative aspects of gevent. Don’t let that fool you though. I’m convinced that gevent is a great solution for high performance python networking. There are problems, but mostly they’re problems with documentation, which will only improve with time.

We’re using gevent internally. In fact, our server is so efficient that we’ll easily run out of bandwidth resources before computing resources (both processors and memory) for the VPS size we’re using.

If you're interested in what we work on, please apply - we're hiring: http://mixpanel.com/jobs/

Comments on “gevent: the Good, the Bad, the Ugly”

  1. cactus

    re: iptables connection tracking.
    You can disable connection tracking over the loopback interface. Not sure why this isn’t the default to be honest, but I suppose it is a distribution shipping issue.
    http://awesometrousers.posterous.com/conntrack-loopback-blues

    re: multiprocessing.
    Give gunicorn a look. It seems to provide multiprocessing (N instances behind a single bind with a fork/accept mechanism) on top of gevent, with some handy daemonize functionality too.
    http://gunicorn.org/

  2. Hasan Alayli

    greenlet is slow. The way it swaps out functions and yields is done inefficiently and could be done with less work.

    But have you considered using tornado as an alternative to coroutines?

    Also don’t forget to reduce your recv/send buffer for testing because you’ll run out of buffer space/tcb later and this might go as a silent error that would show up as a performance degradation in httperf or whatever tool you are using.

  3. Barry Allard

    gunicorn is awesome for fast clients, but suffers from unicorn’s Achilles’ heel. Choices: greenlets, an upstream proxy or potentially port Rainbows! to gunicorn.

  4. Denis

    With the release of gevent 1.0alpha a number of long-standing issues were fixed, including at least a couple from your list:

    – regular signal.signal() function now works with gevent, even without monkey patching.

    – dns resolver no longer breaks after fork(). In fact, we use c-ares instead of libevent-dns, which is much better.

  5. Niemi

    Maybe it’s a mistype
    [orig]
    Increase the client port rangeecho -e ‘1024t65535’
    [hmm]
    Increase the client port range $echo -e ‘1024\t65535’
