Mixpanel Engineering

Real-time scaling

gevent: the Good, the Bad, the Ugly


If you're interested in what we work on, please apply - we're hiring: http://mixpanel.com/jobs/

I’m not going to spend much time describing what gevent is. I think the one sentence overview from its web site does a better job than I could:

gevent is a coroutine-based Python networking library that uses greenlet to provide a high-level synchronous API on top of libevent event loop.

What follows are my experiences using gevent for an internal project here at Mixpanel. I even whipped up some performance numbers specifically for this post!

The Good

The main draw of gevent is obviously performance, especially when compared with traditional threading solutions. At this point, it’s pretty much common knowledge that, past a certain level of concurrency, asynchronous I/O vastly outperforms synchronous I/O in separate threads.

What gevent adds is a programming interface that looks very much like traditional threaded programming, but underneath does asynchronous I/O. Even better, it does all of this transparently. You can continue to use normal python modules like urllib2 to make HTTP requests and they’ll use gevent instead of the normal blocking socket operations. There are some caveats, but I’ll get back to those later.

For now, here’s the kind of performance improvement you can expect:

Thoughts

 

  • Ignoring everything else, gevent outperforms the threaded solution (in this case, paste) by a factor of 4.
  • The number of errors rises linearly with the number of concurrent connections in the threaded solution. These were all connection timeouts. I could probably have increased the timeout interval, but from a user’s perspective extremely long waits are just as bad as failures. gevent has no errors until 10,000 simultaneous connections, or at least until somewhere north of 5,000.
  • The actual requests completed per second were remarkably stable in both cases, at least until gevent fell apart in the 10,000 simultaneous connections test. I actually found this somewhat surprising. I initially guessed that requests per second would degrade at least a little bit as concurrency went up.
  • The threaded test with 10,000 simultaneous connections failed completely. I could probably have gotten it to work (it seemed like something that more ulimit tweaking could have solved), but I was mostly doing the test for fun, so I didn’t spend any time on it.
  • If this kind of thing interests you, we’re hiring. (Yeah, I just intermingled content and advertising.)

Methodology

Here’s the python code I used for both tests:

#!/usr/bin/env python

import sys

def serve_page(env, start_response):
    paragraph = '''

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
        eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
        minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip
        ex ea commodo consequat. Duis aute irure dolor in reprehenderit in
        voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur
        sint occaecat cupidatat non proident, sunt in culpa qui officia
        deserunt mollit anim id est laborum.

    '''

    page = '''
<h1>Static Content</h1>
%s

    ''' % (paragraph * 10,)

    start_response('200 OK', [('Content-Type', 'text/html')])
    return [page]

if __name__ == '__main__':
    def usage():
        print 'usage:', sys.argv[0], 'gevent|threaded CONCURRENCY'
        sys.exit(1)

    if len(sys.argv) != 3 or sys.argv[1] not in ['gevent', 'threaded']:
        usage()

    try:
        concurrency = int(sys.argv[2])
    except ValueError:
        usage()

    if sys.argv[1] == 'gevent':
        from gevent import wsgi
        wsgi.WSGIServer(
            ('127.0.0.1', 10001),
            serve_page,
            log=None,
            spawn=concurrency
        ).serve_forever()
    else:
        from paste import httpserver
        httpserver.serve(
            serve_page,
            host='127.0.0.1',
            port='10001',
            use_threadpool=True,
            threadpool_workers=concurrency
        )

For the client, I used Apache bench with the following options:

  • -c NUM: where NUM is the number of simultaneous connections. This matched the number used on the server command line in each test.
  • -n 100000: all tests were over 100,000 requests. In the graph above, errors are not a rate, but rather the actual number of failed requests out of 100,000.
  • -r: continue even if there is a failure.
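
As a rough sketch, the ab invocation described by these options could be assembled from Python as follows. Only the -c, -n and -r options come from the tests above. The URL (host and port taken from the server script, with an assumed "/" path) and the helper names build_ab_command and run_benchmark are illustrative assumptions.

```python
import subprocess


def build_ab_command(concurrency, total_requests=100000,
                     url="http://127.0.0.1:10001/"):
    """Return the Apache bench argument list for one test run.

    The default URL is an assumption based on the address the server
    script binds to; adjust it to match your own setup.
    """
    return [
        "ab",
        "-c", str(concurrency),     # simultaneous connections
        "-n", str(total_requests),  # total requests for the run
        "-r",                       # continue even if there is a failure
        url,
    ]


def run_benchmark(concurrency):
    """Run ab (which must be installed) and return its text report."""
    result = subprocess.run(
        build_ab_command(concurrency),
        capture_output=True,
        text=True,
        check=False,  # ab may exit non-zero when requests fail
    )
    return result.stdout
```

Calling run_benchmark(500) while the server is running with a concurrency of 500 would correspond to one data point in the tests.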

All tests were done with both client and server running on the same low-end, 512MB Rackspace Cloud VPS. I initially thought I would need some way to limit the threaded solution to one CPU, but it turns out even though there are “four” cores on the VPS, you’re limited to 100% of one core. Not impressed.

Linux tweaks for load testing

I ran into a whole host of issues getting Linux working past ~500 connections per second. Almost all of these are related to all the connections being between the same two IP addresses (127.0.0.1 <-> 127.0.0.1). In other words, you probably wouldn’t see any of these problems in production, but almost certainly would in a test environment (except maybe if you’re running behind a single proxy).

  • Increase the client port range

    echo -e '1024\t65535' | sudo tee /proc/sys/net/ipv4/ip_local_port_range

    This increases the number of available ports to use for client connections. You’ll run out of ports very quickly without this (they get stuck in TIME_WAIT).

  • Enable TIME_WAIT recycling

    echo 1 | sudo tee /proc/sys/net/ipv4/tcp_tw_recycle

    This also helps with connections stuck in TIME_WAIT and is basically required past a certain number of connections per second, at least if the IP address pair remains the same. There’s another option, tcp_tw_reuse, but I didn’t need to use it.

  • Disable syncookies

    echo 0 | sudo tee /proc/sys/net/ipv4/tcp_syncookies

    If you see “possible SYN flooding on port 10001. Sending cookies.” in dmesg, you probably need to disable tcp_syncookies, because syncookies can cause connection resets. Don’t do this on your production server, but for testing it doesn’t matter.

  • Disable iptables if you’re using connection tracking

    You’ll quickly fill up the netfilter connection table. Alternatively, you can try increasing /proc/sys/net/netfilter/nf_conntrack_max, but I think it’s easier just to disable the firewall while testing.

  • Raise open file descriptor limits

    At least on Ubuntu, the open files limit for normal users defaults to 4096. So, if you want to test with more than ~4000 simultaneous connections, you need to bump this up. The easiest way is to add a line to /etc/security/limits.conf like “* hard nofile 16384” and then run ulimit -n 16384 before running your tests.
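
The tweaks above can be checked from Python before a test run. The following is a minimal sketch, assuming a Linux machine. The helper names read_settings and raise_open_file_limit are hypothetical, and the target of 16384 simply mirrors the limits.conf example. Raising the soft limit this way only affects the current process and its children, and it cannot go above the hard limit.

```python
import resource

# Kernel settings discussed above. A file may be missing on some kernels,
# in which case its value is reported as None.
PROC_SETTINGS = [
    "/proc/sys/net/ipv4/ip_local_port_range",
    "/proc/sys/net/ipv4/tcp_tw_recycle",
    "/proc/sys/net/ipv4/tcp_syncookies",
]


def read_settings(paths=PROC_SETTINGS):
    """Return {path: current value, or None if the file cannot be read}."""
    values = {}
    for path in paths:
        try:
            with open(path) as handle:
                values[path] = " ".join(handle.read().split())
        except OSError:
            values[path] = None
    return values


def raise_open_file_limit(target=16384):
    """Raise the soft open-files limit toward target; return the new soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited
    wanted = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if wanted > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
        except (ValueError, OSError):
            pass  # the platform refused; keep the existing limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

Printing read_settings() before starting the server makes it easy to spot a setting that was reset by a reboot.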

The Bad

It can’t be all good, right? Right. Actually, most of the problems I had with gevent could be solved with better, more thorough documentation, which leads me to:

Documentation

Simply put: it’s not good. I probably read more gevent source code than I did gevent documentation (and it was more useful!). The best documentation is actually in the examples directory in the source tree. If you have a question, look there first, seriously. I also spent more time googling through mailing list archives than I would have liked.

Incompatibilities

I’m specifically talking about eventlet here. In retrospect, this makes sense, but it can lead to some baffling failures. We had some MongoDB client code that was using eventlet. It simply didn’t work from the server process I was working on using gevent.

Order matters. Ugh.

Daemonize before you import gevent or at least before you call monkey.patch_all(). I didn’t look into this deeply, but what I gathered from a mailing list post or two is that gevent modifies a socket in python internals. When you daemonize, all open file descriptors are closed, so in children, the socket will be recreated in its unmodified form, which of course doesn’t work right with gevent. gevent should handle this type of thing or at least provide a daemonize function that is compatible.
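
The ordering described above can be sketched as follows. This is an illustration under assumptions, not gevent's recommended recipe: the double-fork daemonize function and the tiny app are hypothetical stand-ins, and gevent.pywsgi is used for the server. The point is only that gevent is imported and patched after the process has detached.

```python
import os


def daemonize():
    """Detach from the terminal with the classic Unix double fork."""
    if os.fork() > 0:
        os._exit(0)  # first parent exits
    os.setsid()      # new session, no controlling terminal
    if os.fork() > 0:
        os._exit(0)  # second parent exits; the grandchild keeps running
    os.chdir("/")
    # Point stdin, stdout and stderr at /dev/null.
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)
    if devnull > 2:
        os.close(devnull)


def app(environ, start_response):
    """Placeholder WSGI application (assumption for illustration)."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]


def main():
    daemonize()
    # Only now import and patch gevent, so anything it sets up at import
    # time is created inside the daemonized process.
    from gevent import monkey
    monkey.patch_all()
    from gevent.pywsgi import WSGIServer
    WSGIServer(("127.0.0.1", 10001), app).serve_forever()
```

Calling main() from the script's entry point starts the daemon. Nothing at module level touches gevent, which is what keeps the order correct.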

Monkey patching. Sometimes?

So, most operations are patched by executing monkey.patch_all(). I’m not a huge fan of doing this sort of thing, but it is nice that normal python modules continue to function. Bizarrely, though, not everything is patched. I spent a while trying to figure out why signals weren’t working until I found gevent.signal. If you’re going to patch some functions, why not patch them all?

The same applies to gevent.queue vs. the standard Python Queue module. Overall, it needs to be clearer (as in a simple list) when you need to use gevent-specific APIs versus standard modules, classes, and functions.

The Ugly

gevent has no built-in support for multiprocessing. This is much more a deployment issue than anything else, but it does mean that to fully utilize multiple cores, you’re going to need to run multiple daemon processes on multiple ports. Then, most likely, you’re going to need to run something like nginx (at least if you’re serving HTTP requests) to distribute requests among the server processes.

Really, the lack of multiprocessing capability just means another abstraction layer on your server that you might have added anyway for availability.
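
One way to picture the "multiple processes on multiple ports" setup is the sketch below. It assumes one gevent.pywsgi server per process, each on its own consecutive port. The names worker_ports, serve_on_port and start_workers, the placeholder app, and the base port of 10001 are assumptions for illustration. A proxy such as nginx would then be pointed at the resulting ports.

```python
import multiprocessing


def worker_ports(base_port, count):
    """Return the list of ports, one per worker process."""
    return [base_port + offset for offset in range(count)]


def app(environ, start_response):
    """Placeholder WSGI application (assumption for illustration)."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]


def serve_on_port(port):
    """Run one gevent server; gevent is imported inside the child process."""
    from gevent.pywsgi import WSGIServer
    WSGIServer(("127.0.0.1", port), app).serve_forever()


def start_workers(base_port=10001, count=None):
    """Start one server process per port and return the process objects."""
    count = count or multiprocessing.cpu_count()
    processes = []
    for port in worker_ports(base_port, count):
        process = multiprocessing.Process(target=serve_on_port, args=(port,))
        process.start()
        processes.append(process)
    return processes
```

A launcher script would call start_workers() under an `if __name__ == "__main__":` guard and then join the returned processes.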

It’s a bigger issue when using gevent for client load testing. I ended up implementing a multiprocess load client that used shared memory to aggregate and print statistics. It was a lot more work than it should have been. (If anyone’s doing something similar, ping me and I can send the shell of the client program.)
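
This is not the client described above, just a minimal sketch of the idea: several worker processes update counters that live in shared memory, and the parent reads the totals at the end. It assumes a Unix system (the "fork" start method). The send_request callable is a hypothetical stand-in for whatever each worker actually does, for example a gevent-driven batch of HTTP requests.

```python
import multiprocessing

# The "fork" start method is assumed so that workers inherit the shared
# counters and the callable directly (Unix only).
_ctx = multiprocessing.get_context("fork")


def run_worker(request_count, ok_count, error_count, send_request):
    """Send request_count requests and record each outcome in shared memory."""
    for _ in range(request_count):
        try:
            send_request()
        except Exception:
            with error_count.get_lock():
                error_count.value += 1
        else:
            with ok_count.get_lock():
                ok_count.value += 1


def run_load_test(process_count, requests_per_process, send_request):
    """Run the workers and return (successes, errors) aggregated across them."""
    ok_count = _ctx.Value("i", 0)     # shared integer counters
    error_count = _ctx.Value("i", 0)
    processes = [
        _ctx.Process(
            target=run_worker,
            args=(requests_per_process, ok_count, error_count, send_request),
        )
        for _ in range(process_count)
    ]
    for process in processes:
        process.start()
    for process in processes:
        process.join()
    return ok_count.value, error_count.value
```

The lock around each increment matters: without it, two processes can read the same value and one update is lost.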

Last Words

If you’ve gotten this far, you’ll have noticed that I spent two full sections on the negative aspects of gevent. Don’t let that fool you, though. I’m convinced that gevent is a great solution for high-performance Python networking. There are problems, but mostly they’re problems with documentation, which will only improve with time.

We’re using gevent internally. In fact, our server is so efficient that we’ll easily run out of bandwidth resources before computing resources (both processors and memory) for the VPS size we’re using.


Written by Avery Fay

October 29th, 2010 at 11:07 am

Posted in Backend,python

14 Responses to 'gevent: the Good, the Bad, the Ugly'


  1. re: iptables connection tracking.
    You can disable connection tracking over the loopback interface. Not sure why this isn’t the default to be honest, but I suppose it is a distribution shipping issue.
    http://awesometrousers.posterous.com/conntrack-loopback-blues

    re: multiprocessing.
    Give gunicorn a look. It seems to provide multiprocessing (N instance behind a single bind with fork/accept mechanism) on top of gevent, with some handy daemonize functionality too.
    http://gunicorn.org/

    cactus

    29 Oct 10 at 11:32 am

  2. For multiprocessing, as cactus says, you can have a look at gunicorn (http://gunicorn.org) and its gevent workers. There are 3 gevent workers: one using our parser, and the other 2 using gevent.pywsgi or gevent.wsgi.

    Also gevent-websocket allows you to handle websockets using gevent with gunicorn: http://www.gelens.org/code/gevent-websocket/ .

    Feel free to contact me if you have any questions :)

    benoitc

    29 Oct 10 at 11:58 am

  3. The following line should fix your issue with daemonizing

    gevent.reinit()

    Andrew

    29 Oct 10 at 9:22 pm

  4. [...] gevent: the Good, the Bad, the Ugly at Mixpanel Engineering [...]

  5. greenlet is slow. The way it swaps out the functions, and yields is done inefficiently and can be done with less work.

    But have you considered using tornado as an alternative to coroutines?

    Also don’t forget to reduce your recv/send buffer for testing because you’ll run out of buffer space/tcb later and this might go as a silent error that would show up as a performance degradation in httperf or whatever tool you are using .

    Hasan Alayli

    30 Oct 10 at 12:19 am

  6. Shouldn’t the disabling of syn cookies be

    echo 0 | sudo tee /proc/sys/net/ipv4/tcp_syncookies

    It seems you are enabling them in the example.

    Chris Hoffman

    4 Nov 10 at 11:41 am

  7. Actually piece of content. I just stumbled upon your blog and then wanted to mention that I have really enjoyed reading your blog posts. Any way I’ll be subscribing to your feed and therefore I hope you review ever again rapidly.

    Ila Makey

    22 Nov 10 at 9:53 pm

  8. I feel the need to tell you that you are hereby awarded two thumbs up for your efforts.

    Robb

    4 Dec 10 at 8:00 am

  9. [...] This is a translated article; for the original, see http://code.mixpanel.com/gevent-the-good-the-bad-the-ugly/. [...]

  10. gunicorn is awesome for fast clients, but suffers from unicorn’s Achilles’ heel. Choices: greenlets, an upstream proxy or potentially port Rainbows! to gunicorn.

    Barry Allard

    30 Apr 11 at 6:05 am

  11. [...] The other popular one is gevent. Avery wrote about some of the tradeoffs earlier: http://code.mixpanel.com/2010/10/29/gevent-the-good-the-bad-the-ugly/ If you're interested in what we work on, please apply – we're hiring: [...]

  12. With the release of gevent 1.0alpha a number of long-standing issues were fixed, including at least a couple from your list:

    - regular signal.signal() function now works with gevent, even without monkey patching.

    - dns resolver no longer breaks after fork(). In fact, we use c-ares instead of libevent-dns, which is much better.

    Denis

    6 Aug 11 at 9:04 pm

  13. I just stumbled upon your blog and then wanted to mention that I have really enjoyed reading your blog posts. Any way I’ll be subscribing to your feed and therefore

    denf

    5 Sep 11 at 9:22 am

  14. Maybe it’s a mistype
    [orig]
    Increase the client port rangeecho -e ’1024t65535′
    [hmm]
    Increase the client port range $echo -e ’1024\t65535′

    Niemi

    5 Oct 11 at 9:58 pm
