gevent: the Good, the Bad, the Ugly
I’m not going to spend much time describing what gevent is. I think the one sentence overview from its web site does a better job than I could:
gevent is a coroutine-based Python networking library that uses greenlet to provide a high-level synchronous API on top of libevent event loop.
What follows are my experiences using gevent for an internal project here at Mixpanel. I even whipped up some performance numbers specifically for this post!
The Good
The main draw of gevent is obviously performance, especially when compared with traditional threading solutions. At this point, it’s pretty much common knowledge that, past a certain level of concurrency, doing I/O asynchronously vastly outperforms doing synchronous I/O in separate threads.
What gevent adds is a programming interface that looks very much like traditional threaded programming but does asynchronous I/O underneath. Even better, it does all of this transparently: you can continue to use normal Python modules like urllib2 to make HTTP requests, and they’ll use gevent instead of the normal blocking socket operations. There are some caveats, but I’ll get back to those later.
For now, here’s the kind of performance improvement you can expect:
Thoughts
- Ignoring everything else, gevent outperforms a threaded solution (in this case, paste) by a factor of 4.
- The number of errors rises linearly with the number of concurrent connections in the threaded solution. (These were all connection timeouts; I could probably have increased the timeout interval, but from a user’s perspective, extremely long waits are just as bad as failures.) gevent has no errors until 10,000 simultaneous connections, or at least until somewhere north of 5,000 simultaneous connections.
- The actual requests completed per second were remarkably stable in both cases, at least until gevent fell apart in the 10,000 simultaneous connections test. I actually found this somewhat surprising. I initially guessed that requests per second would degrade at least a little bit as concurrency went up.
- The 10,000 simultaneous connections threaded test failed completely. I could probably have gotten this to work (it seemed like something more ulimit tweaking could have solved), but I was mostly doing the test for fun, so I didn’t spend any time on it.
- If this kind of thing interests you, we’re hiring. (Yeah, I just intermingled content and advertising.)
Methodology
Here’s the Python code I used for both tests:
```python
#!/usr/bin/env python
import sys

def serve_page(env, start_response):
    paragraph = '''
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip
ex ea commodo consequat. Duis aute irure dolor in reprehenderit in
voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur
sint occaecat cupidatat non proident, sunt in culpa qui officia
deserunt mollit anim id est laborum.
'''
    page = '''
<h1>Static Content</h1>
%s
''' % (paragraph * 10,)
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [page]

if __name__ == '__main__':
    def usage():
        print 'usage:', sys.argv[0], 'gevent|threaded CONCURRENCY'
        sys.exit(1)

    if len(sys.argv) != 3 or sys.argv[1] not in ['gevent', 'threaded']:
        usage()

    try:
        concurrency = int(sys.argv[2])
    except ValueError:
        usage()

    if sys.argv[1] == 'gevent':
        from gevent import wsgi
        wsgi.WSGIServer(
            ('127.0.0.1', 10001),
            serve_page,
            log=None,
            spawn=concurrency
        ).serve_forever()
    else:
        from paste import httpserver
        httpserver.serve(
            serve_page,
            host='127.0.0.1',
            port='10001',
            use_threadpool=True,
            threadpool_workers=concurrency
        )
```
For the client, I used ApacheBench (ab) with the following options:
- -c NUM: where NUM is the number of simultaneous connections. This matched the number used on the server command line in each test.
- -n 100000: all tests were over 100,000 requests. In the graph above, errors are not a rate, but rather the actual number of failed requests out of 100,000.
- -r: continue even if there is a failure.
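Since the server’s CONCURRENCY argument has to match ab’s -c value on every run, a small driver that prints both command lines keeps them in sync. This is a minimal Python 3 sketch, not the script used for these tests: `server.py` is a hypothetical filename for the server code above, and the concurrency levels are example values only.

```python
import shlex

URL = "http://127.0.0.1:10001/"  # the address the test server binds to
TOTAL_REQUESTS = 100000


def ab_command(concurrency, total=TOTAL_REQUESTS, url=URL):
    """ApacheBench argument list: -c concurrency, -n total requests, -r keep going on errors."""
    return ["ab", "-c", str(concurrency), "-n", str(total), "-r", url]


def server_command(mode, concurrency, script="server.py"):
    """Server command line; 'server.py' is a hypothetical name for the script above."""
    return ["python", script, mode, str(concurrency)]


if __name__ == "__main__":
    # Example concurrency levels only; sweep whatever range you like.
    for mode in ("gevent", "threaded"):
        for c in (100, 1000, 5000):
            print(shlex.join(server_command(mode, c)))
            print(shlex.join(ab_command(c)))
```

ab needs a path in the URL, which is why the trailing slash is kept.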
All tests were done with both client and server running on the same low-end, 512MB Rackspace Cloud VPS. I initially thought I would need some way to limit the threaded solution to one CPU, but it turns out even though there are “four” cores on the VPS, you’re limited to 100% of one core. Not impressed.
Linux tweaks for load testing
I ran into a whole host of issues getting Linux working past ~500 connections per second. Almost all of these are related to all the connections being between the same two IP addresses (127.0.0.1 <-> 127.0.0.1). In other words, you probably wouldn’t see any of these problems in production, but almost certainly would in a test environment (except maybe if you’re running behind a single proxy).
- Increase the client port range
echo -e '1024\t65535' | sudo tee /proc/sys/net/ipv4/ip_local_port_range
This increases the number of ports available for client connections. You’ll run out of ports very quickly without it (they get stuck in TIME_WAIT).
- Enable TIME_WAIT recycling
echo 1 | sudo tee /proc/sys/net/ipv4/tcp_tw_recycle
This helps with connections stuck in TIME_WAIT as well, and is basically required past a certain number of connections per second, at least if the IP address pair remains the same. There’s another option, tcp_tw_reuse, that is available as well, but I didn’t need to use it.
- Disable syncookies
echo 0 | sudo tee /proc/sys/net/ipv4/tcp_syncookies
If you see “possible SYN flooding on port 10001. Sending cookies.” in dmesg, you probably need to disable tcp_syncookies, since they can cause connection resets. Don’t do this on your production server, but for testing it doesn’t matter.
- Disable iptables if you’re using connection tracking
You’ll quickly fill up the netfilter connection table. Alternatively, you can try increasing /proc/sys/net/netfilter/nf_conntrack_max, but I think it’s easier just to disable the firewall while testing.
- Raise open file descriptor limits
At least on Ubuntu, the open files limit for normal users defaults to 4096, so if you want to test with more than ~4,000 simultaneous connections, you need to bump it up. The easiest way is to add a line like “* hard nofile 16384” to /etc/security/limits.conf and then run ulimit -n 16384 before running your tests.
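The settings above are easy to forget between runs, so a short pre-flight check can help. This is a minimal Python 3 sketch assuming Linux paths under /proc/sys; `TUNABLES`, `read_tunable`, and `raise_nofile_limit` are hypothetical names. It only reports the kernel values (writing them needs root), and it raises the soft open-file limit for its own process the way ulimit -n does, never past the hard limit from limits.conf.

```python
import resource

# Values from the list above. Note: tcp_tw_recycle was removed in Linux 4.12,
# so on newer kernels that entry simply won't exist.
TUNABLES = {
    "/proc/sys/net/ipv4/ip_local_port_range": "1024\t65535",
    "/proc/sys/net/ipv4/tcp_tw_recycle": "1",
    "/proc/sys/net/ipv4/tcp_syncookies": "0",  # 0 disables syncookies
}


def read_tunable(path):
    """Return the current contents of a /proc/sys entry, or None if it is missing."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def raise_nofile_limit(target):
    """Raise this process's soft open-file limit toward `target` (like `ulimit -n`),
    capped at the hard limit. Returns the resulting soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft != resource.RLIM_INFINITY and target > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            pass  # the kernel refused; keep the old limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    for path, wanted in TUNABLES.items():
        current = read_tunable(path)
        # Compare whitespace-split fields, since the port range uses a tab.
        ok = current is not None and current.split() == wanted.split()
        print(f"{path}: current={current!r} wanted={wanted!r} {'ok' if ok else 'CHANGE'}")
    print("soft open-file limit:", raise_nofile_limit(16384))
```

Because setrlimit only affects the calling process and its children, this works for a load client launched from the same script, but the server still needs its own ulimit -n.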
The Bad
It can’t be all good, right? Right. Actually, most of the problems I had with gevent could be solved with better, more thorough documentation, which leads me to:
Documentation
Simply put: it’s not good. I probably read more gevent source code than I did gevent documentation (and it was more useful!). The best documentation is actually in the examples directory in the source tree. If you have a question, look there first. Seriously. I also spent more time googling through mailing list archives than I’d like.
Incompatibilities
I’m specifically talking about eventlet here. In retrospect, this makes sense, but it can lead to some baffling failures. We had some MongoDB client code that was using eventlet. It simply didn’t work inside the gevent-based server process I was working on.
Order matters. Ugh.
Daemonize before you import gevent, or at least before you call monkey.patch_all(). I didn’t look into this deeply, but what I gathered from a mailing list post or two is that gevent modifies a socket in Python’s internals. When you daemonize, all open file descriptors are closed, so in the child the socket will be recreated in its unmodified form, which of course doesn’t work right with gevent. gevent should handle this type of thing, or at least provide a compatible daemonize function.
Monkey patching. Sometimes?
So, most operations are patched by executing monkey.patch_all(). I’m not a huge fan of doing this sort of thing, but it is nice that normal Python modules continue to function. Bizarrely, though, not everything is patched. I spent a while trying to figure out why signals weren’t working until I found gevent.signal. If you’re going to patch some functions, why not patch them all?
The same applies to gevent.queue vs. the standard Python Queue module. Overall, it needs to be clearer (as in a simple list) when you need to use gevent-specific APIs versus standard modules, classes, and functions.
The Ugly
gevent has no built-in support for multiprocessing. This is much more a deployment issue than anything else, but it does mean that to fully utilize multiple cores, you’re going to need to run multiple daemon processes on multiple ports. Then, most likely, you’re going to need to run something like nginx (at least if you’re serving HTTP requests) to distribute requests among the server processes.
Really, the lack of multiprocessing capability just means another abstraction layer on your server that you might have added anyway for availability.
It’s a bigger issue when using gevent for client load testing. I ended up implementing a multiprocess load client that used shared memory to aggregate and print statistics. It was a lot more work than it should have been. (If anyone’s doing something similar, ping me and I can send the shell of the client program.)
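The load client itself isn’t shown here, but the shared-memory part of that design can be sketched with the standard multiprocessing module. This is an illustration, not the client described above: `fake_request` is a hypothetical stand-in for a real HTTP request (in practice each process would run its own gevent pool), and the sketch assumes a Unix-like OS so the fork start method is available.

```python
import multiprocessing as mp

# Assumes a Unix-like OS: with "fork", children inherit the shared counters directly.
ctx = mp.get_context("fork")


def fake_request(i):
    """Hypothetical stand-in for one HTTP request; every 50th one "fails"."""
    return i % 50 != 0


def client_process(n_requests, completed, failed):
    """One load-generating process. A real client would run a gevent pool here."""
    ok = bad = 0
    for i in range(n_requests):
        if fake_request(i):
            ok += 1
        else:
            bad += 1
    # Fold the local tallies into shared memory once, under each counter's lock.
    with completed.get_lock():
        completed.value += ok
    with failed.get_lock():
        failed.value += bad


def run_load(n_procs=4, per_proc=1000):
    # "q" = signed 64-bit integer counters living in shared memory.
    completed = ctx.Value("q", 0)
    failed = ctx.Value("q", 0)
    procs = [ctx.Process(target=client_process, args=(per_proc, completed, failed))
             for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return completed.value, failed.value


if __name__ == "__main__":
    done, errors = run_load()
    print(f"completed={done} failed={errors}")
```

Updating the shared counters once per process keeps lock contention low. For live statistics, the children would instead update the counters every few hundred requests and the parent would poll `completed.value` and `failed.value` while they run.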
Last Words
If you’ve gotten this far, you’ve noticed that I spent two full sections on negative aspects of gevent. Don’t let that fool you, though. I’m convinced that gevent is a great solution for high-performance Python networking. There are problems, but mostly they’re problems with documentation, which will only improve with time.
We’re using gevent internally. In fact, our server is so efficient that we’ll easily run out of bandwidth resources before computing resources (both processors and memory) for the VPS size we’re using.

re: iptables connection tracking.
You can disable connection tracking over the loopback interface. Not sure why this isn’t the default to be honest, but I suppose it is a distribution shipping issue.
http://awesometrousers.posterous.com/conntrack-loopback-blues
re: multiprocessing.
Give gunicorn a look. It seems to provide multiprocessing (N instances behind a single bind with a fork/accept mechanism) on top of gevent, with some handy daemonize functionality too.
http://gunicorn.org/
cactus
29 Oct 10 at 11:32 am
For multiprocessing, as cactus says, you can have a look at gunicorn (http://gunicorn.org) and its gevent workers. There are 3 gevent workers: one using our parser, and the other 2 using gevent.pywsgi or gevent.wsgi.
Also gevent-websocket allows you to handle websockets using gevent with gunicorn: http://www.gelens.org/code/gevent-websocket/ .
Feel free to contact me if you have any questions
benoitc
29 Oct 10 at 11:58 am
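A minimal gunicorn configuration along the lines of the two comments above might look like the following. It is a sketch assuming gunicorn and gevent are installed; gunicorn config files are plain Python in which each module-level name is a setting.

```python
# gunicorn.conf.py: one gevent worker process per core behind a single bind.
import multiprocessing

bind = "127.0.0.1:10001"
# Each gevent process only uses one CPU, so run one per core.
workers = multiprocessing.cpu_count()
# gevent-based worker; "gevent_wsgi" and "gevent_pywsgi" are the other two
# gevent worker types mentioned above.
worker_class = "gevent"
# Maximum simultaneous clients per worker (used by async workers such as gevent).
worker_connections = 1000
```

Assuming the WSGI app lives in a module named server (a hypothetical name), it would be started with gunicorn -c gunicorn.conf.py server:serve_page.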
The following line should fix your issue with daemonizing
gevent.reinit()
Andrew
29 Oct 10 at 9:22 pm
greenlet is slow. The way it swaps out functions and yields is inefficient and could be done with less work.
But have you considered using tornado as an alternative to coroutines?
Also, don’t forget to reduce your recv/send buffers for testing, because otherwise you’ll run out of buffer space/TCBs later, and this can be a silent error that only shows up as a performance degradation in httperf or whatever tool you are using.
Hasan Alayli
30 Oct 10 at 12:19 am
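A per-socket version of that advice, as a minimal Python 3 sketch: `small_buffer_socket` is a hypothetical helper and 8192 bytes is an arbitrary example size. Per tcp(7), the buffer size must be set before connect() or listen() to take effect on a connection, and per socket(7), Linux doubles the requested value and reports the doubled figure through getsockopt().

```python
import socket


def small_buffer_socket(size=8192):
    """Create a TCP socket with reduced kernel receive/send buffers.
    Set these before connect()/listen() so they apply to the connection."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    return sock


if __name__ == "__main__":
    s = small_buffer_socket()
    # On Linux this prints roughly double the requested size (bookkeeping overhead).
    print("SO_RCVBUF:", s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
    print("SO_SNDBUF:", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
    s.close()
```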
Shouldn’t the disabling of syn cookies be
echo 0 | sudo tee /proc/sys/net/ipv4/tcp_syncookies
It seems you are enabling them in the example.
Chris Hoffman
4 Nov 10 at 11:41 am
I feel the need to tell you that you are hereby awarded two thumbs up for your efforts.
Robb
4 Dec 10 at 8:00 am
[...] This is a translated article; the original is at http://code.mixpanel.com/gevent-the-good-the-bad-the-ugly/. [...]
gevent: the Good, the Bad, the Ugly | In the Milky way
17 Mar 11 at 5:03 am
gunicorn is awesome for fast clients, but suffers from unicorn’s Achilles’ heel. Choices: greenlets, an upstream proxy or potentially port Rainbows! to gunicorn.
Barry Allard
30 Apr 11 at 6:05 am
[...] The other popular one is gevent. Avery wrote about some of the tradeoffs earlier: http://code.mixpanel.com/2010/10/29/gevent-the-good-the-bad-the-ugly/ If you're interested in what we work on, please apply – we're hiring: [...]
How and Why We Switched from Erlang to Python at Mixpanel Engineering
5 Aug 11 at 5:37 pm
With the release of gevent 1.0alpha a number of long-standing issues were fixed, including at least a couple from your list:
- regular signal.signal() function now works with gevent, even without monkey patching.
- The DNS resolver no longer breaks after fork(). In fact, we use c-ares instead of libevent-dns, which is much better.
Denis
6 Aug 11 at 9:04 pm
Maybe it’s a mistype
[orig]
Increase the client port range: echo -e '1024t65535'
[hmm]
Increase the client port range: $ echo -e '1024\t65535'
Niemi
5 Oct 11 at 9:58 pm