Building C extensions in Python

If you're interested in what we work on, please apply - we're hiring: http://mixpanel.com/jobs/

At Mixpanel, performance is particularly important to us, and as we begin to scale our data volume to support billions of actions, we’ve found ourselves thinking about how to solve problems better.

We’re currently writing a feature that is going to require considerable scale and performance, and we had to think about how to make it fast enough to keep our users happy. Unfortunately, Python is too slow for some of the operations we want to do, where something lower level like C can give us an order of magnitude more performance.

So imagine: you want to stick with Python because it’s so fast to develop in, but you need the performance of C/C++. Let me introduce you to C extensions in Python.

If you’ve ever used something like cJSON, then you’ve already installed one of these. It’s likely that a lot of the modules you import in Python are built in C rather than being pure Python.
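If you’re curious which of your imports are compiled, here is a rough way to check. This is only a sketch: it assumes Python 3.4 or newer, and is_compiled is a helper name made up for this example, not something from the standard library. It looks at where the import system would load a module from and reports whether that is the interpreter binary itself or a shared library.

```python
import importlib.machinery
import importlib.util


def is_compiled(module_name):
    """Return True if the named module is built in or loaded from a compiled file."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        raise ImportError("no module named %r" % module_name)
    if spec.origin == "built-in":
        # Compiled directly into the interpreter binary (sys is one example).
        return True
    # Extension modules are shared libraries with suffixes such as .so or .pyd.
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    return bool(spec.origin) and spec.origin.endswith(suffixes)


for name in ("sys", "json", "math"):
    print(name, is_compiled(name))
```

The json package itself is plain Python, which is why it reports False even though it can hand work off to a C accelerator behind the scenes.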

Here’s a quick tutorial on how to do it on Mac OS X (it’s probably a bit easier on Linux if you install the python-devel package).

1. You need Python and the Python development headers. On Mac OS X they are likely already installed, or you may need Xcode. (You should be able to find them here: /System/Library/Frameworks/Python.framework/Versions/2.6/)

2. You should also have the distutils module for Python (I believe it comes installed); test via import distutils.

3. You should also have g++ or gcc installed (for which you may need to install Xcode).
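If you’d rather not hunt for these by hand, a few lines of Python can report what the interpreter itself sees. This is a quick sketch assuming Python 3; it only looks for Python.h in the interpreter’s configured include directory and for gcc or cc on your PATH, so treat a "not found" as a hint rather than proof.

```python
import os
import shutil
import sysconfig

# Directory where this interpreter expects its C headers to live.
include_dir = sysconfig.get_paths()["include"]
python_h = os.path.join(include_dir, "Python.h")
have_headers = os.path.exists(python_h)

# First C compiler found on PATH, or None if there is none.
compiler = shutil.which("gcc") or shutil.which("cc")

print("headers: ", python_h if have_headers else "Python.h not found in " + include_dir)
print("compiler:", compiler or "no gcc or cc on PATH")
```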

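4. Let’s get to work by writing a quick C program that takes a command string and passes it to the system (call it spammodule.c). The official extending guide linked under the references below walks through a module with the same file name.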

5. Next, write a Python program called setup.py to install this module:

Distutils is simply a way to distribute your Python modules. Running setup.py through distutils builds your extension: it compiles the C source with gcc into an object file (.o file), then runs gcc again with dynamic_lookup to link a shared object. The install step copies that shared object into site-packages, where other modules are stored.
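For reference, a setup.py along these lines might look like the sketch below. It assumes spammodule.c sits in the same directory and that the extension should be importable as spam; both names are guesses based on the file name above, and the version and description strings are placeholders. distutils was removed from the standard library in Python 3.12, so the sketch tries setuptools first (it exposes the same setup and Extension names) and falls back to distutils on older installs.

```python
# setup.py: build script for the extension (a sketch, not the original file)
try:
    # setuptools exposes the same setup/Extension interface and works on current Pythons.
    from setuptools import setup, Extension
except ImportError:
    # Older installs without setuptools: fall back to the distutils described above.
    from distutils.core import setup, Extension

# "spam" is the assumed import name; it has to match the module name
# that the init function in spammodule.c registers.
spam_extension = Extension("spam", sources=["spammodule.c"])

setup(
    name="spam",
    version="1.0",  # placeholder
    description="Example C extension that runs a shell command",  # placeholder
    ext_modules=[spam_extension],
)
```

The script does nothing useful when run with no arguments; it expects a command such as build or install on the command line.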

6. Almost done. Now let’s build and install the module:

python setup.py build
python setup.py install

The build step should compile the file with gcc, and the install step then copies the result into site-packages just like any other module.

7. Run python and try importing the module:
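Something like the following should do it. The names here are assumptions: a module called spam with a system function that hands a command string to the C library and returns its exit status, borrowed from the official extending guide rather than taken from this post’s own source. The try/except is only there so the snippet fails gently if the install step has not been run yet.

```python
try:
    import spam  # assumed name of the module built from spammodule.c
except ImportError:
    spam = None
    print("spam is not importable; run the build and install steps first")

if spam is not None:
    # Assumed function: passes the string to the C library's system()
    # and returns the command's exit status.
    status = spam.system("ls -l")
    print("exit status:", status)
```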

Congrats, you just wrote some C code and ran it in Python. There are also other things like Protobufs and Thrift, which allow you to run code in a different language (cross-server too), but writing C extensions in Python may be a more interoperable and cleaner way to get a particular task done.

Great references:

http://docs.python.org/extending/extending.html
http://docs.python.org/distutils/


9 thoughts on “Building C extensions in Python”

  1. Krishna Srinivasan

    Also try the ‘psyco’ module. Two lines of code and you will get a 2x to 100x speedup (esp. if your program does lots of number crunching).

    For more, consider ‘pyrex’. It will take you almost as far as you would want to go with perf.

  2. Wai Yip Tung

    I’ve been doing this a little bit, for the same reasons as you. Keeping track of reference counts is so tedious and error prone. Then I experimented with writing plain C and calling it with ctypes. I like it much better than writing a C module. The point of building it in C is optimization, not a reusable component for other people, so there is little advantage in doing all the hard work with the C API. Might as well build it quickly in plain C.

    ctype performance benchmark – Tung Wai Yip’s blog
    http://tungwaiyip.info/blog/2009/07/16/ctype_performance_benchmark

    But I think the real game is to code in pyrex or cython, which I have yet to learn.
