This was originally posted on High Scalability.
Building data products is not easy.
Many people are uncomfortable with numbers, and even more don’t really understand statistics. It’s very, very easy to overwhelm people with numbers, charts, and tables – and yet numbers are more important than ever. The trend toward running companies in a data-driven way is only growing…which means more programmers will be spending time building data products. These might be internal reporting tools (like the dashboards that your CEO will use to run the company) or, like Mixpanel, you might be building external-facing data analysis products for your customers.
Either way, the question is: how do you build usable interfaces to data that still give deep insights?
We’ve spent the last 6 years at Mixpanel working on this problem. In that time, we’ve come up with a few simple rules that apply to almost everyone:
- Help your users understand and trust the data they are looking at
- Strike the right balance between ease and power
- Support rapid iteration & quick feedback loops
(This is part one of a two part series, you can find part II here)
The Mixpanel reporting API is built around a custom expression language that customers (and our main reporting application) can use to slice and dice their data. The expression language is a simple tool that allows you to ask powerful and complex questions and quickly get the answers you need.
The actual Mixpanel expression engine is part of a complex, heavily optimized C program, but the core principles are simple. I’d like to build a model of how the expression engine works, in part to illustrate how simple those core principles are, and in part to use for exploring how some of the optimizations work.
This post will use a lot of Python to express common ideas about data and programs. Familiarity with Python should not be required to enjoy and learn from the text, but familiarity with a programming language that has string-keyed hash tables, maps, or dictionaries, or familiarity with the JSON data model will help a lot.
This post is a follow up to We’re moving. Goodbye Rackspace.
Cloud computing is often positioned as a solution to scalability problems. In fact, it seems like almost every day I read a blog post about a company moving infrastructure to the cloud. At Mixpanel, we did the opposite. I’m writing this post to explain why and maybe even encourage some other startups to consider the alternative.
At Mixpanel, where our hardware is and the platform we use to help us scale has become increasingly important. Unfortunately (or fortunately) our data processing doesn’t always scale linearly. When we get a brand new customer sometimes we have to scale by a step function; this has been a problem in the past but we’ve gotten better at this.
So what’s the short of it? We’re unhappy with the Rackspace Cloud and love what we’re seeing at Amazon.
Over the history we’ve used quite a few “cloud” offerings. First was Slicehost back when everything was on a single 256MB instance (yeah, that didn’t scale). Second was Linode because it was cheaper (money mattered to me at that point). Lastly, we moved over to the Rackspace Cloud because they cut a deal with YCombinator (one of the many benefits of being part of YC). Even with all the lock in we have with Rackspace (we have 50+ boxes and hiring if you want to help us move them!), it’s really not about the money but about the features and the product offering, here’s why we’re moving: