Archive for the ‘Uncategorized’ Category
How to do cheap backups
This post is a follow up to Why we moved off the cloud.
As a company, we want to do reliable backups on the cheap. By “cheap” I mean in terms of cost and, more importantly, in terms of developers’ time and attention. In this article, I’ll discuss how we’ve been able to accomplish this and the factors that we consider important.
Backups are an insurance policy. As with conventional insurance policies (e.g. renter’s), you want peace of mind that your stuff is covered if disaster strikes, while paying the best price you can from the available options.
Backups are similar. Both your team and your customers can rest a bit more easily knowing that you have your data elsewhere in case of unforeseen events. But on the flip side, backups cost money and time that could be better applied to improving your product — delivering more features, making it faster, etc. This is good motivation for keeping the cost low while still being reliable.
Our backup machine has 2 quad-core CPUs, 12GB RAM, and 24 2TB drives in a hardware RAID 6 configuration. This is clearly not a cookie-cutter configuration — one of the benefits of dedicated hosting. The main draw is the ample disk space, but with the CPU and RAM provided you can still get real work done as well.
Of course, we follow all the usual best practices — RAID, replicated topologies (master-master, master-slave, etc), input logs, etc. Thus, we already have multiple copies of the important bits. But backups help round this picture out, and we want to do it right. Price is a concern, but speed and convenience are important to us as well.
How cheap is our storage, you ask? Well, the following is a rough back-of-the-napkin comparison of our current costs relative to some alternatives:
Per GB-month:
- Tarsnap (a system on top of S3) – $0.30
- Amazon S3 - $0.11 (1-49TB)
- Our Softlayer machine – $0.025
- WD RE4 (priced on Newegg), 2 yr lifespan – $0.0094
These numbers are only guideposts. For example, drives can obviously last longer than 2 years, but colo has its own costs and challenges as well. Also, the additional cost of Tarsnap over S3 might be justified by the convenience and compression offered. The point, though, is that Softlayer is giving us a very competitive price. Being more than 4x cheaper than Amazon ($0.025 vs. $0.11 per GB-month) is pretty impressive given that Amazon has massive economies of scale. Whether Amazon takes that margin as overhead or profit is up for speculation.
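To make the comparison concrete, here is a minimal sketch of the arithmetic behind the list above. The per GB-month prices are the ones quoted in this post. The drive price, capacity, and lifespan passed to `amortized_per_gb_month` are assumptions chosen to land near the $0.0094 figure, not numbers from our invoices.

```python
# Per GB-month prices quoted in the list above (USD).
PRICES = {
    "Tarsnap": 0.30,
    "Amazon S3": 0.11,
    "Softlayer": 0.025,
    "WD RE4 (2 yr)": 0.0094,
}

def ratio_to(baseline, prices=PRICES):
    """Return each option's price as a multiple of the baseline option."""
    base = prices[baseline]
    return {name: price / base for name, price in prices.items()}

def amortized_per_gb_month(drive_price_usd, capacity_gb, lifespan_months):
    """Spread a one-time drive purchase evenly over its capacity and lifespan."""
    return drive_price_usd / (capacity_gb * lifespan_months)

ratios = ratio_to("Softlayer")
for name, multiple in ratios.items():
    print(f"{name}: {multiple:.1f}x the Softlayer price")

# Hypothetical drive: $450 for 2,000 GB, written off over 24 months (assumed values).
print(f"{amortized_per_gb_month(450, 2000, 24):.4f} USD per GB-month")
```

The amortized figure ignores power, rack space, and replacement labor, which is exactly why the raw drive number is only a guidepost.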
Keeping prices low is a goal, but the real resource of interest here is developers’ time and attention. It takes time to develop and maintain backup systems, to ensure that they are working properly, to manage space, and for engineers to make context switches between backups and their primary work responsibilities. Also, the goal is to keep our engineers happy and motivated so that their 120% effort goes towards taking the company to the next level. Backups are important, but don’t drag them out: get them done and get back to the really cool stuff.
Thus, the cost of backup storage is simply not our dominant monthly cost, and a developer’s time is worth a handy multiple. Here are some other factors to consider:
Bandwidth Cost
- Tarsnap – $0.30 per GB
- Amazon S3 – free inbound, pay for outbound and requests
- Softlayer – free between their data centers
Not having to worry about bandwidth gives us great flexibility. We can back up daily, weekly, monthly, etc. without too much worry.
Bandwidth Speed
We have our main servers in the Dallas data center and the backup server in San Jose. Softlayer provides a gigabit connection between them. If you’re not used to this kind of speed across a good stretch of the continent, you might be pleasantly startled at first.
You’d also be hard pressed to even come close to these transfer speeds over more vanilla topologies. Even though Amazon has its Sneakernet-style data import/export service, do you really want to spend time mailing hard disks around?
Speed has manifold benefits. Backups share resources with the production site, so the sooner they’re done the better. Also, speed simplifies. It takes engineering time to make fine-grained decisions on full vs incremental backups, tweak locking and transactions, serialize backup order to take advantage of limited bandwidth, etc. With a fast connection, it’s much easier to do “dumb dumps” initially and refine later in a prioritized way.
Even if we save a few hours a month with these choices, that’s a huge win in the startup world. Getting more time for feature work helps our ability to grow fast and lead instead of playing catch-up.
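For a rough feel of what a gigabit link buys you, the sketch below estimates best-case transfer times. The dataset sizes and the 100 Mbit/s comparison link are hypothetical, and real transfers will be slower than the nominal rate, which is what the `efficiency` parameter is for.

```python
def transfer_hours(size_gb, link_gbps, efficiency=1.0):
    """Hours needed to move size_gb gigabytes over a link_gbps link.

    efficiency is the fraction of the nominal link rate actually achieved
    (1.0 is the theoretical best case).
    """
    bits = size_gb * 8e9  # decimal gigabytes to bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600

# Hypothetical dump sizes, compared on a 1 Gbit/s and a 100 Mbit/s link.
for size_gb in (100, 1000, 5000):
    print(f"{size_gb} GB: {transfer_hours(size_gb, 1.0):.1f} h at 1 Gbit/s, "
          f"{transfer_hours(size_gb, 0.1):.1f} h at 100 Mbit/s")
```

At the nominal rate a full terabyte moves in a little over two hours, which is why a plain full dump every night can be a reasonable starting point.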
Room for growth
One disadvantage of having a dedicated machine is that you’re paying for a fixed amount of resource up front. Right now, we pay for more capacity than we’ll need even for the next few months. This is generally touted as one advantage of cloud computing — pay only for what you use, and provision up easily.
In our case, it’s not a big issue. The amount we’d save over, say, the next half year isn’t worth the inconvenience of migrating data, provisioning and decom’ing machines, etc. Instead, we have runway as our data footprint continues to grow, and we can spend our time elsewhere. Like speed, space simplifies as well.
Simplicity & Flexibility
At the end of the day, a machine is a machine. There’s no new API to learn, access accounts to set up, API keys to provision, or client libs to integrate. Just deploy some SSH keys, drop a cron that kicks off some scp’s, rsync’s or mysqldump’s, gzip away and you’re done. It’s nice to use the toolchain we’re already familiar with, and to keep it as simple and uniform as possible across our whole infrastructure. Whether that simplicity comes in the form of uniform server creation scripts, monitoring tools, or even payment accounts, simpler is better.
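As an illustration of how little glue this takes, here is a minimal sketch of a script a cron job could call. It is not our actual backup script: the host name, paths, and the choice of `mysqldump` flags are assumptions, and it expects MySQL credentials in `~/.my.cnf` and SSH keys to be deployed already.

```python
import shlex
import subprocess
from datetime import date

BACKUP_HOST = "backup.example.com"   # hypothetical backup server
BACKUP_DIR = "/backups/mysql"        # hypothetical destination directory

def dump_command(outfile):
    """Shell pipeline: dump all databases and compress the stream with gzip."""
    return ("mysqldump --single-transaction --all-databases "
            f"| gzip > {shlex.quote(outfile)}")

def rsync_command(outfile, host=BACKUP_HOST, dest=BACKUP_DIR):
    """Argument list that copies the dump to the backup host over SSH."""
    return ["rsync", "-az", "-e", "ssh", outfile, f"{host}:{dest}/"]

def run_backup(today=None):
    """Run the dump, then ship it; raises CalledProcessError on failure."""
    today = today or date.today()
    outfile = f"/tmp/mysql-{today.isoformat()}.sql.gz"
    # pipefail makes a failing mysqldump fail the whole pipeline, not just gzip.
    subprocess.run(["bash", "-o", "pipefail", "-c", dump_command(outfile)],
                   check=True)
    subprocess.run(rsync_command(outfile), check=True)
    return outfile
```

A nightly cron entry would then only need to invoke a small wrapper that calls `run_backup()`. Because both steps use `check=True`, a failed dump or transfer exits non-zero and shows up in cron mail instead of silently producing an empty file.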
Conclusion
Some of the wins we’ve achieved have to do with dedicated hosting, and we’re happy so far. Remember to shop around and get the best prices you can. But more importantly, by making certain decisions you can vastly simplify an important part of your infrastructure, keep costs down, and free up engineering time to work on the good stuff.
Internship stories
Last year, I wrote about my internship story because I felt it was such an impactful experience for me. It was simply a story of how working hard and being out in Silicon Valley can lead to very serendipitous occurrences. I don’t think I could have built Mixpanel without the knowledge and connections I gained at Slide. I learned so much about product, how to “get things done” at a real company, and met really close friends that I will take with me forever in life. I was also fortunate enough to work closely with Max, who has been an invaluable mentor and investor for our business.
The point of that post, of course, was to find ourselves interns. We wanted to get a lot of work done, but we also genuinely wanted to give them an extremely meaningful experience like my own. We’d publicly promised them one, so we set out to make good on it. At the end of the summer I asked them to write about what it was like to intern at Mixpanel. I hope those of you who are considering interning at a startup vs. a big company will benefit.
Why We Moved Off The Cloud
This post is a follow up to We’re moving. Goodbye Rackspace.
Cloud computing is often positioned as a solution to scalability problems. In fact, it seems like almost every day I read a blog post about a company moving infrastructure to the cloud. At Mixpanel, we did the opposite. I’m writing this post to explain why and maybe even encourage some other startups to consider the alternative.
How and Why We Switched from Erlang to Python
A core component of Mixpanel is the server that sits at http://api.mixpanel.com. This server is the entry point for all data that comes into the system – it’s hit every time an event is sent from a browser, phone, or backend server. Since it handles traffic from all of our customers’ customers, it must manage thousands of requests per second, reliably. It implements an interface we’ve spec’d out here, and essentially decodes the requests, cleans them up, and then puts them on a queue for further processing.
Because of these performance requirements, we originally wrote the server in Erlang (with MochiWeb) two years ago. After two years of iteration, the code has become difficult to maintain. No one on our team is an Erlang expert, and we have had trouble debugging downtime and performance problems. So, we decided to rewrite it in Python, the de-facto language at Mixpanel.
Given how crucial this service is to our product, you can imagine my surprise when I found out that this would be my first project as an intern on the backend team. I really enjoy working on scaling problems, and the cool thing about a startup like Mixpanel is that I got to dive into one immediately. Our backend architecture is modular, so as long as my service implemented the specification, I didn’t have to worry about ramping up on other Mixpanel infrastructure.
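The decode, clean up, and enqueue flow described above can be sketched in a few lines of Python. This is only an illustration of the shape of the service, not the real server: the base64-encoded JSON payload, the `event` and `properties` field names, and the in-process `queue.Queue` are all assumptions made for the example.

```python
import base64
import json
import queue

event_queue = queue.Queue()  # stand-in for whatever queue feeds later processing

def decode_event(raw):
    """Decode a base64-encoded JSON payload (an assumed wire format)."""
    return json.loads(base64.b64decode(raw).decode("utf-8"))

def clean_event(event):
    """Keep only well-formed events; return None for anything else."""
    name = event.get("event") if isinstance(event, dict) else None
    if not isinstance(name, str) or not name.strip():
        return None
    props = event.get("properties")
    return {"event": name.strip(),
            "properties": props if isinstance(props, dict) else {}}

def handle_request(raw):
    """Decode, clean, and enqueue one request; return True if it was accepted."""
    try:
        event = clean_event(decode_event(raw))
    except ValueError:  # bad base64, bad UTF-8, or bad JSON
        return False
    if event is None:
        return False
    event_queue.put(event)
    return True
```

Keeping the handler this thin is what makes the service easy to swap out: anything that accepts the same requests and puts the same cleaned events on the queue satisfies the rest of the backend.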
We’re moving. Goodbye Rackspace.
At Mixpanel, where our hardware lives and the platform we use to help us scale have become increasingly important. Unfortunately (or fortunately), our data processing doesn’t always scale linearly. When we get a brand new customer, sometimes we have to scale by a step function; this has been a problem in the past, but we’ve gotten better at it.
So what’s the short of it? We’re unhappy with the Rackspace Cloud and love what we’re seeing at Amazon.
Over our history we’ve used quite a few “cloud” offerings. First was Slicehost, back when everything was on a single 256MB instance (yeah, that didn’t scale). Second was Linode because it was cheaper (money mattered to me at that point). Lastly, we moved over to the Rackspace Cloud because they cut a deal with YCombinator (one of the many benefits of being part of YC). Even with all the lock-in we have with Rackspace (we have 50+ boxes and we’re hiring if you want to help us move them!), it’s really not about the money but about the features and the product offering. Here’s why we’re moving:
Building C extensions in Python
At Mixpanel, performance is particularly important to us, and as we begin to scale our data volume to support billions of actions, we’ve found ourselves thinking about how to solve problems better.
We’re currently writing a feature that is going to require considerable scale and performance, and to build it we had to think about how to return results quickly enough to keep our users happy. Unfortunately, Python is too slow for some of the operations we want to do, where something lower level like C can be an order of magnitude faster.
So imagine: You want to stick to Python because it’s so fast to develop in but need the performance of C/C++. Let me introduce you to C extensions in Python.
If you’ve ever used something like cJSON in the past, then you’ve already installed something like this before. It’s likely that a lot of the modules you import in Python are built in C and are not pure Python.
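If you’re curious which of the modules you already import are compiled, a small check like the one below can tell you. It is a sketch that assumes CPython on Python 3, and the helper name `is_compiled_module` is made up for this example.

```python
import importlib
import importlib.machinery
import sys

def is_compiled_module(name):
    """Return True if the named module is built in or loaded from a C extension."""
    module = importlib.import_module(name)
    if name in sys.builtin_module_names:
        return True  # compiled directly into the interpreter
    path = getattr(module, "__file__", None)
    if path is None:
        return False  # e.g. a namespace package with no single file
    # Extension modules end in a platform suffix such as ".so" or ".pyd".
    return path.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))

for name in ("sys", "json", "math"):
    kind = "compiled" if is_compiled_module(name) else "pure Python"
    print(f"{name}: {kind}")
```

Note that a pure-Python package can still lean on a compiled helper under the hood: the standard `json` package is Python source, but it uses a C accelerator when one is available.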