jphackworth 14 years ago

I'm a little surprised so many machines are used to run Instagram. TechCrunch mentioned their peak has been 50 photo uploads per second (which they say go directly to S3, so Instagram's servers only need to pass a token). Of course there are other forms of requests, but just back of the envelope it seems like it should not require anywhere near "hundreds" of machines.

Not to be too harsh - it's just three engineers, so it makes sense if the setup is still evolving.

  • rdouble 14 years ago

    I was surprised they had so few... I once worked on a site with 1/6th the users and 3.5 times the number of instances.

    They could do better but they'd have to manage their own datacenter and write portions of the app in C++. It's probably not worth it at this point unless they hire someone with that specific expertise.

    • jphackworth 14 years ago

      I once worked on a site with 1/6th the users and only one machine. ;-) Counting users often doesn't match across sites, especially when an Instagram user is someone who has downloaded the app, and might never come back. That's why the 50 photo uploads per second peak is a useful benchmark.

      • mikeyk 14 years ago

        Hey, author here. To clarify, our uploads go to our servers first (where we resize for thumbnails, etc) then go to S3.

        • palish 14 years ago

          Offtopic: Would you consider letting a hacker work for you remotely?

        • WALoeIII 14 years ago

          Do you use GraphicsMagick or ImageMagick? Shell out or python bindings?

          What settings do you use for: MAGICK_MEMORY_LIMIT MAGICK_MAP_LIMIT MAGICK_DISK_LIMIT

          Do you tune them dynamically or just set and forget?

          • scottostler 14 years ago

            I'm curious about this as well.

armandososa 14 years ago

I like posts like this a lot. I'm just a web designer, but I find scaling web sites fascinating, like some kind of dark art or secret craft.

Where do you learn this stuff? Do you need a CS degree from Stanford or something? I like the black magic aura, it's romantic, but I'd really like to understand how to scale websites doing stuff like the OP describes.

  • tkahn6 14 years ago

    I don't get the impression that one would need a degree to devise the scaling strategies they're employing. This would seem more the product of battle-hardened experience rather than a formal education.

  • jules 14 years ago

    While scaling web sites is fascinating, for most people it is also unnecessary. The vast majority of web sites run just fine on 1 computer. For example, Hacker News runs not just on one computer, but actually on 1 core. So with a single 8 core box with ~100GB RAM you can get quite far and save yourself a big hassle.

d_r 14 years ago

I know that this is probably a recruiting-inspired post, but detailed posts like this genuinely benefit the community. Thanks for specifically mentioning the reasons for choosing particular technologies (i.e. why you switched to Gunicorn from mod_wsgi) -- this makes the already excellent post even more helpful for someone trying to build things.

latchkey 14 years ago

I guess my question is, how do they make money? I really like instagram images. I've used the site myself, but it certainly isn't something I'd feel the need to pay money for.

  • gallerytungsten 14 years ago

    Funding Total: $7.5M (per TechCrunch)

    Server bill: $35k/month, $420k/year, per estimates in other comments.

    Personnel, overhead, other expenses: $1.5M/year (guess).

    Runway: 3.9 years to figure it out.

    • Ecio78 14 years ago

      Dumb question: is there a way to pay Amazon for AWS fees other than by credit card? I was wondering how you can build big infrastructures on Amazon if you can't pay by wire transfer or some other kind of link to a bank account (like phone and gas bills).

      • camwest 14 years ago

        American Express has no limit as long as you pay it back ASAP.

        • Ecio78 14 years ago

          I don't know if I'd be comfortable having or using a credit card with no limit... :)

        • hboon 14 years ago

          Generally yes, but a country's central bank may prohibit that. Singapore's for example does that.

  • ell 14 years ago

    It won't be difficult to make money if they don't try to be clever. They can display ads on their website, just like Twitpic. They can have a storage limit.

latchkey 14 years ago

Those Quadruple Extra Large instances are $2/hr. The 24 of them used for Postgres would be about $35k/month for that part alone. I'm guessing they are spending >$100k/month on just hosting 100+ instances. Not to mention disk, bandwidth, DNS, S3, public IPs, etc.
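
For anyone who wants to check that arithmetic, here is a rough sketch. It assumes a flat $2.00/hr on-demand rate and a 30-day month (both simplifications), and `monthly_cost` is just an illustrative helper name, not anything from the original post.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumption: a 30-day billing month


def monthly_cost(instances: int, hourly_rate: float, days: int = DAYS_PER_MONTH) -> float:
    """On-demand cost of running `instances` machines around the clock for `days` days."""
    return instances * hourly_rate * HOURS_PER_DAY * days


# 24 Quadruple Extra Large instances at an assumed $2.00/hr
postgres_monthly = monthly_cost(24, 2.00)
print(f"${postgres_monthly:,.0f}/month")  # prints $34,560/month
```

That lands right around the $35k/month figure. Disk, bandwidth, and the other line items are not modeled here.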

  • foobarbazetc 14 years ago

    Every time I see numbers like this, I wonder why everyone seems to think you have to use AWS or else you've failed at scaling.

    They could run their operation for 10-20% of their AWS costs at a dedicated server host. And everything would be much, much faster.

    • jaequery 14 years ago

      I use AWS/cloud like I use spare tires: only in emergencies. Why? It's not the price, which I don't care much about. It's that the performance gain going from AWS network I/O to directly attached SSD/SAS I/O is almost night and day.

    • notJim 14 years ago

      I noticed this, too. This statement stood out to me in particular:

      > Our main shard cluster involves 12 Quadruple Extra-Large memory instances… We’ve found that Amazon’s network disk system (EBS) doesn’t support enough disk seeks per second, so having all of our working set in memory is extremely important.

    • ww520 14 years ago

      Using AWS is not just about its instances. S3 is a big factor. It's hard to replicate the S3 functionality in your own hosting without much more effort and cost. Granted, the AWS instances could be used more efficiently.

      • Ecio78 14 years ago

        Can't you just upload to S3 from your own dedicated machines? Or does that add too much delay to operations? The author posted that images are first uploaded to their servers, resized and so on, and then uploaded to S3, so at least for image upload it shouldn't be such a great problem.

        disclaimer: i have no smartphone and never used their app :)

        • ConstantineXVI 14 years ago

          Besides latency, you don't pay for internal data transfer within AWS services. If you did the image processing on your own machines, you'd be paying for bandwidth on every operation, whereas if you do it in EC2, your only outbound transfer is viewing the images.

  • rkalla 14 years ago

    Instagram isn't paying on-demand prices; 3yr reserved is 48% cheaper than on-demand.

  • tptacek 14 years ago

    At ~35k/mo (they may have a deal here, though), that's the fully loaded headcount of 2-3 FTE devops people. In return, they get EC2's turnaround time on new instances. Not to mention that they're constantly pushing images to S3.

    I would agree that EC2 isn't a no-brainer decision here, but it seems like a reasonable one.

geuis 14 years ago

One thing about Instagram's load balancing that I don't like is that they rate-limit their proxies on image requests. In my recent testing, it's roughly 5-6 requests every 3 seconds or so. Any requests more frequent than that return 503 status codes. I don't entirely understand why they do this, since their load balancer simply does 302 redirects to the S3-hosted image resource.

I can guess at some of the reasons, such as they didn't foresee a user loading more than a few images at once. Perhaps they perceive rate limiting as a protective measure.

However, I've done testing on Twitpic, imgur, and yfrog and haven't run into the same issues. Twitpic, for example, generates a lot more traffic than Instagram and they don't have the same rate-limiting.
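
If you need to fetch many images without tripping the 503s, one option is to throttle on the client side. A minimal sketch, assuming the limit really is about 5 requests per 3-second window (that figure is only the rough estimate from the testing described above, not anything documented). `SlidingWindowThrottle` is a made-up helper name, and the clock and sleep functions are injectable so it can be tested without actually waiting.

```python
import time
from collections import deque


class SlidingWindowThrottle:
    """Allow at most `max_calls` calls in any `window` seconds (client side)."""

    def __init__(self, max_calls=5, window=3.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls  # assumed limit, from rough observation only
        self.window = window
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have left the window.
            while self.calls and now - self.calls[0] >= self.window:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Sleep until the oldest recorded call leaves the window, then re-check.
            self.sleep(self.window - (now - self.calls[0]))
```

Usage would be to create one throttle and call `throttle.wait()` before each image request, whatever HTTP client you happen to use.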

  • ceejayoz 14 years ago

    > I don't entirely understand why they do this, since their load balancer simply does 302 redirects to the S3-hosted image resource.

    S3 accesses cost money, so it makes sense that they'd rate limit access to them. A botnet hitting an S3 URL could incur large fees for the owner of the file very rapidly.

mkjones 14 years ago

Glad to see other people using vmtouch. It's also great for keeping large codebases in the filesystem cache on [shared] dev machines.

cagenut 14 years ago

With that big a monthly AWS bill, I could pretty easily justify my salary and the costs of building out a 4-10 rack colo setup, with room left over for a DBA consultant on retainer and a pro-serv budget for ad-hoc stuff.

sant0sk1 14 years ago

That's a lot of instances! It'd be interesting to run the numbers and get an idea of what their monthly AWS bill looks like.

  • clarkni5 14 years ago

    By my math, the bill for their app and database servers would be approaching $30,000 per month. That doesn't include storage costs, bandwidth, or any of the other aspects of their infrastructure.

    That's crazy, if you ask me.

    • simonw 14 years ago

      Is that calculation taking reserved instances into account?

      • rkalla 14 years ago

        No, I don't think so. Latchkey did the same calculation using on-demand prices and came up with $35k[1].

        3yr reserved is roughly 48% cheaper than on-demand, so the real hosting cost would be around $18,200 per month for those servers.

        [1] http://news.ycombinator.com/item?id=3306394

mcginleyr1 14 years ago

For their load balancers, why aren't they assigning Elastic IPs? Then they wouldn't have to wait for DNS, they could just reassign the IP...

vidar 14 years ago

What was your take on Gunicorn over uWsgi?