Thoughts on bandwidth and Amazon S3
Tuesday February 19, 2008
Posted by James Ellis / Filed under Code, Software, Web
This past September we relaunched the Dangerbird Records website. Since then the site has experienced a massive increase in traffic, with the new videos section being particularly popular. In addition to regular web traffic, we’re noticing a significant number of users embedding Dangerbird videos on MySpace/blogs/etc. As a result, site bandwidth has been growing exponentially — a good problem, but a problem nonetheless.
Here’s an example video:
That’s a 27MB video file. If ten thousand people check it out, that’s roughly 270GB of bandwidth.
Here’s another example:
At 17 minutes in length, this video is a whopping 124MB. If 10k visitors consume it, we’re way up to 1.2TB of bandwidth. With high-performance bandwidth costing about $1/GB, you can see how costs can quickly get out of control for a popular website.
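The arithmetic behind those two figures can be sketched in a few lines of Python. This is only a rough model: it assumes every visitor downloads the whole file, uses decimal units (1GB = 1000MB) to match the rounded numbers above, and takes the $1/GB rate quoted here. The function names are illustrative.

```python
MB_PER_GB = 1000  # decimal units, matching the post's rounded figures


def transfer_gb(file_mb: float, views: int) -> float:
    """Total transfer in GB if every view downloads the full file."""
    return file_mb * views / MB_PER_GB


def transfer_cost(file_mb: float, views: int, dollars_per_gb: float = 1.0) -> float:
    """Bandwidth bill in dollars at a flat per-GB rate (assumed $1/GB)."""
    return transfer_gb(file_mb, views) * dollars_per_gb


# The two example videos from the post, each viewed 10,000 times.
for size_mb in (27, 124):
    gb = transfer_gb(size_mb, 10_000)
    cost = transfer_cost(size_mb, 10_000)
    print(f"{size_mb}MB x 10,000 views = {gb:,.0f}GB, about ${cost:,.0f}")
```

At these rates the 124MB video alone works out to about 1,240GB, or roughly $1,240 of bandwidth.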
Fortunately, we can move this high-demand content elsewhere. The services meeting this type of demand are generally referred to as content delivery networks.
Traditionally, CDNs were designed for performance and marketed to the enterprise crowd. For example, Apple uses Akamai to serve up images and videos for apple.com. Akamai hosts copies of Apple’s assets on high-performance servers all around the globe. When a user requests a file, they receive the file from whatever server is closest to their geographic location. Akamai helps Apple maintain global performance at web-scale demand, but it’s not exactly cheap.
There haven’t always been a lot of options in economy content delivery. (You might try to exploit Dreamhost’s ridiculous 5TB/month @ $6/month plan, but Dreamhost isn’t exactly performance hosting.) However, in the last few years we’ve seen a lot of activity in the economy content delivery space as web sites/hosts have struggled to keep pace with increasing demand for content such as videos/images. Surprisingly, what appears to be the best offering in this market comes from Amazon — best known as the world’s largest online retailer, not web infrastructure provider.
In March 2006, Amazon launched S3, or Simple Storage Service. S3 is a web service providing websites with unlimited storage and unlimited bandwidth. You simply pay for what you use, and at minimal cost (Storage: $0.15/GB/month, Bandwidth/transfer: $0.10/GB).
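To put those prices in context, here is a minimal sketch that estimates a month of S3 charges for the 124MB video at 10,000 views and compares it with the $1/GB figure mentioned earlier. The rates are the ones quoted in this post and should be treated as assumptions rather than current pricing; any charges not mentioned here are ignored, and the function name is illustrative.

```python
# Rates as quoted in the post; assumptions, not current pricing.
S3_STORAGE_PER_GB_MONTH = 0.15  # dollars per GB stored per month
S3_TRANSFER_PER_GB = 0.10       # dollars per GB transferred
HOSTED_TRANSFER_PER_GB = 1.00   # "high-performance bandwidth" figure from the post


def s3_monthly_cost(stored_gb: float, transferred_gb: float) -> float:
    """Storage plus transfer for one month; other fees are not modeled."""
    return (stored_gb * S3_STORAGE_PER_GB_MONTH
            + transferred_gb * S3_TRANSFER_PER_GB)


video_gb = 0.124                       # the 124MB video, in decimal GB
transferred_gb = video_gb * 10_000     # 10,000 full downloads

s3_cost = s3_monthly_cost(video_gb, transferred_gb)
hosted_cost = transferred_gb * HOSTED_TRANSFER_PER_GB
print(f"S3: about ${s3_cost:,.2f}  vs  $1/GB bandwidth: about ${hosted_cost:,.2f}")
```

Under these assumptions the S3 bill comes to roughly a tenth of the $1/GB cost, and almost all of it is transfer rather than storage.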
S3 is both an online storage service and an economy hosting/bandwidth provider. You could use S3 to back up your entire computer, or you could use S3 to deliver fifty-eleven-gazillion copies of a single animated GIF. Both tasks fall outside the capabilities of a normal web host: you can generally only host so much data, and any single server will eventually choke upon receiving too many concurrent requests.
Update: Our colleague Larry Ludwig of Empowering Media & HostCube, our primary hosting/IT provider, emailed to point out that Amazon isn’t currently set up as a proper content delivery network (see Wikipedia’s definition here): S3 content is delivered from one of two locations, either Amazon’s D.C. data center or Europe, rather than being replicated across various nodes and served to users by geographic location. It’s important to make this distinction between CDNs and S3’s economy storage/bandwidth/hosting.
A few example use-cases for you:
- An individual might use S3 to maintain a private, off-site backup of important documents, using minimal storage, and near-zero bandwidth, costing pennies per month.
- Dangerbird Records can use S3 to deliver an infinite amount of video content at minimal cost.
- A gigantic site like SmugMug (a photo site similar to Flickr) could use S3 to store a staggering amount of user image data (in fact, SmugMug does use S3, saving them roughly $1M/year, crazy!)
It’s all a very interesting application of the distributed, on-demand, grid/cloud-computing, redundant, failure-tolerant, scalable (and many other words) systems architecture that we’ve arrived at in the post-Google world. Amazon sorted out the fundamentals of S3 in developing their own infrastructure for amazon.com. Now in true software-culture form, they have opened up their otherwise proprietary infrastructure to the world at minimal cost.
Since moving all video content to S3, we’ve seen Dangerbird’s normal bandwidth stabilize. Also, requests for and transfers of video content no longer tie up server resources; now the server can focus on rendering page requests. And Dangerbird will save money.
Getting started with S3
If you can use FTP software, you can handle S3. It’s quite simple. First, set up an account. Then connect to S3 using some sort of client. The latest version of Transmit now provides support for S3, and there’s even a Firefox plug-in interface. Also, you might want to look into JungleDisk if you’re interested in off-site backup.
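Once a file has been uploaded and marked publicly readable, it can be linked or embedded by URL. As a rough sketch, assuming S3's virtual-hosted-style addressing (https://<bucket>.s3.amazonaws.com/<key>), the link can be built like this. The bucket and file names below are hypothetical placeholders, not Dangerbird's actual setup.

```python
from urllib.parse import quote


def s3_public_url(bucket: str, key: str) -> str:
    """Build a virtual-hosted-style URL for a publicly readable S3 object.

    The key is percent-encoded, but "/" is kept so folder-like
    prefixes stay readable in the resulting link.
    """
    return f"https://{bucket}.s3.amazonaws.com/{quote(key, safe='/')}"


# Hypothetical bucket and key, for illustration only.
print(s3_public_url("example-bucket", "videos/example clip.mov"))
```

The resulting address can be dropped into a video player embed in place of a link to the file on your own server.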
