March 28, 2007
ETECH: "Set Amazon's Servers on Fire, Not Yours" raw notes
Scalability: Set Amazon's Servers on Fire, Not Yours
Don MacAskill, CEO & Chief Geek, SmugMug
95% of photos at SmugMug are on S3 - they are doubling every year. 140 Million photos right now. They love love love S3.
Always on, global, infinite storage.
$0.15/GB/Month w/replicas
Easy REST API
Really fast. Not 15K SCSI fast, but really damn internet fast.
Definitely changes the whole game for startups.
Using S3 lets you focus on the apps, not the muck. Amazon is using the stuff too, so if there is ever a problem they are on it in a second. They didn't want to deal with data servers or any of that, and this lets them get away from it.
They would have spent $922K in the last 12 months if they had used traditional hosting systems, with S3 they spent $230K. Also, saved $295K in taxes by not buying the disks (saved for cash flow).
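The hosting comparison above can be worked through as plain arithmetic. This is a minimal sketch using only the two dollar figures quoted in the talk; the variable names are illustrative, and the tax figure is left out because the notes don't say how it was calculated.

```python
# Figures quoted in the talk (US dollars, trailing 12 months).
traditional_cost = 922_000   # estimated spend with traditional hosting
s3_cost = 230_000            # actual spend on S3

savings = traditional_cost - s3_cost
savings_pct = 100 * savings / traditional_cost

print(f"Saved ${savings:,} ({savings_pct:.0f}% less than traditional hosting)")
```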
S3 is perfect for startups and small companies. Great for "store lots, serve little" businesses of all sizes. Not so great (just yet) for serving lots if you are a medium or large sized business. Transfer costs can be high. SmugMug buys bandwidth in 1 Gbps chunks. No sliding scales at Amazon just yet.
S3 was super easy to just drop in. They started on Monday and had live production in place on Friday, so really simple.
They started out using S3 only as secondary storage, but that was too cold; they then tried it as primary storage, and that was too hot. The happy medium: 100% of the data goes to S3, and about 10%, the part that is used more often, is also stored locally. This resulted in a 95% reduction in the disks they needed to buy.
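The "everything in S3, hot subset on local disk" arrangement can be sketched roughly as below. This is an assumption-heavy illustration, not SmugMug's implementation: `TieredStore` is a hypothetical name, a plain dict stands in for S3, and least-recently-used eviction is my guess at how "used more often" might be decided.

```python
from collections import OrderedDict


class TieredStore:
    """Write everything to the remote store; keep a small hot set locally."""

    def __init__(self, remote, local_capacity):
        self.remote = remote                  # stand-in for S3 (any dict-like object)
        self.local = OrderedDict()            # hot copies, least recently used first
        self.local_capacity = local_capacity

    def _cache(self, key, data):
        self.local[key] = data
        self.local.move_to_end(key)
        while len(self.local) > self.local_capacity:
            self.local.popitem(last=False)    # evict the least recently used item

    def put(self, key, data):
        self.remote[key] = data               # 100% of the data goes to the remote store
        self._cache(key, data)

    def get(self, key):
        if key in self.local:
            self.local.move_to_end(key)       # mark as recently used
            return self.local[key]
        data = self.remote[key]               # cache miss: read through from remote
        self._cache(key, data)
        return data


remote = {}
store = TieredStore(remote, local_capacity=2)
for name in ("a.jpg", "b.jpg", "c.jpg"):
    store.put(name, name.encode())
```

Every object lands in `remote`, while `store.local` only ever holds the most recently touched items, so local disk needs scale with the hot set rather than with the whole library.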
Nearly 100% of what is served are proxy reads, sometimes HTTP redirects, but all URLs are nice clean smugmug.com links. Rarely do they need direct S3 links.
They put their permissions model right on top of S3's permissions, which *mostly* works.
Reliability is *close* to 100%. Not quite, but close. More reliable than the stuff they do themselves. When there is a failure it's not always Amazon's fault. Nothing else they use is 100% either, so always build with failure in mind. Stuff breaks - just try again. If writes fail, write locally and sync later. Reads fail? Handle intelligently. Alerts!?
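The "try again, then write locally and sync later" advice might look something like this. It is a sketch under stated assumptions: `write_with_fallback`, `sync_pending`, and `flaky_put` are hypothetical names, a dict stands in for the local holding area, and the remote store is simulated rather than a real S3 client.

```python
def write_with_fallback(remote_put, key, data, pending, attempts=3):
    """Try the remote write a few times; on failure, park the data locally."""
    for _ in range(attempts):
        try:
            remote_put(key, data)
            return True
        except OSError:
            continue                 # stuff breaks, just try again
    pending[key] = data              # write locally, sync later
    return False


def sync_pending(remote_put, pending):
    """Retry parked writes; anything that still fails stays in `pending`."""
    for key in list(pending):
        try:
            remote_put(key, pending[key])
        except OSError:
            continue
        del pending[key]


# Simulated remote store that can be switched "down" to exercise the fallback.
remote = {}
state = {"down": True}


def flaky_put(key, data):
    if state["down"]:
        raise OSError("remote unavailable")
    remote[key] = data
```

A real version would presumably also back off between attempts and raise an alert when the pending set grows, which is where the "Alerts!?" note would come in.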
fast
- reads/writes XX Mbps
- mostly speed of light limited 20-80ms
- parallel i/o for massive throughput XXX Mbps
- all tests they did are machine measurable but human indistinguishable
S3 isn't a content delivery network, it's storage.
no global locations (yet)
limited edge caching.
Amazon (might) be building something that does that.
store and forward gives great resiliency but poor performance
stream offers poor resiliency but great performance. Can do a quick HEAD first to verify.
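One way to read the two upload strategies above is as a choice made per upload: stream straight through when a cheap availability check passes, otherwise store the data locally and forward it later. The notes don't spell out exactly what the HEAD request verifies, so this is only my interpretation, and every name in it (`upload`, `remote_is_up`, `stream_to_remote`, `store_locally`) is hypothetical.

```python
def upload(chunks, remote_is_up, stream_to_remote, store_locally):
    """Stream to the remote store if a quick check says it is reachable;
    otherwise keep the upload locally so it can be forwarded later."""
    if remote_is_up():                    # e.g. a quick HEAD request
        stream_to_remote(chunks)          # fast path, less resilient
        return "streamed"
    store_locally(b"".join(chunks))       # slow path, more resilient
    return "stored"


# Stand-ins for the real transports, so the sketch runs without a network.
streamed, stored = [], []
result_up = upload([b"ab", b"cd"], lambda: True, streamed.extend, stored.append)
result_down = upload([b"ef", b"gh"], lambda: False, streamed.extend, stored.append)
```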
In the last year there have been 5 major issues that they know about. 3 lasted from 15-30 minutes, 2 were core switch failures (meaning Amazon was down too).
2 performance degradations. One a customer noticed, second they didn't. Not a big deal, everything fails so expect it.
Service and support - this is one area where Amazon is weak. This is a utility, not a service. They need a service status dashboard, proactive customer notifications, and the ability to get hold of a human, as none of that is available yet.
That said - amazon.com's customer service is pretty good, hopefully AWS will catch up soon.
Amazon has saved their butts. They had a power loss to 70TB of data and the failover to Amazon worked perfectly. They moved datacenters during business hours and no one noticed.
Misc tips- use cURL, much faster. Make stuff as async as possible, this hides the speed-of-light problem.
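To illustrate why going async hides the speed-of-light problem, here is a small sketch in Python (not the cURL setup mentioned in the talk). The `fetch` function is a made-up stand-in that sleeps for 50 ms instead of making a real request; the 50 ms is an assumed value inside the 20-80ms range quoted earlier.

```python
import asyncio
import time


async def fetch(key, latency=0.05):
    """Stand-in for one remote request; the sleep simulates network latency."""
    await asyncio.sleep(latency)
    return key


async def fetch_all(keys):
    # Issue every request at once, so the total wait is roughly one
    # round trip rather than one round trip per key.
    return await asyncio.gather(*(fetch(k) for k in keys))


start = time.perf_counter()
results = asyncio.run(fetch_all([f"photo-{i}.jpg" for i in range(10)]))
elapsed = time.perf_counter() - start
```

Ten sequential requests at 50 ms each would take about half a second; issued concurrently they finish in roughly the time of one.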
They are about to start working with EC2. Will let them scale up or down via API, web servers, processing boxes, development test beds, build servers, etc. You name it.
Also adding SQS as well. $0.10/1000 items. Retrieves jobs with EC2 instances using S3 data.
Missing pieces-
Would love to not have a datacenter at all. The missing database API is a big problem. DB-grade EC2 instances could work: faster, persistent. Load balancer API: single IP in front with lots of EC2 instances. CDN.
Questions from audience - They pay $6-7K a month for transfer to amazon.
Someone asked if potential investors had a problem with so much critical data being outsourced. Don said quite the opposite: people they have talked to like that Amazon is involved, as they know it's reliable, and in fact VCs should be asking people whether they are using this service or not. There could be very good reasons not to use it, but if a startup doesn't at least know about it and hasn't looked into it, that's a sign they haven't done their technical due diligence.
Technorati Tags: etech, etech2007, oreilly, sandiego
"S3 isn't a content delivery network, it's stoage" is pretty much a bullshit claim. JPG Magazine (which I'm tech lead on) uses S3 for storage AND serving. Every photo you see on http://jpgmag.com/ comes from S3 and it's blazingly fast and totally cheap. Love em! So yeah, not just for storage.
Posted by:
Jason DeFillippo on March 28, 2007 02:12 PM
According to Alexa:
jpgmag.com:
Speed: Average (59% of sites are faster), Avg Load Time: 2.6 Seconds
smugmug.com:
Speed: Very Fast (81% of sites are slower), Avg Load Time: 1.0 Seconds
Posted by:
The voice of reason on March 29, 2007 12:34 PM
Those numbers don't accurately measure the 2 sites. We have a lot of images on our homepage and smugmug only a few which is going to skew the results. Also if those are historic averages we were definitely slower which is why we moved our entire operation to a new faster provider. Either way I don't care. We haven't gotten a single speed complain in the entire time we've been running so :-P
Posted by:
Jason D- on March 29, 2007 01:40 PM
thanks for posting your notes. I didn't realize that smugmug doesn't have local storage for all their images. I may have to rework the storage costs in a business plan I'm working on.
Posted by:
eas on March 29, 2007 08:52 PM