Wednesday, April 27, 2011

Amazon EC2 downtime a manufactured storm?

So Amazon EC2 on the US East Coast went down for a day.  Big deal.

Many IT folks are smugly extolling the virtues of their own co-located setups, and berating those naive, risk-happy organisations for ever attempting to use a public cloud in the first place.  The reality, though, is that similar events will have happened in their own DC, but when you're busy fixing the issue, you're not so exposed to the pain of the service being down.  For the user, the situation's identical - the site's still down - even if there's more of an illusion of control for the techies involved.  And, to those individuals: can you absolutely guarantee that you have the resources and expertise to be able to fix any serious breakage that may arise?  Do you have spare servers sat on the shelf, ready to swap in and rapidly provision?  Know every element of your configuration intimately?  And, even if you can answer "yes" to all these questions, will you really be able to respond as rapidly and comprehensively as the mini-army that cloud economies of scale allow a provider to employ?

For ultimate 'head in sand' thinking, you could fork out for a Tier 1, enterprise-level hosting provider on the basis that by doing so, this sort of thing could never happen.  But, from personal experience, it does.  Of course, using said provider's network of global datacenters, together with good architectural practices, will provide a very resilient infrastructure; equally, if you're working for an organisation of this size, then you wouldn't have all your systems concentrated in a single region if you were using EC2 anyhow, never mind a single availability zone.

This isn't to say that Amazon have covered themselves in glory during this particular episode: communications have been wholly inadequate throughout, and there are questions to be answered in terms of how the outage was able to spread to supposedly separate availability zones.  Those using the cloud have lessons to learn too.  Deploying to a single availability zone is just dumb; so too is the unnecessary use of EBS-based instances, which are fundamentally less scalable and more performance-sensitive than their instance-store cousins.

But in many ways the speed and vigour with which supporters of traditional hosting models have flocked to deride Amazon for the outage is the interesting story here.  You'd doubt that a similar outage for even a cluster of enterprise-level hosting providers would cause such a stir.  Cloud is looking very enticing to many businesses fed up with the inflexibility and expense of their IT infrastructure, AWS is being taken extremely seriously, and those invested in traditional infrastructure are feeling the heat.

Monday, April 25, 2011

Automate EBS snapshots with Ebs2s3 on Amazon EC2

I've spent a while looking for a simple way to schedule regular snapshots of EBS volumes for Amazon EC2-based production environments. The snapshot feature is really useful: it allows you to take regular delta backups of your EBS volumes and save them to S3, from where they can be used to rebuild the volume in the event of a system failure.  The solutions I've come across, though, seem to mostly consist of running scripts via cron, which isn't intuitive or easy to manage.  So I've put together a little Rails app to do the job: Ebs2s3.

Ebs2s3 allows you to flexibly schedule backups of your volumes using standard cron syntax, and to specify the number of backups to retain.  Some screenshots:

The login page

Displaying current jobs 

Showing an individual job

More details (including installation instructions) can be found on the project page, here
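
To illustrate the "number of backups to retain" idea mentioned above, here is a minimal Python sketch of a retention rule: keep the N most recent snapshots of a volume and mark the rest for deletion. This is not Ebs2s3's actual code (which is a Rails app); the `Snapshot` type, its fields and the `split_by_retention` function are hypothetical names invented for the example, and no AWS calls are made.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple, Tuple


class Snapshot(NamedTuple):
    """Hypothetical record describing one EBS snapshot."""
    snapshot_id: str
    volume_id: str
    started_at: datetime


def split_by_retention(
    snapshots: Iterable[Snapshot], keep: int
) -> Tuple[List[Snapshot], List[Snapshot]]:
    """Return (kept, expired) for a retention count of `keep`.

    The `keep` most recent snapshots are retained; anything older is
    returned as expired, newest first, so the caller can delete it.
    """
    if keep < 0:
        raise ValueError("keep must be zero or greater")
    # Sort newest first so the head of the list is what we retain.
    ordered = sorted(snapshots, key=lambda s: s.started_at, reverse=True)
    return ordered[:keep], ordered[keep:]
```

A scheduled job would call something like this after each new snapshot completes, then delete whatever comes back in the expired list. Pruning only after a successful snapshot means a failed run never reduces the number of good backups on hand.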