So Amazon EC2 on the US East Coast went down for a day. Big deal.
Many IT folks are smugly extolling the virtues of their own co-located setups, and berating those naive, risk-happy organisations for ever attempting to use a public cloud in the first place. The reality, though, is that similar events will have happened in their own DC, but when you're busy fixing the issue, you're not so exposed to the pain of the service being down. For the user, the situation's identical - the site's still down - even if there's more of an illusion of control for the techies involved. And, to those individuals: can you absolutely guarantee that you have the resources and expertise to be able to fix any serious breakage that may arise? Do you have spare servers sat on the shelf, ready to swap in and rapidly provision? Do you know every element of your configuration intimately? And, even if you can answer "yes" to all these questions, will you really be able to respond as rapidly and comprehensively as the mini-army that cloud economies of scale allow a provider to employ?
For ultimate 'head in sand' thinking, you could fork out for a Tier 1, enterprise-level hosting provider on the basis that by doing so, this sort of thing could never happen. But, from personal experience, it does. Of course, using said provider's network of global datacenters, together with good architectural practices, will provide a very resilient infrastructure; equally, if you're working for an organisation of this size, then you wouldn't have all your systems concentrated in a single region if you were using EC2 anyhow, never mind a single availability zone.
This isn't to say that Amazon have covered themselves in glory during this particular episode: communications have been wholly inadequate throughout, and there are questions to be answered in terms of how the outage was able to spread to supposedly separate availability zones. Those using the cloud have lessons to learn too. Deploying to a single availability zone is just dumb; so too is the unnecessary use of EBS-based instances, which are fundamentally less scalable and more performance-sensitive than their instance-store cousins.
But in many ways the speed and vigour with which supporters of traditional hosting models have flocked to deride Amazon for the outage is the interesting story here. You'd doubt that a similar outage for even a cluster of enterprise-level hosting providers would cause such a stir. Cloud is looking very enticing to many businesses fed up with the inflexibility and expense of their IT infrastructure, AWS is being taken extremely seriously, and those invested in traditional infrastructure are feeling the heat.