An AWS outage at the end of September caused problems for users in the US-EAST-1 region. The issue is now resolved, but it once again shows why backups matter.
In the era of digital transformation, cloud migration is all the rage. That is a good thing, of course: the cloud is secure and cost-effective, and it provides many benefits to businesses. Each provider offers its own version of a given service, some with different functionality than others, but the big three (AWS, Azure, Google Cloud) are pretty comparable for the most part. Even with those benefits, it’s incredibly important to have backups for mission-critical systems and processes that are analogue or at least not connected to the internet. Why? Because cloud providers are not infallible. At the end of September, AWS experienced an event in its US-EAST-1 region that disrupted EBS volumes and left some EC2 instances and sites unusable.
It all started on the evening of September 26. The AWS status page indicated that the platform was experiencing performance problems in one availability zone of the region. “Existing EC2 instances within the affected availability zone that use EBS volumes may also experience impairment due to stuck IO to the attached EBS volume(s),” a notice said 30 minutes later.
“Newly launched EC2 instances within the affected availability zone may fail to launch due to the degraded volume performance.”
AWS continued to update the status page over the next several hours, reporting what it was seeing and what it was doing to find the cause of the problem and deploy mitigations. In the wee hours of the morning on September 27, AWS announced the problem had been resolved and everything was fully functional. At that time, some businesses (Signal, for example) were still experiencing problems, but the deployed mitigation did its job and AWS is back up and running.
This event, while problematic, largely happened while most of the US was asleep. ZDNet has a good breakdown of the timeline. However, it still illustrates the point that every business should have backups for mission-critical systems that do not require the internet. Cloud outages are rare, but this isn’t the first one Amazon has experienced and it won’t be the last. Google Cloud and Microsoft Azure have their own problems, too; it just so happens that it was AWS this time. No cloud provider is exempt from an outage, rare or not, which is why analogue backups are so important.
If you recall, last November AWS experienced an outage that affected a much larger portion of its users than this event did. Major media conglomerates were unable to publish news stories, and Roku, Flickr and other companies also reported major problems with their websites. Both the new event and last year’s event happened in the same US-EAST-1 region. Thankfully, in both of these cases, the problem was remedied relatively quickly and everything returned to normal. But it’s important to remember that not having experienced an extended outage in the past doesn’t mean we won’t experience one in the future.
These events should be a reminder that technology is not infallible. Nothing created by humans will ever be perfect because we are an imperfect species. Even if we conduct regular code and security reviews and apply all the patches, there is no guarantee that something won’t break. Have a business continuity plan in place for when these out-of-the-norm events occur. Review backups and plans at least annually to confirm whether any changes are needed, and make sure employees know what to do when an outage hits.
Business leaders have a lot on their plates in 2021. We’ve had a health pandemic that bolstered a cybersecurity epidemic already in the making. On top of those things, we have other events, normal or not, to figure out, as well as a labor shortage across all sectors and industries. Those who are working are burning out, no matter their position, because they are stretched so thin trying to do multiple jobs. Don’t put an extra task on your team. Bring in an expert to help ensure you have proper security measures in place, to do a code review and to ensure your business is set up for success and growth in 2022.
These outages are currently more of an annoyance than anything else, but that may not always be the case. Don’t get caught unprepared!