AWS Outage: Lessons to Learn

Several AWS services experienced errors on Wednesday, resulting in major problems for some businesses. Now that the issue is resolved, what can we learn from it?

At 9:52am on Wednesday, November 25th, the Amazon Web Services Health Dashboard indicated that there was a problem with the Kinesis Data Streams API. Anyone in the US-EAST-1 Region was unable to read or write data published to Kinesis streams. This posed a major issue for some of the impacted businesses. According to The Verge, “It seems the issue is fairly widespread, as a number of apps and services have posted on Twitter about how the AWS outage is affecting them, including Roku, Pocket, Flickr, Adobe Spark, Spotify-owned Anchor, Glassdoor, Getaround, and iRobot. The Philadelphia Inquirer, Tampa Bay Times, and Capital Gazette have also said that they are having issues with publishing stories due to the outage.”

As much as security has been in the news lately, sometimes we forget that other things can go wrong. This is one of those things. AWS is the foundation of many websites and apps, and having it go down caused a plethora of problems. Someone renting a car couldn’t get the car to start because AWS was down. Users could not log in or create new accounts on Flickr. Media outlets were unable to post stories. Adobe Spark users could not access or edit projects. No sector of business in the region seemed untouched by the outage.

This event gave us a very small taste of what could happen if AWS went down entirely. The ripple effects in this one region in the United States left many businesses unable to operate at 100%. What would happen if more regions were impacted? What about the entire country? Worldwide? We’re already enduring what feels like a never-ending pandemic. Imagine what would happen if any one of the major cloud providers suddenly malfunctioned. It would result in chaos.

Fortunately, the issue of increased error rates is fixable, and the likelihood of any major cloud provider completely breaking is infinitesimally small. But this example still illustrates the need for regular reviews of code. Recently, everything has been about security protocols. But we cannot forget to ensure that our systems are still functioning properly under the hood. Any sign of fragility should be fixed ASAP. Make sure all of your code is structurally sound, functional and secure.

The other lesson to take from this is that critical systems need proper failover or backup plans. Every vital business operation should have a backup plan that is well documented for employees to follow. We’ve said before that businesses ran before the internet, and they can do it now when the need arises. If you have the right processes in place for emergencies or outages, then it’s likely you won’t see a huge dip in business operations. But if you have to scramble to adjust, then you’re losing business for however long it takes you to put something in place.
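For teams that want to see what a failover plan can look like in code, here is a minimal sketch in Python. It is an illustration under assumptions, not a description of how any affected company handled the outage: the function names `publish_primary` and `publish_secondary` are hypothetical stand-ins for whatever your primary and backup systems actually are, and the helper simply tries each option in order until one succeeds.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class AllBackendsFailed(Exception):
    """Raised when every backend in the failover chain has failed."""


def call_with_failover(backends: Sequence[Callable[[], T]]) -> T:
    """Try each backend in order and return the first successful result."""
    errors: List[Exception] = []
    for backend in backends:
        try:
            return backend()
        except Exception as exc:  # broad on purpose: any failure triggers failover
            errors.append(exc)
    # Nothing worked, so surface every collected error to the caller.
    raise AllBackendsFailed(errors)


# Hypothetical backends, for illustration only.
def publish_primary() -> str:
    # Simulates the primary region being unavailable.
    raise ConnectionError("primary region unavailable")


def publish_secondary() -> str:
    # Simulates a healthy backup path.
    return "published via secondary region"


result = call_with_failover([publish_primary, publish_secondary])
print(result)
```

The last entry in the chain does not have to be another cloud service. It could just as well write to a local queue or trigger a documented manual process, as long as the order is decided before the outage rather than during it.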

A key thing to remember when it comes to internal code and technology reviews: You can review stability AND security at the same time. It’s not a two-step process; it can all be done in the same sweep. The fix for each may need to be done separately, but there’s no reason to review the code, systems and processes twice.

As always, when in doubt, hire an expert. If you don’t have the in-house manpower to conduct a review like this, bring in the extra help. The cost of bringing in an outsider will be far less than the cost of something breaking because of, say, an outage. The biggest benefit will be your ability to sleep comfortably at night knowing your business is stable and secure even if the rest of the world misses a beat. Don’t put your and your employees’ livelihoods at risk. Make sure your business stays in business every day!

About the Author

Pieter VanIperen, Managing Partner of PWV Consultants, leads a boutique group of industry leaders and influencers from the digital tech, security and design industries that acts as a trusted technical partner for many Fortune 500 companies, high-visibility startups, universities, defense agencies, and NGOs. He is a 20-year software engineering veteran who has founded or co-founded several companies. He acts as a trusted advisor and mentor to numerous early stage startups, and has held the titles of software and software security executive, consultant and professor. His expert consulting and advisory work spans several industries in finance, media, medical tech, and defense contracting. He has also authored the highly influential precursor HAZL (jADE) programming language.