What Comes Next After Another AWS Disruption

Category: Featured Article

A December outage that affected mission-critical applications raises questions about the risks of overreliance on a handful of cloud providers.

On December 7, 2021, which should have been AWS Innovation Day at re:Invent 2021, Amazon Web Services was instead contending with yet another regional outage that affected vast segments of the internet. Analysts with Forrester and Gartner say that while the issue was significant, it is not a reason to backslide on cloud migration, nor would backsliding be realistic.

According to updates from AWS, the cause of the outage was resolved for the most part after some seven hours. Recovery of services continued after that. Beyond questions about how it happened, concerns turn to what systemic cloud breakdowns of this scale mean in a world dominated by a small group of hyperscalers.

AWS indicated the latest outage stemmed from “an impairment of several network devices” that affected the company’s Northern Virginia, US-East-1 Region. The outage struck EC2, DynamoDB, Athena, and Chime, as well as other AWS APIs and services. This caused issues and downtime for third parties such as Disney Plus and Netflix. It also affected Amazon’s own resources, such as its package delivery management software and the Alexa virtual assistant.

If this seems a bit like déjà vu, it should. About one year ago, in late November 2020, the US-East-1 Region of AWS saw an outage that the company attributed to issues that arose as more capacity was added to the front-end servers for its Kinesis data stream.

While the frequency of such cloud outages has not necessarily increased, their overall impact has, says Sid Nag, vice president of cloud services and technologies research for Gartner. “This was one of the largest since AWS started conducting business.”

Mission-Critical Apps More Susceptible

Back when organizations mostly ran non-mission-critical applications on the cloud, outages could be taken in stride more readily. The migration to the cloud has meant more mission-critical apps are susceptible to such disruptions, Nag says.

“The cloud is a multitenant model,” he says. “Many different organizations were affected, not just IT services.” For example, the latest outage also cut off customers of Amazon Prime Video and Ring home monitoring service. “We’re seeing a bigger impact because of reliance on the cloud,” Nag says.

Consolidation of the cloud landscape has put the responsibility of maintaining this resource on the shoulders of a shrinking set of providers. That concentration may be a point of concern. “When they get impacted, it’s almost like ‘too big to fail,’” Nag says. “That kind of thing worries me.”
