The top five risks to consider regarding cloud outages

Cloud outages are common, and no matter how much redundancy engineers build into cloud-based systems, they are here to stay. A few recent examples of major cloud outages include:

  • Atlassian: an outage that began on April 5, 2022, affected 400 customers using products including Jira, Jira Service Desk, Confluence and Opsgenie, and lasted two weeks, until April 18, 2022
  • AWS: the December 7, 2021, outage affected organizations including Netflix, The Associated Press, Delta Air Lines and Toyota
  • Fastly: the June 8, 2021, outage affected organizations including Amazon, Reddit, The New York Times and the U.K. government website

When an outage occurs, it affects a company, its customers, and its customers’ customers. There are five primary risks to consider when preparing for a likely outage: revenue and expenses, reputation, legal, distraction to business, and service-level agreements (SLAs) with customers.

1. Revenue and expenses

A company experiencing an outage may incur many kinds of financial costs. For example, it may need to engage a public relations firm to help manage reputational damage, reimburse customers for services it wasn’t able to provide, or accept less favorable terms on another contract. Clicks on paid advertisements may lead to an unavailable site, wasting ad spending. E-commerce sites and similar businesses lose revenue directly during an outage: not only are there no sales, but customers are likely to leave for a competitor.

These are just a few examples of how outages can affect revenue and expenses.

2. Reputation

The outages above drew extensive media coverage, and media coverage and reputational damage naturally follow any outage. Brand-trust metrics drop, social media erupts, prospective customers may stay away, and competitors and customers won’t let a company forget about the incident. Rebuilding trust and reputation is a long, expensive process, and some customers may leave in the meantime.

3. Legal

A business impacted by an outage could be fined by regulators such as FINRA (the Financial Industry Regulatory Authority, a self-regulatory organization) or by government agencies such as the FCC. FINRA has been paying extra attention to the availability of financial institutions’ systems and their continuity of service to customers. In some cases, outages have resulted in significant fines, and downtime could put a public company in violation of its fiduciary responsibility. Customers may also file class-action lawsuits, and any such lawsuit adds further expense and distraction to the business.

4. Distraction to business

When an organization has an outage, most of its activities are disrupted. It must determine which customers were affected and which weren’t, how to communicate with them, and how to make them whole. The customer success team will have to manage customer complaints, correspondence, and support tickets during and after the outage. It may be necessary to provide a post-mortem analysis and a plan to prevent the outage and its disruptions from happening again. An outage also creates fear and doubt about service renewals.

An outage is like a disaster scenario: everything is put on hold while the team determines how to fix it. Data may be lost, and transactions may need to be reversed or confirmed. Restoring a failed system can take considerable effort; in the Atlassian outage, engineers worked around the clock for two weeks to restore service. Atlassian’s case was worse still: because customer data had been deleted, the company couldn’t even access the affected customers’ records to know whom to contact or how to reach them.

5. SLAs with customers

Under service-level agreements (SLAs), compensation or indemnification for a customer’s losses is typically limited to the cost of the service. For example, if a cloud-based service goes down for 12 hours, the customer receives a 12-hour service credit for the downtime. But that compensation doesn’t match the business loss: the service provider usually agrees to pay only for loss of service, if anything, not for the business loss itself. From the vendor’s perspective, crediting its customers means lost revenue. From the customer’s perspective, it receives only service credits, while its real losses are much larger. Insurance held by the vendor can help close this gap: because the insurance compensates the customer for the business loss, the customer can have greater confidence in the vendor.

How to prepare

Cloud outages carry many risks. Each organization must evaluate them and prepare to achieve the best possible outcome when an outage occurs. Some suggestions to consider:

First, communication and transparency with clients are key, as brand reputation damage is often what hurts a business most. Have a strong communication plan to mitigate the negative effects of outages on your customer base.

Second, keep a customer-centric mindset. In some businesses, downtime can mean millions of dollars in losses. Keeping your customers’ needs and interests at the forefront of your mind helps you better address and alleviate their pain points.

Finally, perform an in-depth risk analysis in advance and work through disaster scenarios that cover every aspect of the business. Companies can also mitigate these risks with downtime insurance.

Published on Fast Company.

The Fast Company Executive Board is a private, fee-based network of influential leaders, experts, executives, and entrepreneurs who share their insights with our audience.

Neta Rozy
Neta has a rich background developing enterprise software and robust monitoring systems. She co-founded Parametrix and built a team that is pioneering the development of a unique, global downtime event monitoring system to track SaaS, PaaS and IaaS system outages, network crashes, and platform failures down to the millisecond.
September 15, 2022