The Hidden Costs of Chasing the Mythical “Five Nines”




Abstract

If you do not have an Error Budget, chances are you are wasting resources chasing a mythical SLA that has no defined business driver.

Site Reliability Engineering practices and tooling can reclaim many of these costs and resources, but only once the organization is aware that they exist.

Description

“Five Nines” refers to the five nines in 99.999% availability, a figure that is often treated as synonymous with high availability.

Does every highly available service require five nines? Not by a long shot.

Yet the general state of the practice is to chase this typically unrealistic goal, in many cases almost blindly, which often leads to unnecessarily high costs in both operational and development resources.

Even less aggressive availability goals are often over-specified compared to true business drivers.
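For a sense of scale, the downtime that each availability target permits can be worked out directly. The sketch below is illustrative only: it assumes a 365-day year and measures availability purely as uptime, and the function name `allowed_downtime_minutes` is hypothetical rather than anything taken from the talk.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: non-leap year, 525,600 minutes


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Compare two through five nines side by side.
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% -> {minutes:.2f} minutes/year")
```

Under these assumptions, five nines leaves roughly 5.26 minutes of downtime per year, while three nines leaves about 8.76 hours, which shows how steeply each additional nine tightens the requirement.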

This talk will cover:

  • the history of “five nines”: common reasons why many organizations often inadvertently over-specify availability requirements
  • the costs of such over-specification: how service agility is negatively affected
  • examples of highly available systems with reasonable availability requirements: techniques on how to avoid over-specification based on Site Reliability Engineering principles
  • ways to spend your Error Budget (once you have one) most effectively: what a Performance Budget is and how you can use it to lower operational costs
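As a rough illustration of what having an Error Budget means in practice, the sketch below uses the common request-based definition from Site Reliability Engineering: the budget is the share of requests that may fail while still meeting the availability target. The function name, the 99.9% target, and the request counts are all hypothetical values chosen for the example, not figures from the talk.

```python
def error_budget_remaining(slo_percent: float,
                           total_requests: int,
                           failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    if not 0 < slo_percent < 100:
        raise ValueError("slo_percent must be strictly between 0 and 100")
    # Number of requests that are allowed to fail under the target.
    allowed_failures = total_requests * (1 - slo_percent / 100)
    return 1 - failed_requests / allowed_failures


if __name__ == "__main__":
    # Hypothetical month: 1,000,000 requests, 250 failures, 99.9% target.
    remaining = error_budget_remaining(99.9, 1_000_000, 250)
    print(f"Error budget remaining: {remaining:.0%}")
```

A positive remainder is budget that can be spent deliberately, for example on faster releases or planned maintenance, while a negative value signals that the target has already been missed for the period.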

Applying these techniques should result in a more cost-effective service that keeps end users and management happy and sends fewer alerts to the on-call DevOps engineer.

Speaker

Steve Fox

  

Steve is the founder and CEO of AutoScalr.

He has over 25 years of software technology experience in various roles including Development, Architect, Systems Engineering, Sales, Product Management,

...