Service Levels and Error Budgets

This talk summarizes the SRE approach to managing services: we define how good a service needs to be for the business, then spend the bad time we can afford on making changes and increasing the velocity of service development. The talk covers concepts ranging from defining Service Level Indicators (SLIs), and making them universally accessible and useful through graphs and alerts, to deciding how good a service should be. It also tries to answer the question of what to do with the bad time (unavailability) that our reliability targets still allow, that is, the error budget.



Ramón Medrano Llamas

Ramón Medrano Llamas is a site reliability engineering manager at Google, focused on the Identity and User teams. He concentrates on the reliability aspects of new Google products and new features of ...