Abstract:

As organizations experiment with greater concurrency and integration between their departments and move toward continuous delivery of customer value, failure is assured. Asking "how can failure be avoided?" isn't as useful or relevant as focusing on

  • "How does our organization react when failure occurs?" and
  • "How do we create a sustainable, actionable process for describing, exploring, and remedying failure?"

These are the questions that presented themselves to Salesforce’s Service Reliability Engineering team. Their SREs had received training in incident response and management, but were still struggling to incorporate that feedback into the organization at large to improve outcomes. Feedback loops weren’t always closed, so many opportunities for improvement were lost.

This is the story of my months-long journey with J. Paul Reed and my team to identify the specifics of what made reliability retrospectives difficult to hold, why actionable takeaways were often lacking, and how the feedback loops within the company’s operations organization weren’t serving Salesforce’s needs.

We then ran a series of experiments together, putting the SRE team on a road to improving their ability to respond, react, remediate, and re-incorporate learnings from failure into the organization.

The Takeaways?

  • The importance of the retrospective process in building resilient, humane, operable systems.
  • The common hurdles to holding actionable operations retrospectives and the experiments we ran to overcome those hurdles at Salesforce, which may offer attendees possible solutions of their own.

Speakers:

Kevina Finn-Braun

Kevina Finn-Braun’s focus throughout her 18 years in the internet industry has been operational excellence and risk management. She is currently Director of Site Reliability Service Management at Salesforce, where she leads the team focused on operational process improvements in the areas of incident, problem, and change management. In her previous role as Director of Business Continuity at Yahoo!, she led the team focused on risk management and service continuity best practices.

J. Paul Reed

J. Paul Reed, aka The Sober Build Engineer, has over a decade of experience in the trenches as a build/release and tools engineer, working with such organizations as VMware, Mozilla, Postbox, and Symantec. In 2012, he founded Release Engineering Approaches, a consultancy incorporating a host of tools and techniques to help organizations “Simply Ship. Every time.” He’s worked across a number of industries, from financial services to cloud-based infrastructure, with teams from 2 to 12,000 on everything from tooling and operational analysis and improvement to team culture transformation and business value optimization.
