Jason Hand


The Unrealized Role of Monitoring & Alerting

In today’s world, a company must be a “Learning Organization” in order to be successful and innovative. Learning from both failure and success in order to implement small, incremental improvements is critical. But until you implement and apply new information, you haven’t truly “learned” anything, and you certainly haven’t improved.

According to the 2015 Monitoring Survey, most companies leverage metrics from monitoring and logging purely for performance analytics and trending. If high availability and reliability are important, they also leverage metrics to detect and alert on faults and anomalies. Despite these “best practices”, the metrics are used primarily as context to keep things “running” or to return them to “normal” if there’s a problem. Rarely is that data used to identify areas for improvement once services have been restored.

When your system suffers an outage, you will absolutely repair and restore services as best you know how, but are you paying attention to the data from the recovery efforts? What were operators seeing during diagnosis and remediation? What were their actions? What was going on with everyone involved, including their conversations? Together, the answers form a step-by-step replay of exactly what took place during that outage.

This “old-view” perspective on the purpose of monitoring, logging, and alerting leaves the full value of metrics unrealized. It fails to address what’s important to the overall business objective, and it offers no path to innovation or disruption of the status quo.

This talk will illustrate how to identify whether your company is making the best use of its metrics, and how to not only learn from failure but also become a “Learning Company”.

The Benefit of a Systems Lens

Understanding feedback through a systems lens has advantages: this pure feedback loop is more accurate, it moves us away from needless judgement, and it enhances accountability, to name a few. By taking a step back and examining the feedback and data we receive in three different ways, we can understand them much more clearly.

Those ways are:

  1. Are differences between the giver and receiver creating friction for the feedback?
  2. Is the feedback partly related to the differing roles between giver and receiver as it relates to the common “system”?
  3. Are processes, policies, physical environment, or other factors within the system reinforcing problems with the feedback?

By viewing feedback through a “Systems Thinking” model, we can begin to look for patterns, understand the feedback loop more accurately, and identify contributing factors to both failure and success.

This quick (IGNITE-style) talk will discuss feedback from a “Systems Thinking” perspective.

Speaker

Jason Hand

@jasonhand

Jason is a DevOps Evangelist at VictorOps, organizer of DevOpsDays - Rockies, author of “ChatOps for Dummies”, and host of a number of DevOps-related events in the Denver/Boulder area. He has spent the last 8 months presenting and building content on a number of DevOps topics such as Blameless Post-mortems, ChatOps, and the value of context within incident management. A frequent speaker at DevOps events around the country, Jason enjoys talking to audiences large and small on a variety of technical and non-technical subjects.