We were looking for a better way to represent both the experience of our clients and the reliability of our site. Traditional uptime metrics did not work well for us because they did not account for the satisfaction of our client base. Uptime was also vaguely defined: there was no consensus on which component had to fail, in what way, and for how long before it counted as downtime. After some research, we decided to adopt the error budgeting model. This talk covers our journey implementing it for our engineering team here at GroupBy.
We will discuss:
