This scenario might sound familiar: a system that is unstable but critical to the business. It’s running on old hardware, and over time a steady stream of features and fixes has made the code difficult to maintain. Traffic has increased, core components have become unreliable, and the system requires regular manual intervention to keep it running. It’s clearly a ticking bomb, and something needs to be done about it. This is the story of how our team embraced DevOps practices when we took ownership of such a system. It is the story of failures along the way, and of why failing isn’t a bad thing if you are prepared to fail. It is also the story of how, when the ticking bomb finally went off and we lost servers in a disastrous flood, DevOps practices saved us and sped our recovery.