Mechanisms, Not Miracles: What AWS Ops Reviews Taught Me About Engineering Leadership
Please Hold As I Vomit Before Walking Into OpsMetrics
Two weeks after I became a GM at Amazon, my service failed so badly it delayed package deliveries across the country. Trucks sat idle at loading docks. Fulfillment centers jammed. I felt sick. We wrote a 39-page post-mortem that tore apart everything: architecture, process, even team culture. Then I walked into the weekly AWS ops review to defend it live, in front of hundreds of engineers and thousands more watching remotely.
In the center of the room, the senior-most leadership of AWS would assemble: Raju Gulabani, Alec Peterson, Charlie Bell, and foundational distinguished engineers like Colm MacCárthaigh. Basically, the deepest cloud engineering talent in the world.
What followed wasn’t a performance—it was a reckoning.
This is what it was like to live inside Amazon’s operational culture under Charlie Bell: failure wasn’t punished, but ownership was demanded; leadership principles weren’t quoted, they were enforced; and metrics mattered only if they reflected real customer pain. Here’s what every engineering leader should take from it.
Editor’s Note: If you subscribe to my channel, you will be able to see how Amazon documents the anatomy of a service failure and the type of language used.