Guiding IT Performance with OKRs

I recently read John Doerr’s book Measure What Matters, which covers OKRs (Objectives and Key Results). OKRs are a goal-setting process that pairs each objective with key results that measure your progress along the way. The process is simple, but not easy. (I highly recommend this talk on “simple but not easy,” even though it’s about software engineering.)

The book got me thinking about DevOps and how we constantly strive to measure and improve our own processes. Accelerate nails down the metrics behind IT performance. Pairing Accelerate’s metrics with OKRs creates a way for teams to move the needle. In this post, I want to discuss what OKRs would look like for improving IT performance metrics.

First, a Quick Recap

Accelerate boils IT performance down to four metrics: Lead Time, Deployment Frequency, MTTR, and Change Failure Rate. Lead time is how long it takes a change to go from development to production. Deployment frequency is how often changes are deployed to production. MTTR is the mean time to restore service after a production issue. Lastly, Change Failure Rate is the percentage of changes (i.e. deploys) that have negative consequences in production (e.g. a service outage or reduced functionality). OKRs provide a way forward regardless of where you are now. Let’s start with lead time.
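
To make the four definitions concrete, here is a minimal sketch of how they could be computed from a team’s deploy history. The Deploy record and the four_metrics function are hypothetical names of my own, not something Accelerate prescribes. I also assume lead time is measured from commit to production and summarized with a median, and that MTTR is a plain average of restore times.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, median
from typing import Optional


@dataclass
class Deploy:
    """One production deploy (hypothetical record shape, not from Accelerate)."""
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it cause an outage or degradation?
    restored_at: Optional[datetime] = None   # when service recovered, if it failed


def four_metrics(deploys: list[Deploy], period_days: int) -> dict:
    """Summarize the four metrics for deploys made over `period_days` days."""
    if not deploys or period_days <= 0:
        raise ValueError("need at least one deploy and a positive period")

    lead_secs = [(d.deployed_at - d.committed_at).total_seconds() for d in deploys]
    failures = [d for d in deploys if d.failed]
    restore_secs = [
        (d.restored_at - d.deployed_at).total_seconds()
        for d in failures
        if d.restored_at is not None
    ]
    return {
        # Assumption: median lead time, so one slow change does not skew the result.
        "lead_time": timedelta(seconds=median(lead_secs)),
        "deploys_per_day": len(deploys) / period_days,
        # None when no failed deploy has a recorded restore time.
        "mttr": timedelta(seconds=mean(restore_secs)) if restore_secs else None,
        "change_failure_rate": len(failures) / len(deploys),
    }


# Made-up sample data: four deploys over four days, two of which failed.
t0 = datetime(2024, 1, 1, 9, 0)
day, hour, minute = timedelta(days=1), timedelta(hours=1), timedelta(minutes=1)
sample = [
    Deploy(t0, t0 + 2 * hour),
    Deploy(t0 + day, t0 + day + 4 * hour, failed=True,
           restored_at=t0 + day + 4 * hour + 30 * minute),
    Deploy(t0 + 2 * day, t0 + 2 * day + 6 * hour),
    Deploy(t0 + 3 * day, t0 + 3 * day + 8 * hour, failed=True,
           restored_at=t0 + 3 * day + 8 * hour + 90 * minute),
]

metrics = four_metrics(sample, period_days=4)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

Whatever shape your own data takes, the point is to have a baseline number for each metric before writing key results against it.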

Lead Time & Deployment Frequency

Accelerate’s study finds that high-performance teams see lead times under an hour, while low- and mid-range teams see lead times between one week and one month. In other words, the best teams can deploy multiple times a day, far more often than their low- and mid-range competition. Of course, it makes sense to try to improve lead times. Everyone would like to go faster, but what does that look like for teams deploying once a month? What are their objectives and key results for improving that metric?
Teams stuck in this rut are likely burdened by long-running branches, which make changes harder to integrate and test. They have CI and the tools to deploy quickly, but struggle to get code out the door. So the key results need to be about eliminating bottlenecks in the process before tackling tooling. Here’s an OKR sheet for improving lead time:

  • Objective: Double team velocity.
  • Key Result: Merge and integrate all topic branches within three working days.
  • Key Result: Identify and fast track small changes by the end of the quarter.

This example assumes there is some upfront understanding of what the bottlenecks are and what can be done to improve them. Note that each key result requires a number and a date. In other words, reading the objective followed by “as measured by” the key results should make sense.
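
The “merge within three working days” key result is only useful if you can check it. Here is a rough sketch of one way to do that, assuming you can export each open branch’s name and the date it was opened. The branch names, the function names, and the Monday-to-Friday definition of a working day (holidays ignored) are all assumptions for illustration.

```python
from datetime import date, timedelta


def working_days_between(start: date, end: date) -> int:
    """Count Monday-to-Friday days after `start`, up to and including `end`."""
    days = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday; holidays are ignored
            days += 1
    return days


def stale_branches(opened_on: dict[str, date], today: date, limit: int = 3) -> list[str]:
    """Return the branches that have been open for more than `limit` working days."""
    return sorted(
        name
        for name, opened in opened_on.items()
        if working_days_between(opened, today) > limit
    )


# Hypothetical open branches, checked on Friday 12 January 2024.
branches = {
    "feature/a": date(2024, 1, 11),  # opened Thursday
    "feature/b": date(2024, 1, 8),   # opened Monday
    "feature/c": date(2024, 1, 9),   # opened Tuesday
    "feature/d": date(2024, 1, 5),   # opened the previous Friday
}
print(stale_branches(branches, today=date(2024, 1, 12)))
```

An empty list means the key result is being met on that day; anything else is a concrete list of branches to merge or break up.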
Undoubtedly, focusing on dropping lead times (or improving velocity) requires working in smaller batches that are easier to develop, test, and deploy. Accelerate finds that high-performance teams deploy on demand, while low and medium performers deploy between once a week and once a month. This means there’s a strong correlation between lead time and deployment frequency, which makes sense. Teams pushing things through development faster can naturally deploy faster as long as their pipelines are fast enough. Conversely, fast deployment pipelines don’t evolve without the back pressure from development. Identifying where your team is in this loop is required to move out of it. Teams stuck with slow deployment pipelines may be blocked by technical issues, not necessarily by people or process issues. One set of OKRs can cover improving the deployment pipeline:

  • Objective: Continuously deliver value to our customers.
  • Key Result: Automate promotion through QA, staging, and production by the end of the quarter.
  • Key Result: Reduce the time of the longest step (for example, QA) by 30% by the end of the quarter.
  • Key Result: Parallelize test steps by the end of the month.

This isn’t to say that these are the only possible key results. The point is to illustrate what key results may look like for teams trying to improve their deployment frequency. Again, focus on bottlenecks. If you’re not addressing the bottleneck, then you’re not making progress. So ensure that your key results map to known and obvious bottlenecks. That will reduce lead time and increase deployment frequency, which in turn affects the other two metrics.
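
Finding the bottleneck can start as simply as timing each pipeline stage. The sketch below assumes you already have an average duration per stage; the stage names and numbers are made up, and the 30% figure just mirrors the key result above.

```python
def bottleneck(stage_minutes: dict[str, float]) -> tuple[str, float]:
    """Return the slowest stage and its duration in minutes."""
    name = max(stage_minutes, key=stage_minutes.get)
    return name, stage_minutes[name]


def target_after_reduction(current_minutes: float, reduction_pct: int = 30) -> float:
    """Duration to aim for after cutting `reduction_pct` percent."""
    return current_minutes * (100 - reduction_pct) / 100


# Hypothetical average stage durations, in minutes.
pipeline = {
    "build": 8.0,
    "unit tests": 12.0,
    "QA": 45.0,
    "staging": 15.0,
    "production": 5.0,
}

stage, minutes = bottleneck(pipeline)
target = target_after_reduction(minutes)
print(f"Slowest stage: {stage} at {minutes} min; key result target: {target} min")
```

If the slowest stage changes after you hit the target, that is a sign the key result did its job and the next quarter’s key result should move to the new bottleneck.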

MTTR and Change Failure Rate

Accelerate finds that high-performance teams achieve MTTR numbers of less than an hour and change failure rates between 0 and 15%. The goal is to move your team closer to those numbers. This begins by understanding the technical practices employed by high-performance teams. These teams leverage high-value telemetry, version control, automation, and continuous delivery, to name a few. My guess is that teams have the most to gain from improving their telemetry and continuous delivery pipeline. I argue that telemetry is the most important practice to focus on because telemetry is ultimately responsible for telling the team whether something is wrong. That makes it a great place to start, with an actionable set of OKRs:

  • Objective: Gain real-time insight into business and technical operations.
  • Key Result: Instrument all revenue-generating flows by the end of the quarter.
  • Key Result: Create a real-time deploy dashboard with the top 5 operational metrics by the end of the month.
  • Key Result: Test that telemetry covers the last 5 production regressions by the end of the quarter.
  • Key Result: Create a page (an on-call alert) for spikes in customer support requests by Friday.

Not only does better telemetry make problems easier to identify and resolve, it also helps all engineers better understand the system and informs future engineering decisions. Once telemetry improves, it’s possible to set OKRs for dropping change failure rate:

  • Objective: Become the most stable product our customers use.
  • Key Result: Use canary releases for production deployments by the end of the month.
  • Key Result: Deploy 3 features behind a feature flipper by the end of the quarter.
  • Key Result: Test that the deployment pipeline covers the last 3 major production regressions by the end of the month.

These examples are not exhaustive, but they are enough to frame an approach for improving your team’s performance. I think it’s effective because it makes decisions objective and focuses improvements on specific areas. More important, it clarifies goals and steers teams away from toiling away at minimal improvements.

What Are Your OKRs?

OKRs are a great model for achieving organizational goals and ultimately pushing teams to stretch beyond what they think possible. In fact, John Doerr calls this one of the OKR superpowers. That’s why I’m excited Accelerate provides the metrics and technical practices to go with OKRs. If you’re unsure where to start, then choose one of the four metrics and its associated technical practices. Set an objective to change the metric and key results for changing the relevant technical or organizational practices. Just keep in mind that each key result must be measurable (remember “as measured by”), time bound, and hopefully tied to a bottleneck. Those three things will keep you on track to your objective and ultimately a more productive, confident, and happy team.