GitOps for Cost Efficiency, Compliance, Velocity, Security, and Resilience

By bltd2a1894de5aec444
January 14, 2021

Many have heard of GitOps and know that it’s a way for your engineering teams to reliably deliver applications and infrastructure to the cloud. But whether you want to implement it or already have, how do you measure its success and connect business outcomes like profitability and market share back to the capabilities of GitOps? At a recent GitOps Days event, Cornelia Davis (@cdavisafc) delivered a talk on measuring the benefits of GitOps with the velocity metrics identified by the DevOps Research and Assessment (DORA) report, and she showed how those key metrics correlate with positive business outcomes.

What the best performing teams do well

The DORA report researchers (Jez Humble, Nicole Forsgren and Gene Kim) selected organizations of all sizes to analyze their DevOps practices. One of the results from this analysis is a strong correlation between certain metrics and how well teams performed. The elite and high performing teams consistently outperformed on the following metrics:

Deployment Frequency

This refers to how often teams deploy to production or release to end users.

  • Elite - On demand, multiple times a day
  • High Performing - Between once per day and once per week

Lead time for changes

The time it takes for your code to go from committed to running in production.

  • Elite - Less than one day
  • High Performing - Between one day and one week

Time to restore service

The time it takes to restore your service when a major defect or outage occurs.

  • Elite - Less than an hour
  • High Performing - Less than one day

Change failure rate

The percentage of deployments to production that fail and need to be rolled back or that lead to a major system outage.

  • Elite - 0-15%
  • High Performing - 0-15%
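
As a rough illustration, the four metrics above can be computed from a list of deployment records. This is a minimal sketch, not the DORA methodology itself: the `Deployment` record, its field names, and the use of medians are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # rolled back or caused an outage?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deployments: List[Deployment], period_days: int) -> dict:
    """Summarize the four metrics over a reporting period of `period_days`."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    restore_times = [d.restored_at - d.deployed_at
                     for d in failures if d.restored_at is not None]
    return {
        "deploys_per_day": len(deployments) / period_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }
```

Medians are used here so that a single slow deployment does not dominate the result; a team could just as reasonably track means or percentiles.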

Elite vs low performing teams and company bottom line

Comparing the elite performers with the low performers, the DORA team found that elite teams deployed 208 times more frequently, had lead times that were 106 times faster, recovered from meltdowns 2,604 times faster, and enjoyed a 7 times lower change failure rate. Companies with elite and high performing DevOps teams are the most profitable, whereas those on the lower end of the performance scale are losing money and market share.

elite-performers.png

From: DORA report (2019)

These particular metrics illustrate how the best performing teams stay ahead, but as Cornelia points out, “Where’s the security in all of this?” Implicit in these metrics is control over security, so that developers have the freedom to deploy whenever they need to and, more importantly, to recover quickly from a bad deployment. This is where GitOps comes into play: not only in helping to achieve elite team metrics, but also in providing the guardrails and safety nets that allow engineering teams to achieve maximum velocity, securely.

How GitOps helps meet these goals

Before describing how GitOps can meet the DORA goals of elite teams, let’s review how GitOps works. GitOps starts with a versioned history of the declarative configurations in your system. An important point to keep in mind is that you can have multiple configuration repositories and it is not necessary for all of your configuration to be kept in a single mono-repository. Along with the configuration repositories, you’ll also have an image repository where new images that get generated from your CI pipelines will be deposited.

gitOps-workflow.png

With all declarative configuration stored in Git, you automatically gain a complete description of the desired state of your entire system. With the help of Kubernetes reconcilers and software agents, the running cluster can then detect when there is a drift from the desired state kept in Git. GitOps then provides a control loop that can always keep a running state up to date with the desired state stored in Git.
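
The control loop described above can be sketched in a few lines. This is a simplified illustration, assuming the desired and running states are plain dictionaries keyed by resource name; the function names `diff_state` and `reconcile` are hypothetical, and real agents work against the Kubernetes API rather than in-memory dictionaries.

```python
def diff_state(desired: dict, running: dict) -> dict:
    """Return the drift between the desired state (from Git) and the running state."""
    return {
        "create": sorted(k for k in desired if k not in running),
        "update": sorted(k for k in desired
                         if k in running and running[k] != desired[k]),
        "delete": sorted(k for k in running if k not in desired),
    }


def reconcile(desired: dict, running: dict) -> dict:
    """Run one pass of the loop and return the corrected running state."""
    actions = diff_state(desired, running)
    new_state = dict(running)  # copy, so the caller's state is not mutated
    for name in actions["create"] + actions["update"]:
        new_state[name] = desired[name]
    for name in actions["delete"]:
        del new_state[name]
    return new_state


# Hypothetical example: an outdated image and a resource that is not in Git.
desired = {"web": {"image": "web:1.2", "replicas": 3}}
running = {"web": {"image": "web:1.1", "replicas": 3},
           "debug-pod": {"image": "busybox", "replicas": 1}}
print(diff_state(desired, running))
print(reconcile(desired, running) == desired)
```

Running the pass repeatedly is safe: once the running state matches the desired state, the diff is empty and nothing changes.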

GitOps-controllers.png

GitOps helps teams gain elite and high performance status

By managing your systems through control loops with GitOps, you automatically gain a number of benefits that help you meet the metrics attained by elite and high performing teams, and that can result in an increase in business profitability and market share:

Stronger security guarantees for optimal guardrails

Git’s strong correctness and security guarantees, backed by the strong cryptography used to track and manage changes, together with the ability to cryptographically sign changes (for example with GPG or SSH keys) to prove authorship and origin, are key to a correct and secure definition of the cluster’s desired state. If a security breach does occur, the immutable and auditable source of truth can be used to recreate a new system independently of the compromised one, reducing downtime and allowing for a much better incident response and more effective disaster recovery to meet compliance requirements.

Also, the separation of responsibility between integrating and testing software, then releasing it to a production environment embodies the security principle of least privilege, reducing the impact of compromise and providing a smaller attack surface.
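
To make the "strong cryptography used to track changes" point concrete, here is a small sketch of how Git derives an object ID for a file's contents. It assumes a repository using Git's default SHA-1 object format; the function name `git_blob_id` is hypothetical.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the ID Git assigns to a blob with this content (SHA-1 object format)."""
    # Git hashes a short header ("blob <size>\0") followed by the raw bytes.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


manifest_v1 = b"replicas: 3\n"
manifest_v2 = b"replicas: 30\n"
# Any change to the content, however small, yields a different ID.
print(git_blob_id(manifest_v1) != git_blob_id(manifest_v2))
```

Because every object is addressed by a hash of its content, and commits in turn reference the hashes of their trees and parents, tampering with stored configuration changes the IDs and becomes detectable.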

Increased speed and productivity

Continuous deployment automation with an integrated feedback and control loop reduces your mean time to deployment by supporting more frequent releases. Declarative definitions kept in Git enable developers to use familiar workflows, reducing the time it takes to spin up new development or test environments to deploy new features. Your teams can ship more changes per day, and this translates into faster turnaround for new features and functionality to the customer.

Reduced mean time to recovery

GitOps best practices also decrease the time it takes to recover from a cluster meltdown. With Git’s built-in capability to revert/rollback and fork, you gain stable and reproducible rollbacks. Since your entire system is described in Git, you have a single source of truth for what to recover after a cluster failure, reducing your mean time to recovery (MTTR) from hours or days to minutes. GitOps also provides real-time feedback and control loops. In conjunction with other tools like Prometheus for observability and Jaeger for end-to-end tracing, problems can be detected and tracked down more quickly, preventing entire cluster meltdowns, reducing mean time to detect (MTTD) and mean time to locate (MTTL), and decreasing change failure rates.
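
The rollback idea can be sketched as follows. This is a toy model, assuming the configuration history is a list of (commit ID, desired state) pairs; `revert_last` and the commit IDs are hypothetical. Like `git revert`, it records the rollback as a new entry instead of rewriting history, so the audit trail stays intact.

```python
from typing import List, Tuple

History = List[Tuple[str, dict]]


def revert_last(history: History, new_commit_id: str) -> History:
    """Return a new history whose latest entry restores the previous desired state."""
    if len(history) < 2:
        raise ValueError("no earlier state to roll back to")
    _, previous_state = history[-2]
    # Append instead of deleting: the bad change stays visible in the history.
    return history + [(new_commit_id, dict(previous_state))]


history = [("a1", {"web": "web:1.1"}), ("b2", {"web": "web:1.2-broken"})]
rolled_back = revert_last(history, "c3")
print(rolled_back[-1])
```

Once the reverted state is the latest desired state, the same control loop that applies forward changes brings the cluster back to it.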

Improved stability and reliability for greater availability

Because GitOps provides a single operating model for making infrastructure and application changes, you have consistent end-to-end workflows across your entire organization. Not only are your continuous integration and continuous deployment pipelines all driven by pull requests, but your operations tasks are also fully reproducible through Git.

Easier compliance and auditing

By incorporating Git into your cluster management strategy, you automatically gain a convenient audit trail of who did what, and when, for all cluster changes made outside of Kubernetes. That trail can be used to help meet SOC 2 compliance requirements and also to ensure stability.
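
As one possible way to turn that history into audit records, the sketch below parses the output of `git log --pretty=format:'%H%x1f%an%x1f%aI%x1f%s'`, which prints the commit hash, author name, ISO 8601 author date, and subject separated by the unit-separator byte. The `AuditEntry` type and `parse_git_log` function are hypothetical names for this example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

FIELD_SEP = "\x1f"  # matches %x1f in the assumed git log format string


@dataclass
class AuditEntry:
    commit: str       # who-did-what is tied to an immutable commit ID
    author: str
    when: datetime
    summary: str


def parse_git_log(output: str) -> List[AuditEntry]:
    """Parse lines of the form <hash>\\x1f<author>\\x1f<ISO date>\\x1f<subject>."""
    entries = []
    for line in output.split("\n"):
        if not line.strip():
            continue
        commit, author, when, summary = line.split(FIELD_SEP, 3)
        # Some Git versions print UTC as "Z"; normalize for older Pythons.
        if when.endswith("Z"):
            when = when[:-1] + "+00:00"
        entries.append(AuditEntry(commit, author, datetime.fromisoformat(when), summary))
    return entries
```

A separator byte is used instead of a visible character such as `|` so that author names and commit subjects containing punctuation do not break the parsing.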
