At the recent Cloud Native Transformation summit sponsored by Sysdig, Bryan Boreham, an engineer at Weaveworks, gave a talk on “Automating Kubernetes with GitOps”.
The term GitOps was originally coined by Alexis Richardson, CEO of Weaveworks, and according to Bryan, Alexis came up with it to capture what the Weaveworks engineering team was already doing on a daily basis.
The goal of GitOps is to increase the velocity of your team’s output by automating much of the process of Kubernetes releases and deployments. Put simply, GitOps replaces custom deployment scripts with automated Kubernetes deployments so that you can safely ship code on a daily basis.
GitOps embraces these four principles:
- Describe your entire system declaratively.
- Version-control the desired system state.
- Apply changes to the desired state as version-controlled commits.
- Use software agents to synchronize state and to alert on divergence.
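A loose illustration of the first and third principles (not from the talk; the names and data are hypothetical): a declarative “apply” is idempotent, so converging to the same desired state twice leaves the system unchanged, which is what makes replaying a Git history safe.

```python
# Hypothetical sketch: a declarative, idempotent "apply" of desired state.
# The dicts stand in for Kubernetes manifests; names are illustrative only.

def apply(cluster: dict, desired: dict) -> dict:
    """Converge the cluster to the desired state and return the result."""
    converged = dict(cluster)
    for name, spec in desired.items():
        converged[name] = spec          # create or update to match Git
    for name in list(converged):
        if name not in desired:
            del converged[name]         # prune anything not declared
    return converged

desired = {"web": {"image": "web:1.2"}, "db": {"image": "db:9"}}
once = apply({}, desired)
twice = apply(once, desired)
assert once == twice == desired         # idempotent: safe to replay
```

Because the description is declarative rather than a sequence of imperative steps, applying it from scratch or on top of an existing cluster ends in the same state.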
What is the Cloud-Native Transformation?
Bryan described what it took for companies to scale before the cloud. In a traditional datacenter, scaling your system meant ordering new servers and waiting about a month for them to arrive; with the cloud, it takes about two minutes to spin one up.
But how long does it take teams to deliver a software change?
According to Forrester, most businesses are releasing on a monthly basis at best: only 20% of companies surveyed say they are releasing faster than once a month.
What slows down releases?
There are many reasons why releases are slow. The diagram below illustrates a typical release approval process. According to Bryan, it is this approval process that explains why many teams take a very long time to release a new feature.
Weaveworks recovered from disaster in 45 minutes
At Weaveworks, we run a SaaS called Weave Cloud. Several years ago, one of our engineers mistyped something on the command line and accidentally deleted all of our clusters. But because we were already practicing GitOps back then, we were able to bring everything back up in 45 minutes.
How did we do that?
Because our entire system is described declaratively with Kubernetes, Docker, and Terraform configuration, it can easily be version controlled in Git. Everything, including code, configuration, monitoring rules, and dashboards, is kept under source control with the full, built-in audit trail that Git provides.
“With our entire system in Git, we have a single source of truth for when things go wrong.” - Bryan Boreham, Engineer at Weaveworks
And so, when all of our servers went down, we were able to reapply the configuration we had in Git: bring up new servers, install the software, and reapply all of the Kubernetes manifests. All of our pods came back up in the last known good state, and within 45 minutes we were up and running again. Since Git’s commit history is an audit trail, it also provides a simple way to verify what the ‘good state’ of your cluster is.
With our source of truth in Git, we also have an agent running in the cluster that compares the state of the running cluster with what’s in Git. For example, a developer can ssh onto the cluster and change something; if this happens, the agent will notice the change and send an alert that the cluster no longer matches what’s in Git.
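The agent’s comparison can be sketched as a diff between the desired state in Git and the observed state in the cluster. This is a minimal sketch, not Flux’s actual API; the function and data are hypothetical.

```python
# Hypothetical sketch of drift detection: compare desired state (from Git)
# with observed state (from the cluster) and report any divergence.

def drift(desired: dict, observed: dict) -> list[str]:
    """Return a human-readable alert for every way the cluster diverges."""
    alerts = []
    for name, spec in desired.items():
        if name not in observed:
            alerts.append(f"{name}: missing from cluster")
        elif observed[name] != spec:
            alerts.append(f"{name}: differs from Git")
    for name in observed:
        if name not in desired:
            alerts.append(f"{name}: not declared in Git")
    return alerts

git_state = {"web": {"image": "web:1.2"}}
cluster_state = {"web": {"image": "web:1.3"}}   # someone changed it by hand
assert drift(git_state, cluster_state) == ["web: differs from Git"]
```

A real agent runs this comparison continuously and turns each divergence into an alert rather than a return value.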
Keep under version control the YAML files that describe your Kubernetes objects and how your deployment runs. Unlike configuration kept in a database, YAML manifests versioned in Git come with a built-in audit trail of who changed what, and when.
GitOps for Configuration
With all of your manifest files in Git, you also need something to synchronize them with the production cluster. Our open source tool, Flux, is an agent that performs this type of synchronization.
GitOps for Continuous Deployment
To implement continuous deployment you add an image repository, like Docker Hub, to your build pipeline. You also need continuous integration, so that a change makes it all the way from a commit to a new image in the image repository. Whenever a new image appears there, its version is automatically updated in the deployment manifest and checked back into Git. When that occurs, the source of truth differs from the running cluster, and an automatic update can take place. In our case, we update staging before we deploy to production.
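The manifest-update step can be sketched as rewriting the image tag in a deployment manifest before committing the result back to Git. This is a simplified, hypothetical sketch, not how Flux implements it; the regex and names are illustrative.

```python
import re

# Hypothetical sketch of the "update the deployment manifest" step:
# rewrite the image tag in a YAML manifest when a new image appears.

def bump_image(manifest: str, image: str, new_tag: str) -> str:
    """Replace the tag of `image` wherever it appears in the manifest."""
    pattern = rf"(image:\s*{re.escape(image)}):\S+"
    return re.sub(pattern, rf"\1:{new_tag}", manifest)

manifest = "containers:\n- name: web\n  image: example/web:1.2\n"
updated = bump_image(manifest, "example/web", "1.3")
assert "example/web:1.3" in updated
```

In a GitOps pipeline, the rewritten manifest would then be committed and pushed, and the cluster agent would notice that the running cluster no longer matches Git and roll out the new image.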
At Weaveworks we have an approval process between staging and production, though we use the same configuration repo for both environments. You can use any process here; the main idea is that changes to the cluster are driven by Git commits.
Watch this entertaining talk in its entirety here: