In the nearly three years since Alexis Richardson coined the term, GitOps has demonstrated real value for a multitude of organizations, large and small. While there is some variability in what is meant by the term, one particular use case is ubiquitous: using GitOps to manage the delivery and operations of applications that are deployed to Kubernetes. This was, and still is, an excellent place to start any GitOps journey; however, GitOps is useful in many more scenarios.
With this post, I hope to expand your thinking around GitOps by first exploring why it took off for that app deployment use case, then using those insights to perhaps broaden your definition of GitOps, and finally suggesting a few other areas where you might apply these patterns.
The birth of cloud-native operations
Of the many things that Kubernetes has brought to the IT industry, perhaps the most impactful has been an appreciation for declarative APIs and reconciliation. Initially, those ideas were applied to application deployments: yaml files described a desired application topology and were provided to Kubernetes. Reconciliation loops for deployments, replica sets and services then created the described resources and kept them in line with what was expressed in those yaml files.
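To make "declarative plus reconciliation" concrete, here is a minimal sketch of a single reconciliation pass in Python. It is not how Kubernetes controllers are implemented (they watch the API server and manage real Deployments, ReplicaSets and Services); the `Cluster` class and the `reconcile` function are hypothetical stand-ins that model nothing but replica counts.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for a cluster: deployment name -> running replicas."""
    running: dict = field(default_factory=dict)


def reconcile(desired: dict, cluster: Cluster) -> list:
    """One pass of a reconciliation loop: converge actual state on declared state.

    `desired` plays the role of the yaml. It says what should exist, not how to get there.
    """
    actions = []
    for name, want in desired.items():
        have = cluster.running.get(name, 0)
        if have != want:
            verb = "scale up" if have < want else "scale down"
            actions.append(f"{verb} {name}: {have} -> {want}")
            cluster.running[name] = want
    # Anything running that is no longer declared is removed.
    for name in list(cluster.running):
        if name not in desired:
            actions.append(f"delete {name}")
            del cluster.running[name]
    return actions


cluster = Cluster(running={"web": 1, "old-job": 1})
desired = {"web": 3, "api": 2}

print(reconcile(desired, cluster))  # converges: scale web up, create api, delete old-job
print(reconcile(desired, cluster))  # already converged: no actions

cluster.running["web"] = 1          # someone changes the cluster by hand (drift)
print(reconcile(desired, cluster))  # the next pass puts it back
```

Because the loop only compares "what is declared" with "what is there", running it repeatedly is safe, and any drift introduced between passes is corrected on the next one.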
With developers and DevOps engineers increasingly taking on the responsibility for operating apps, it wasn’t a stretch for them to start storing those application configurations in Git, given they were already using it for their application source. The last step, which is what Weaveworks added when we created Flux, was to deliver the configuration stored in Git to the Kubernetes cluster where the app was to be run. At this point, the four essential elements for GitOps were in place:
#1 The entire system is described declaratively (the yaml)
#2 The desired system state is versioned in Git (Git)
#3 Approved changes can be automatically applied to the system (Flux)
#4 Software agents to ensure correctness and act on divergence (Kubernetes reconcilers)
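The four elements can be seen working together in a toy model. This is a hypothetical sketch, not how Flux, Git or Kubernetes are implemented: `ConfigRepo` stands in for Git, `DeliveryAgent` for Flux and `reconcile` for the Kubernetes reconcilers, and "approval" is modeled simply as a commit landing in the repository.

```python
import hashlib
import json
from typing import Optional


def version_of(state: dict) -> str:
    """Stand-in for a Git commit id: a hash of the content, so a version can't change silently."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()[:12]


class ConfigRepo:
    """#1 and #2: declarative desired state kept as an append-only history of versions."""

    def __init__(self) -> None:
        self.log = []  # (version, state) pairs, oldest first

    def commit(self, state: dict) -> str:
        version = version_of(state)
        self.log.append((version, dict(state)))
        return version

    def head(self):
        return self.log[-1]


class DeliveryAgent:
    """#3: a Flux-like agent that notices new commits and delivers them to the runtime."""

    def __init__(self, repo: ConfigRepo) -> None:
        self.repo = repo
        self.delivered: Optional[str] = None
        self.desired: dict = {}

    def poll(self) -> bool:
        version, state = self.repo.head()
        if version == self.delivered:
            return False  # nothing new in the repository
        self.delivered, self.desired = version, state
        return True


def reconcile(desired: dict, actual: dict) -> None:
    """#4: the runtime's reconciler makes actual state match whatever was delivered."""
    for name in [n for n in actual if n not in desired]:
        del actual[name]
    actual.update(desired)


repo = ConfigRepo()
agent = DeliveryAgent(repo)
actual = {"web": 1}                  # what is running right now

repo.commit({"web": 3, "api": 2})    # an approved change lands in the repo
agent.poll()                         # ...is delivered...
reconcile(agent.desired, actual)     # ...and applied

actual["web"] = 7                    # drift, with no new commit
agent.poll()                         # returns False: nothing new to deliver
reconcile(agent.desired, actual)     # the reconciler still corrects the drift
```

Note the division of labor: the agent only moves versions from the repository to the runtime, while correcting drift is the reconciler's job, which is why removing any one of the four pieces breaks the model.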
GitOps - a new operational model for modern systems
It’s important to understand that #3 on this list is not “the GitOps.” Rather, we took three existing and very successful things, added one more (Flux) and created something entirely new. The proverbial 3 + 1 >> 4.
Taking a step back to look at what we created reveals something very interesting: the thing we refer to as “GitOps” is quite significant. It’s a new operational model designed for modern systems, and I like to call it cloud-native operations.
Without going into all of the details (which could fill one or more blog posts on their own), the far better understood concept of cloud-native software offers some insight into what I mean by the term cloud-native operations. The definition I use is: cloud-native software consists of applications designed for highly distributed systems that are experiencing constant change. We need only add a single phrase to get to the new definition:
Cloud-native operations is a set of practices that allow us to manage highly distributed software that is experiencing constant change.
Gone are the days when it was okay to go through a six-week process to promote something into production; change is just too frequent to support that. Gone are the days when the infrastructure our software runs on could be relied on to remain rigidly stable; there are just too many highly distributed moving parts to allow that to happen.
At this stage, it’s well understood that the software we build and the way we operate it have to be very different from what we’ve done for the last several decades. Now that we are thinking of GitOps as a modern operational model, it’s not much of a stretch to see it applying far more broadly than just to applications being deployed to Kubernetes.
Non-Kubernetes targets - but with reconciliation
For example, could the GitOps approach be used to manage the very Kubernetes platforms that applications are deployed to? The answer is a resounding “yes”; in fact, at Weaveworks we are doing just that, both in open source (see WKSCTL) and for our customers. Given the success we’d already had applying GitOps to Kubernetes-targeted application deployments, we came to understand the significant value in the following:
- Git logs and stores complete, versioned and immutable representations of the desired state of a system (GitOps principle #2)
- The reconciliation-based manner in which Flux delivers versions of desired state configurations to a runtime (GitOps principle #3)
What we needed to add in order to form a complete GitOps solution for Kubernetes cluster management was the declarative expression of a Kubernetes cluster (GitOps principle #1) and the software agents that keep the actual cluster state in alignment with the desired state (GitOps principle #4). Fortunately, the Kubernetes community was already on the way to addressing these two needs with the formation of cluster-api or CAPI.
Cluster platform management via GitOps
This emerging CAPI standard allows a Kubernetes cluster, and optionally the compute infrastructure it will run on, to be described in yaml files. Reconcilers deliver those clusters by (optionally) provisioning that infrastructure, installing Kubernetes onto the compute nodes and then connecting them together to form a cluster. There’s a bit of trickery involved in bootstrapping cluster-api that I won’t go into here, but cluster-api delivers exactly the two pieces we needed to do GitOps for complete Kubernetes cluster management.
Now, there is something subtle but very important that I want to draw your attention to: notice that the target of the deployment is not Kubernetes; rather, it is the infrastructure on which the Kubernetes cluster will run. To be concrete, in this solution (#1) the desired Kubernetes cluster is specified in yaml files, (#2) those files are stored in Git, (#3) they are delivered using Flux, and (#4) the cluster-api controllers deploy the clusters to EC2, Azure Virtual Machines, Google Compute Engine, vSphere, bare metal or other compute infrastructures.
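A deliberately simplified model of what such infrastructure-targeting reconcilers do is sketched below. It does not use the real Cluster API resource types or providers: `Machine` here is a plain dataclass rather than the Cluster API resource of the same name, `FakeInfrastructure` and `reconcile_cluster` are hypothetical, and "installing Kubernetes" is reduced to flipping a flag. What it does show is that the reconciler's target is compute infrastructure, not Kubernetes.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    role: str                       # "control-plane" or "worker"
    kubernetes_installed: bool = False
    joined: bool = False


@dataclass
class FakeInfrastructure:
    """Hypothetical stand-in for EC2, Azure VMs, GCE, vSphere or bare metal."""
    machines: list = field(default_factory=list)
    ids: itertools.count = field(default_factory=itertools.count)

    def provision(self, role: str) -> Machine:
        machine = Machine(name=f"{role}-{next(self.ids)}", role=role)
        self.machines.append(machine)
        return machine

    def terminate(self, machine: Machine) -> None:
        self.machines.remove(machine)


def reconcile_cluster(spec: dict, infra: FakeInfrastructure) -> None:
    """Converge infrastructure on a declarative spec like {"control-plane": 1, "worker": 3}."""
    for role, want in spec.items():
        have = [m for m in infra.machines if m.role == role]
        for _ in range(want - len(have)):    # too few machines: provision more
            have.append(infra.provision(role))
        for extra in have[want:]:            # too many machines: remove the surplus
            infra.terminate(extra)
        for machine in have[:want]:          # install Kubernetes, then join the cluster
            machine.kubernetes_installed = True
            machine.joined = True


infra = FakeInfrastructure()
reconcile_cluster({"control-plane": 1, "worker": 3}, infra)  # first commit: create the cluster
reconcile_cluster({"control-plane": 1, "worker": 2}, infra)  # later commit: one fewer worker
```

Scaling the cluster down is just another commit to the spec; nobody logs in to terminate a machine by hand.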
Now that you realize this isn’t just for things that are being deployed to Kubernetes, I’m betting you are starting to think about GitOps-ing many more things that you need to operate in your environments.
Non-Git Sources - but Git semantics
But it’s not just what you are targeting with GitOps where I want you to broaden your thinking. Sources other than Git can also be a part of the GitOps process. That is, cloud-native operations do not require that every single thing be stored in Git.
We even see this in the original use case for GitOps - application deployments - where a second source is already in play: the image repository. A declarative configuration for an application deployment (yaml) includes references to the container images that will run within pods, and the reconcilers in Kubernetes follow those references and deploy the right images. That is, there are two sources that the GitOps process is drawing from - Git for the application configuration and an image repository for the container images.
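The two-source structure can be illustrated with plain data. In this hypothetical sketch the registry host, tags and digests are made up, the `registry` dict stands in for a real image registry, and `resolve_images` is an illustrative helper; in a real cluster, images are pulled by the container runtime on each node rather than looked up like this.

```python
# Source 1: an application configuration as read from Git (the yaml, parsed into a dict).
manifest = {
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {"template": {"spec": {"containers": [
        {"name": "web", "image": "registry.example.com/web:3f2c1a9"},
        {"name": "log-shipper", "image": "registry.example.com/shipper:9b7e004"},
    ]}}},
}

# Source 2: a toy image repository mapping "repository:tag" to an image digest.
registry = {
    "registry.example.com/web:3f2c1a9": "sha256:1111",
    "registry.example.com/shipper:9b7e004": "sha256:2222",
}


def image_refs(manifest: dict) -> list:
    """Collect the container image references declared in a Deployment-shaped manifest."""
    containers = manifest["spec"]["template"]["spec"]["containers"]
    return [c["image"] for c in containers]


def resolve_images(manifest: dict, registry: dict) -> dict:
    """Follow each reference from the first source into the second; fail on a dangling one."""
    resolved = {}
    for ref in image_refs(manifest):
        if ref not in registry:
            raise LookupError(f"{ref} is referenced in Git but missing from the registry")
        resolved[ref] = registry[ref]
    return resolved


print(resolve_images(manifest, registry))
```

The configuration in Git only holds a pointer; what actually runs depends on what the second source returns for that pointer, which is why the second source's semantics matter as much as Git's.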
There is, however, something more subtle but critically important that GitOps requires. GitOps principle #2 states that the desired system state is versioned in Git. While I am suggesting that the “Git” part of this statement is optional, the “versioned” part is not. In order to support the GitOps process, the sources we draw from must have an immutable version history. The reason we favor Git for application configurations is that these semantics are baked into Git: the version of a configuration stored in Git is immutable, and all versions of a configuration are stored in an immutable log. Provided a store offers these two semantics, it is a fine source for a GitOps workflow.
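Git gets these two properties from content-addressed objects and an append-only commit history. The hypothetical `VersionedStore` below reproduces just those two properties for an arbitrary store, as a sketch of what "Git semantics without Git" means; it is not Git's actual object model.

```python
import hashlib


class VersionedStore:
    """A minimal store with the two semantics a GitOps source needs.

    1. Each version is immutable: its id is the SHA-256 of its content, so in practice
       the same id can never refer to different content.
    2. Every version is recorded in an append-only log; there is no update or delete.
    """

    def __init__(self) -> None:
        self._objects = {}  # version id -> content
        self._log = []      # version ids, oldest first

    def put(self, content: bytes) -> str:
        version = hashlib.sha256(content).hexdigest()
        self._objects.setdefault(version, content)
        self._log.append(version)
        return version

    def get(self, version: str) -> bytes:
        return self._objects[version]

    def history(self) -> tuple:
        # A tuple copy, so callers cannot rewrite the log in place.
        return tuple(self._log)


store = VersionedStore()
v1 = store.put(b"replicas: 1\n")
v2 = store.put(b"replicas: 3\n")
print(store.get(v1))      # an old version is still exactly what it was
print(store.history())    # both ids, oldest first: the full, ordered history
```

With these two guarantees, a rollback is simply delivering an earlier id from `history()`, and an audit is simply reading the log.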
Are image registries really immutable?
As an astute reader and Kubernetes aficionado, you might be thinking you’ve already found a flaw in my logic: image repositories don’t have those characteristics. I can replace an image in Docker Hub, for example, with relative ease, thereby not only breaking the immutability of the image version but also effectively erasing any history of the images that came before (side effects are almost always bad!). And you would be right! But we can solve this problem by instilling some discipline into the way that image repositories, and by extension other stores, are used. For image repositories, this means:
- images should only be stored with unique tags; a best practice is to use the SHA of the source code version that is built into the image (never, ever reuse a tag)
- images stored in your production repository should never be deleted.
To achieve this discipline, encode these practices into the CI pipelines that produce images and into the access control settings of your image repository. So, even while the image repository does not have the needed semantics baked in, we can deliver on those requirements through our disciplined use of the store.
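Here is a sketch of what encoding those two rules might look like, assuming tags are full 40-character Git commit SHAs. `check_tag` represents the CI-side check and `ProductionRegistry` a hypothetical front end playing the role of access control; neither is a real registry API. Some hosted registries can enforce part of this for you (Amazon ECR, for example, can be configured with immutable tags).

```python
import re


class PolicyViolation(Exception):
    """Raised when a push or delete would break the source's version semantics."""


# Assumption: tags are full 40-character lowercase hex Git commit SHAs.
COMMIT_SHA = re.compile(r"[0-9a-f]{40}")


def check_tag(tag: str) -> None:
    """CI-side rule: only a commit SHA is an acceptable tag, so tags are never reused."""
    if not COMMIT_SHA.fullmatch(tag):
        raise PolicyViolation(f"tag {tag!r} is not a commit SHA")


class ProductionRegistry:
    """Hypothetical production repository front end: write-once tags, no deletes."""

    def __init__(self) -> None:
        self._tags = {}  # "repository:tag" -> image digest

    def push(self, repository: str, tag: str, digest: str) -> None:
        check_tag(tag)
        key = f"{repository}:{tag}"
        existing = self._tags.get(key)
        if existing is not None and existing != digest:
            raise PolicyViolation(f"{key} already points at {existing}; tags are immutable")
        self._tags[key] = digest  # re-pushing the same digest is a harmless no-op

    def delete(self, repository: str, tag: str) -> None:
        raise PolicyViolation("images in the production repository are never deleted")

    def resolve(self, repository: str, tag: str) -> str:
        return self._tags[f"{repository}:{tag}"]


registry = ProductionRegistry()
sha = "3f2c1a9" + "0" * 33                  # a made-up 40-character commit SHA
registry.push("web", sha, "sha256:1111")    # accepted
for attempt in (
    lambda: registry.push("web", "latest", "sha256:2222"),  # reusable tag: rejected
    lambda: registry.push("web", sha, "sha256:2222"),       # overwrite: rejected
    lambda: registry.delete("web", sha),                    # delete: rejected
):
    try:
        attempt()
    except PolicyViolation as err:
        print("rejected:", err)
```

Once every write goes through rules like these, the image repository behaves like the versioned store GitOps needs, even though the underlying product does not guarantee it.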
Now that you understand a bit more about the (Git) semantics needed for GitOps sources, I’m betting you are starting to think about how other enterprise operational stores can be leveraged in your GitOps workflows.
We’re Just Getting Started
Just as container orchestration for application deployments was only use case #1 for Kubernetes, so too was application management on Kubernetes only use case #1 for GitOps. GitOps enthusiasts are embracing the ideas outlined herein to generate operational efficiencies that directly contribute business value. Please tune in to hear many compelling stories during GitOps Days, May 20 & 21, 2020 (or watch the videos if you missed it) and do let us know if we can help you on your GitOps journey.