What is GitOps?
Pioneered in 2017, GitOps is a way to do Kubernetes cluster management and application delivery. It works by using Git as a single source of truth for declarative infrastructure and applications. With GitOps, software agents can alert on any divergence between Git and what's running in a cluster, and if there's a difference, Kubernetes reconcilers automatically update or roll back the cluster depending on the case. With Git at the center of your delivery pipelines, developers use familiar tools to make pull requests that accelerate and simplify both application deployments and operations tasks for Kubernetes.
An operating model for building cloud native applications
GitOps can be summarized as these two things:
- An operating model for Kubernetes and other cloud native technologies, providing a set of best practices that unify deployment, management and monitoring for containerized clusters and applications.
- A path towards a developer experience for managing applications, where end-to-end CI/CD pipelines and Git workflows are applied to both operations and development.
Principles of GitOps
To start managing your cluster with GitOps workflows, the following must be in place:
#1. The entire system described declaratively.
Kubernetes is just one example of many modern cloud native tools that are “declarative” and that can be treated as code. Declarative means that configuration is guaranteed by a set of facts instead of by a set of instructions. With your application’s declarations versioned in Git, you have a single source of truth. Your apps can then be easily deployed and rolled back to and from Kubernetes. And even more importantly, when disaster strikes, your cluster’s infrastructure can also be dependably and quickly reproduced.
#2. The canonical desired system state versioned in Git.
With the declaration of your system stored in a version control system, and serving as your canonical source of truth, you have a single place from which everything is derived and driven. This trivializes rollbacks: you can use `git revert` to go back to your previous application state. You can also sign your commits with a GPG or SSH key, which gives strong guarantees about the authorship and provenance of your code.
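To make the rollback idea concrete, here is a minimal sketch of a config repository as a list of snapshots. It is a toy model, not how Git stores data: `ConfigRepo`, `revert_head` and the image names are hypothetical, and real `git revert` applies the inverse patch of a commit rather than copying a snapshot. For the most recent commit the outcome is the same: a new commit whose content matches the previous state.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigRepo:
    """Toy model of a config repository: each commit stores a full snapshot."""
    commits: list = field(default_factory=list)

    def commit(self, state: dict, message: str) -> int:
        # Store a copy so later mutation of `state` cannot rewrite history.
        self.commits.append({"state": dict(state), "message": message})
        return len(self.commits) - 1

    def head(self) -> dict:
        """Return the desired state recorded by the latest commit."""
        return dict(self.commits[-1]["state"])

    def revert_head(self) -> int:
        """Undo the latest commit by adding a new commit on top of it."""
        if len(self.commits) < 2:
            raise ValueError("nothing to revert to")
        previous = self.commits[-2]["state"]
        message = 'Revert "' + self.commits[-1]["message"] + '"'
        return self.commit(previous, message)


repo = ConfigRepo()
repo.commit({"image": "shop:1.0", "replicas": 3}, "Deploy shop 1.0")
repo.commit({"image": "shop:1.1", "replicas": 3}, "Deploy shop 1.1")
repo.revert_head()

print(repo.head())        # desired state is back to shop:1.0
print(len(repo.commits))  # history grew by one; nothing was erased
```

The point of the design is that a rollback is itself a recorded change, so the audit trail stays complete.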
#3. Approved changes that can be automatically applied to the system.
Once you have the declared state kept in Git, the next step is to allow any changes to that state to be automatically applied to your system. What's significant about this is that you don't need cluster credentials to make a change to your system. With GitOps, the state definition lives outside the environment it describes, in a segregated location. This allows you to separate what you want to do from how you're going to do it.
#4. Software agents to ensure correctness and alert on divergence.
Once the state of your system is declared and kept under version control, software agents can inform you whenever reality doesn’t match your expectations. The use of agents also ensures that your entire system is self-healing. And by self-healing, we don’t just mean when nodes or pods fail—those are handled by Kubernetes—but in a broader sense, like in the case of human error. In this case, software agents act as the feedback and control loop for your operations.
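The feedback and control loop described here can be sketched in a few lines. This is an illustrative model only, with the cluster and the Git state represented as plain dictionaries. The function names `find_drift` and `reconcile` are assumptions for the example, not the API of any real agent.

```python
def find_drift(desired: dict, actual: dict) -> dict:
    """Return {name: (desired_value, actual_value)} for every mismatch."""
    names = set(desired) | set(actual)
    return {
        name: (desired.get(name), actual.get(name))
        for name in sorted(names)
        if desired.get(name) != actual.get(name)
    }


def reconcile(desired: dict, actual: dict, alert=print) -> dict:
    """One pass of the control loop: alert on drift, then converge."""
    drift = find_drift(desired, actual)
    for name, (want, have) in drift.items():
        alert(f"drift in {name}: want {want!r}, have {have!r}")
    # Converging means making the live state equal to the declared state,
    # which also removes anything that is not declared in Git.
    return dict(desired)


desired = {"web": {"image": "web:2.3", "replicas": 4}}
# Human error: someone scaled by hand and left an undeclared workload behind.
actual = {
    "web": {"image": "web:2.3", "replicas": 1},
    "debug-pod": {"image": "busybox"},
}

alerts = []
actual = reconcile(desired, actual, alert=alerts.append)
print(alerts)
print(find_drift(desired, actual))  # empty once the system has converged
```

A real agent would run this pass repeatedly and apply changes through the Kubernetes API rather than by replacing a dictionary.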
Key benefits of GitOps
Automated delivery pipelines roll out changes to your infrastructure when changes are made to Git. But the idea of GitOps goes further than that: it uses tools to compare the actual production state of your whole application with what’s under source control, and then tells you when the running cluster doesn’t match the state described in Git.
By applying GitOps best practices, there is a ‘source of truth’ for both your infrastructure and application code, allowing development teams to increase velocity and improve system reliability.
The benefits of applying GitOps best practices are far reaching and provide:
- Increased Productivity
Continuous deployment automation with an integrated feedback control loop speeds up Mean Time to Deployment. Your team can ship 30-100 times more changes per day, increasing overall development output 2-3 times.
- Enhanced Developer Experience
Push code and not containers. Developers can use familiar tools like Git to manage updates and features to Kubernetes more rapidly without having to know the internals of Kubernetes. Newly on-boarded developers can get up to speed quickly and be productive within days instead of months.
- Improved Stability
When you use Git workflows to manage your cluster, you automatically gain a convenient audit log of all cluster changes outside of Kubernetes. An audit trail of who did what to your cluster, and when, can be used to meet SOC 2 compliance and ensure stability.
- Higher Reliability
With Git’s capability to revert/rollback and fork, you gain stable and reproducible rollbacks. Because your entire system is described in Git, you also have a single source of truth from which to recover after a meltdown, reducing your mean time to recovery (MTTR) from hours to minutes.
- Consistency and Standardization
Because GitOps provides one model for making infrastructure, apps and Kubernetes add-on changes, you have consistent end-to-end workflows across your entire organization. Not only are your continuous integration and continuous deployment pipelines all driven by pull requests, but your operations tasks are also fully reproducible through Git.
- Stronger Security Guarantees
Git’s strong correctness and security guarantees, backed by the strong cryptography used to track and manage changes, together with the ability to sign changes to prove authorship and origin, are key to a secure definition of the desired state of the cluster.
GitOps is Continuous Delivery meets Cloud Native
GitOps builds and iterates on ideas from DevOps and Site Reliability Engineering that began with Martin Fowler’s comprehensive Continuous Integration overview in 2006.
Freedom to choose the tools you need
As a workflow for CI/CD pipelines, GitOps has been described as the holy grail of development processes. Because there is no single tool that can do everything required in your CI/CD pipeline, GitOps gives you the freedom to choose the best tools for the different parts. You can select a set of tools from the open source ecosystem or from closed source or, depending on your use case, you may even combine them. The most difficult part of creating a CI/CD pipeline is gluing all of the pieces together.
Whatever you choose for your delivery pipeline, applying GitOps best practices with Git (or any version control) should be an integral component of your process. Doing so will make the transition to continuous delivery easier. This is true not only from a technical point of view but also from a cultural perspective.
GitOps: versioned CI/CD on top of declarative infrastructure. Stop scripting and start shipping. https://t.co/SgUlHgNrnY — Kelsey Hightower (@kelseyhightower) January 17, 2018
Git enables infrastructure as code (IAC) tools
Kubernetes is just one example of many modern cloud native tools that are “declarative” and that can be treated as code. Declarative means that configuration is guaranteed by a set of facts instead of by a set of instructions, for example, “there are ten redis servers”, rather than “start ten redis servers, and tell me if it worked or not”.
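The difference between a fact and an instruction shows up when the same operation runs twice. The sketch below is purely illustrative: `start_servers` and `ensure_servers` are hypothetical functions, and the server count is a plain integer instead of real infrastructure.

```python
def start_servers(running: int, count: int) -> int:
    """Imperative: an instruction. Repeating it starts more servers."""
    return running + count


def ensure_servers(running: int, desired: int):
    """Declarative: a fact. The tooling works out what, if anything, to do."""
    delta = desired - running
    if delta > 0:
        action = f"start {delta}"
    elif delta < 0:
        action = f"stop {-delta}"
    else:
        action = "no change"
    return desired, action


# "Start ten redis servers", issued twice.
running = 0
running = start_servers(running, 10)
running = start_servers(running, 10)
imperative_result = running

# "There are ten redis servers", applied twice.
running = 0
running, first_action = ensure_servers(running, 10)
running, second_action = ensure_servers(running, 10)
declarative_result = running

print(imperative_result, declarative_result)
print(first_action, "/", second_action)
```

Because the declarative form is safe to re-apply, it can be stored in Git and applied by an agent as often as needed.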
With declarative tools, your entire set of configuration files can be version controlled in Git. By using Git as the source of truth, your apps are more easily deployed and rolled back to and from Kubernetes. And even more importantly, when disaster strikes, your cluster’s infrastructure can be dependably and quickly reproduced, all from Git.
IAC tools vs. GitOps
Infrastructure as Code tools that can provision servers on demand have existed for quite some time. These tools originated the concept of keeping infrastructure config versioned, backed up and reproducible from source control.
But now with Kubernetes being almost completely declarative, combined with the immutable container, it is possible to extend some of these concepts to managing applications and their operating system as well.
The ability to manage and compare the current state of both your infrastructure and your applications, so that you can test, deploy, roll back and roll forward with a complete audit trail, all from Git, is what encompasses the GitOps philosophy and its best practices. This is possible because Kubernetes is managed almost entirely through declarative config and because containers are immutable.
At Weaveworks, we use Terraform and Ansible to provision servers. We also keep those configuration files backed up and versioned in Git. IAC tools and their associated configuration files form a central part of our GitOps workflows for near-instant cluster recovery if disaster strikes here at Weaveworks. Learn more about how infrastructure as code tools differ from GitOps in the GitOps FAQ.
What if my system diverges from the source of truth?
Declarative provisioning tools let you describe your desired true state in Git. But they suffer from the problem that what is “really true right now” is in the live system, and that may differ from what is described in source control.
- How do you know if the live system has converged to the desired state?
- Can you get notified when this differs?
- What is the “canary in the coal mine” that informs you when you’re in trouble?
- How do you trigger convergence between the cluster and source control?
There is prior art here.
IAC tools like Chef, Puppet and Ansible support features like “diff alerts”. These help operators to understand when action may need to be taken to “converge” the live system to the intended state (as defined by the configuration scripts). And more recently, best practice is to deploy immutable images (e.g. containers) so that divergence is less likely.
In the “GitOps” model, we use Git to solve for divergence and convergence, aided by a set of “diff” and “sync” tools (kubediff, as well as terradiff and ansiblediff) that compare the intended state with actual state.
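As a rough illustration of what a “diff” tool does, the sketch below walks two nested structures and reports each path where the intended state and the actual state disagree. It is not the implementation of kubediff, terradiff or ansiblediff. The function name `diff_state` and the sample manifests are assumptions made for the example.

```python
def diff_state(intended, actual, path=""):
    """Yield (path, intended_value, actual_value) for every difference."""
    if isinstance(intended, dict) and isinstance(actual, dict):
        for key in sorted(set(intended) | set(actual)):
            child = f"{path}.{key}" if path else str(key)
            if key not in actual:
                # Declared in Git but missing from the live system.
                yield (child, intended[key], None)
            elif key not in intended:
                # Present in the live system but never declared in Git.
                yield (child, None, actual[key])
            else:
                yield from diff_state(intended[key], actual[key], child)
    elif intended != actual:
        yield (path, intended, actual)


intended = {"spec": {"replicas": 3, "template": {"image": "api:1.4"}}}
actual = {
    "spec": {"replicas": 5, "template": {"image": "api:1.4"}},
    "annotations": {"edited-by": "kubectl"},
}

differences = list(diff_state(intended, actual))
for path, want, have in differences:
    print(f"{path}: intended={want!r} actual={have!r}")
```

A “sync” tool would take the same list of differences and apply the intended value at each path instead of only reporting it.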
GitOps builds on immutable infrastructure
GitOps takes full advantage of the move towards immutable infrastructure and declarative container orchestration. We manage multiple deployments a day at Weaveworks. In order to minimize the risk of change after a deployment, whether intended or by accident via “configuration drift”, it is essential that we maintain a reproducible and reliable deployment process.
Our whole system’s desired state (aka “the source of truth”) is described in Git. We use containers for immutability as well as different cloud native tools like Terraform and Ansible to automate and manage our configuration. These tools, together with containers and the declarative nature of Kubernetes, provide what we need for a complete recovery in the case of an entire meltdown.
Works with IAC tools
When you apply GitOps principles to “everything”, including machine configuration, applications and services in addition to alerting rules and dashboards, all are kept under source control.
Access to the running system should not be required except via Git. Any group of changes may be applied atomically, and diffed accordingly. The Git record is then not just an audit log but also a transaction log that you can use to roll back and forth to any snapshot.
Continuous Delivery and GitOps Workflows in Weave Cloud
Weave Cloud is designed specifically for version controlled systems and declarative application stacks. Every developer on your team is likely familiar with Git and can make pull requests. Now they can use Git to accelerate and simplify application deployments to Kubernetes as well.
Here is a typical developer workflow for creating or updating a new feature:
- A pull request for a new feature is pushed to GitHub for review.
- The code is reviewed and approved by a colleague. After the code is revised and re-approved, it is merged to Git.
- The Git merge triggers the CI and build pipeline, which runs a series of tests, then builds a new image and deposits the new image in a registry.
- The Weave Cloud ‘Deployment Automator’ watches the image registry, notices the image, pulls the new image from the registry and updates its YAML in the config repo.
- The Weave Cloud ‘Deployment Synchronizer’ (installed to the cluster) detects that the cluster is out of date. It pulls the changed manifests from the config repo and deploys the new feature to production.
GitOps-enabled CI/CD pipeline:
Kubernetes controller implemented with the operator pattern
Weave Cloud implements a custom controller to listen for and synchronize deployments to your Kubernetes cluster. The controller is implemented using the operator pattern, which is significant on two levels: first, it is more secure; and second, it automates complex, error-prone tasks like having to manually update YAML manifests.
By using the operator pattern, an agent acts on behalf of the cluster to listen to events relating to custom resource changes, so that they can be applied. The agent is responsible for synchronizing what’s in Git with what’s running in the cluster and provides a simple way for your team to achieve continuous deployment.
Pull vs Push Pipeline
Most CI/CD tools available today use a push-based model. In a push-based pipeline, code starts with the CI system and continues through a series of encoded scripts, or someone uses ‘kubectl’ by hand, to push changes to the Kubernetes cluster.
You don’t want to use your CI system as the deployment impetus, or deploy manually on the command line, because of the potential to expose credentials outside of your cluster. While it is possible to secure both your CI/CD scripts and the command line, you are working outside the trust domain of your cluster. This is generally not good practice, and it is why CI systems are known attack vectors for production.
Typical push pipeline with read/write permission outside of the cluster:
In Weave Cloud images are pulled and credentials are kept inside the cluster:
Weave Cloud Pull Pipeline
Weave Cloud uses a pull strategy that consists of two key components: a “Deployment Automator” that watches the image registry and a “Deployment Synchronizer” that sits in the cluster to maintain its state.
At the centre of our pull pipeline pattern is a single source of truth for manifests (the config repo). Developers push their updated code to the code base repository, where the change is picked up by the CI tool, which ultimately builds a Docker image. The Weave Cloud ‘Deployment Automator’ notices the image, pulls the new image from the registry and then updates its YAML in the config repo. The ‘Deployment Synchronizer’ then detects that the cluster is out of date, pulls the changed manifests from the config repo and deploys the new image to the cluster.
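The two roles in the pull pipeline can be modelled with a small simulation. This is a sketch under stated assumptions, not Weave Cloud's implementation: the registry, config repo and cluster are plain in-memory objects, and `ManifestRepo`, `automate` and `synchronize` are hypothetical names.

```python
class ManifestRepo:
    """Stands in for the Git config repo that holds the manifests."""

    def __init__(self, manifests):
        self.manifests = dict(manifests)
        self.log = []

    def commit(self, service, image):
        self.manifests[service] = image
        self.log.append(f"{service} -> {image}")


def automate(registry, repo):
    """Automator role: write newly built images into the config repo."""
    for service, latest in registry.items():
        if repo.manifests.get(service) != latest:
            repo.commit(service, latest)


def synchronize(repo, cluster):
    """Synchronizer role: runs inside the cluster and pulls from the repo."""
    changed = [
        service
        for service, image in repo.manifests.items()
        if cluster.get(service) != image
    ]
    for service in changed:
        cluster[service] = repo.manifests[service]
    return changed


registry = {"cart": "cart:1.8"}             # CI has just pushed a new image
repo = ManifestRepo({"cart": "cart:1.7"})   # config repo still names the old one
cluster = {"cart": "cart:1.7"}

automate(registry, repo)
deployed = synchronize(repo, cluster)

print(repo.log)   # the change is recorded in the config repo first
print(deployed)   # then the cluster pulls it
print(cluster)
```

Note that nothing outside the cluster writes to it. The only component that touches `cluster` is the synchronizer, which is the property the pull model is meant to provide.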
Weave Cloud deployment agent installed to your cluster
With the Deployment Synchronizer inside the cluster, your cluster credentials are not exposed outside of your production environment. Once the Weave Cloud agents are installed to your cluster and your Git repo is connected, any changes in your production environment are made via Git pull requests, with full rollbacks and convenient audit logs all provided by Git.
Observability as a deployment catalyst
With Kubernetes, GitOps can manage infrastructure and app deployments through pull-requests. But how do GitOps workflows and observability work together?
By combining GitOps workflows with real-time observability, your development team can make crucial decisions before they deploy any new features. Because about-to-be-released services can be observed in real time within the running cluster before you release, you can deploy with confidence and deliver better quality features more quickly.
Observability can be seen as one of the principal drivers of the Continuous Delivery cycle for Kubernetes since it describes the actual running state of the system at any given time. The running system is observed in order to understand and control it. New features and fixes are pushed to Git and trigger the deployment pipeline, and when they are ready to be released they can be observed in real time against the running cluster. At this point, the developer may return to the beginning of the pipeline based on this feedback or deploy and release the image to the production cluster.
GitOps is a release-oriented model of both operations and features. How quickly you deliver new features to your customers depends in part on how fast your team can go round the stages in this cycle.
Developers that use GitOps workflows and observability together need to answer these questions:
- If a change is released automatically how do we know it really worked?
- How can we be sure that our changes are actually driving improvement?
- In a complex distributed system how do we understand issues, diagnose them and handle incidents?
With Weave Cloud, observability workload dashboards are integrated into the deployment and release process. At a glance you can see whether your deployment will be successful before you commit to releasing it to staging or production. This helps you identify problems faster, and because the dashboards are real-time and built right into the deployment process, you can deploy your service multiple times per day with confidence that each deployment is free from major defects.
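One way to picture observability feeding back into the release decision is a simple gate that compares metrics from the about-to-be-released service against the running baseline. The sketch below is hypothetical throughout. The metric names, the thresholds and the function `ready_to_release` are assumptions for illustration and do not describe how Weave Cloud dashboards work.

```python
def ready_to_release(baseline: dict, candidate: dict,
                     max_error_rate: float = 0.01,
                     max_latency_ratio: float = 1.2):
    """Return (ok, reasons) from observed metrics; thresholds are illustrative."""
    reasons = []
    if candidate["error_rate"] > max_error_rate:
        reasons.append("error rate above threshold")
    latency_limit = baseline["p95_latency_ms"] * max_latency_ratio
    if candidate["p95_latency_ms"] > latency_limit:
        reasons.append("latency regressed against baseline")
    return (not reasons, reasons)


baseline = {"error_rate": 0.002, "p95_latency_ms": 180}
healthy = {"error_rate": 0.003, "p95_latency_ms": 190}
unhealthy = {"error_rate": 0.05, "p95_latency_ms": 400}

print(ready_to_release(baseline, healthy))
print(ready_to_release(baseline, unhealthy))
```

When the gate fails, the developer returns to the start of the pipeline with the listed reasons. When it passes, the change can be released.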
Benefits of GitOps
By adopting GitOps best practices, developers use familiar tools like Git to manage updates and features to Kubernetes more rapidly. By continuously pushing feature updates, businesses are more agile, can respond more quickly to customer demands, and are more competitive in the marketplace.
With GitOps you have a complete end-to-end pipeline. Not only are your continuous integration and continuous deployment pipelines all driven by pull requests, but your operations tasks are also fully reproducible through Git.
If you are using Weave Cloud, deployments to your running cluster are also made securely without leaking sensitive credentials outside of the cluster.
Stronger security guarantees
Git’s strong correctness and security guarantees, backed by the strong cryptography used to track and manage changes, together with the ability to sign changes to prove authorship and origin, are key to a correct and secure definition of the desired state of the cluster. If a security breach does occur, the immutable and auditable source of truth can be used to recreate a new system independently of the compromised one, reducing downtime and allowing much better incident response.
Separation of responsibility between packaging software and releasing it to a production environment also embodies the security principle of least privilege, reducing the impact of compromise and providing a smaller attack surface.
Easier compliance and auditing
Since changes are tracked and logged in a secure manner, compliance and auditing are made trivial. The use of comparison tools like kubediff, terradiff and ansiblediff also allows you to compare a trusted definition of the state of the cluster with the actual running cluster, ensuring that the tracked and auditable changes match reality.
Weaveworks is the creator of Weave Cloud, a SaaS that simplifies deployment, monitoring and management for containers and microservices. It extends and complements popular orchestrators, and gives developers and DevOps teams faster deployments, insightful monitoring, visualization and networking.
We use our own product to deploy and release new features to Weave Cloud. In addition to this, we are AWS & GCP technical partners; major contributors to the Kubernetes Open Source project; originators of the Kubernetes on AWS SIG; and also key members of the SIG Cluster Lifecycle.
For the past 3 years, Kubernetes has been powering Weave Cloud, our operations as a service offering, so we couldn’t be more excited to share our knowledge and help teams embrace the benefits of cloud native tooling and git-based workflows.