Progressive Application Delivery with Weave GitOps - automation with precision and control
Canary deployments, or progressive delivery, is a technique for deploying new features and fixes at speed while minimising risk. We all know that the best way to break any code is to deploy it into production; it worked just fine in staging. Rather than pushing a new release straight into production, it can be deployed so that initially only a small percentage of requests reach the new release, until we're sure it's going to work. This is called progressive delivery, or a canary release.
What does software delivery have to do with canaries? Before the invention of electronic gas detectors, miners would take a canary down the mine with them. Being more susceptible to toxic gases, the canary would pass out first. If your canary stopped singing, it was time to get to the surface. By exposing only a limited percentage of requests to a new software release, if there are any problems, only a small number of users will have been affected, with only a small impact on the business. Move quickly and break little things.
If progressive application delivery sounds like such a great idea, why do so few organisations use it? Because it can be complex to set up and manage. With the right tools, however, it's actually quite easy. Let's see how.
First of all, you'll need a Kubernetes cluster to play with. This can be one on your favourite cloud provider or just an instance of KinD running on your workstation. For the GitOps part, Weave GitOps Core is the best tool for the job. The best breed of canary is Flagger, part of Flux. To manage the complexity of Kubernetes networking and perform the traffic shifting you'll need Linkerd, a lightweight service mesh.
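As a rough sketch, the moving parts can be installed in a few commands. These follow the upstream Linkerd and Flagger instructions; the Helm values shown (meshProvider, metricsServer) assume Flagger should read metrics from a Prometheus in the linkerd-viz namespace, so adjust them to your setup.

```shell
# Install the Linkerd CRDs and control plane (requires the linkerd CLI)
linkerd install --crds | kubectl apply -f -
linkerd install | kubectl apply -f -

# Install Flagger configured for the Linkerd mesh via its Helm chart
helm repo add flagger https://flagger.app
helm upgrade -i flagger flagger/flagger \
  --namespace flagger-system --create-namespace \
  --set meshProvider=linkerd \
  --set metricsServer=http://prometheus.linkerd-viz:9090
```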
The Linkerd service mesh works by automatically injecting a proxy sidecar into each Pod. All traffic to and from your Pod goes through this proxy, allowing the service mesh to gather metrics and control the flow of requests. Each of these injected proxies is managed by the service mesh control plane.
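Injection is opt-in: Linkerd adds the proxy to Pods whose namespace (or Pod template) carries the inject annotation. A minimal namespace manifest, assuming the demo app lives in a namespace called podinfo (the namespace that appears in the canary Service address later in this post):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: podinfo
  annotations:
    # Tells Linkerd to inject its proxy sidecar into every Pod here
    linkerd.io/inject: enabled
```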
Linkerd has a built-in web dashboard for visualising the metrics collected by the automatically injected proxies.
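The dashboard ships as Linkerd's viz extension. As a sketch, assuming the linkerd CLI is installed:

```shell
# Install the on-cluster metrics stack and dashboard
linkerd viz install | kubectl apply -f -
# Port-forward and open the dashboard in a browser
linkerd viz dashboard
```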
The Secret Ingredient
When Flagger is installed it registers a custom resource definition (CRD) called Canary. This resource replaces the usual Service definition that would partner a Deployment. Everything you need for progressive delivery is in this one manifest; it's not that complicated.
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: frontend
  namespace: podinfo
spec:
  # rollback if not finished in time
  progressDeadlineSeconds: 360
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  service:
    port: 9898
  analysis:
    # How often the canary is evaluated
    interval: 1m
    # Failed checks tolerated before rollback
    threshold: 3
    # Traffic routed to the canary, stepping up to the max
    stepWeight: 10
    maxWeight: 50
    metrics:
    - name: request-success-rate
      # Minimum percentage of non-5xx responses
      thresholdRange:
        min: 99
      # duration for metric query
      interval: 1m
    - name: request-duration
      # P99 request time (ms)
      thresholdRange:
        max: 250
      # duration for metric query
      interval: 1m
    webhooks:
    - name: load-test
      type: rollout
      url: http://flagger-loadtester.test/
      metadata:
        cmd: "hey -z 1m -q 10 -c 2 http://frontend-canary.podinfo:9898/"
    - name: events
      type: event
      # any HTTP endpoint that accepts Flagger's event payload
      url: http://event-recorder.podinfo/
See, it's not that scary. From the top, in the "spec", the "progressDeadlineSeconds" says that if the rollout has not completed within 360 seconds (6 minutes), the whole thing gets rolled back. The "targetRef" is similar to what's defined in a regular Service manifest; this Canary targets a Deployment called "frontend". The "service.port" is the port number exposed.
Next up is the "analysis" section, which defines the success criteria for progressing the rollout. It is evaluated every minute, and a maximum of 3 failures is tolerated. Initially 10% of requests get the new version, increasing in steps to 50% before going all in and entirely replacing the previous version. For the new version to be considered successful, 99% of requests must be good (no 5xx response codes) and the 99th-percentile response time must stay below 250ms.
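A quick back-of-envelope check that these settings fit inside the deadline: with a step of 10% and a maximum of 50%, evaluated every minute, a clean rollout needs five promotion steps, comfortably inside the 360-second progressDeadlineSeconds.

```shell
step=10; max=50; interval=60   # seconds between evaluations
steps=$(( max / step ))        # 10% -> 20% -> 30% -> 40% -> 50%
total=$(( steps * interval ))
echo "$steps steps, ~${total}s total"
```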
Finally, the "webhooks" section provides some useful integration points. Flagger ships with its own load-generation tool. Remember that in this example the canary only receives 10% of requests; if that is not enough traffic to produce metrics for the analysis, the release is considered a failure (fail safe). Using the webhook to generate load directly against the canary version (note the Service name in the command) ensures metrics are available for analysis. The "events" webhook is an example of a generic integration: as Flagger manages the state of the canary it issues events, which may be logged, used for alerting, and so on.
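You can also generate the same load by hand to watch the analysis react. A sketch using a throwaway Pod (the williamyeh/hey image name is an assumption; any image bundling hey works):

```shell
# Run hey from inside the cluster against the canary Service
kubectl -n podinfo run hey --image=williamyeh/hey --restart=Never -- \
  -z 1m -q 10 -c 2 http://frontend-canary.podinfo:9898/
```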
Automation in Action
When the application is deployed via GitOps, Flagger works its automation magic.
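"Deployed via GitOps" here means Flux is watching a repository and applying whatever lands in it. A minimal sketch, assuming a hypothetical repository URL and a ./deploy path:

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: podinfo
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/example/podinfo-deploy  # hypothetical repo
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: podinfo
  namespace: flux-system
spec:
  interval: 1m
  sourceRef:
    kind: GitRepository
    name: podinfo
  path: ./deploy
  prune: true
```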
$ kubectl get pod
NAME                                READY   STATUS    RESTARTS   AGE
backend-primary-6cbbc47f74-mkcbb    2/2     Running   0          98s
frontend-primary-7db7b6dd56-j8lfm   2/2     Running   0          98s
The Pods are created with a "primary" suffix; these run the current stable release. The 2/2 ready count shows the injected Linkerd proxy running alongside each application container.
$ kubectl get svc
NAME               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)
backend            ClusterIP   10.96.44.51     <none>        9898/TCP
backend-canary     ClusterIP   10.96.160.186   <none>        9898/TCP
backend-primary    ClusterIP   10.96.164.244   <none>        9898/TCP
frontend           ClusterIP   10.96.162.202   <none>        9898/TCP
frontend-canary    ClusterIP   10.96.224.92    <none>        9898/TCP
frontend-primary   ClusterIP   10.96.66.20     <none>        9898/TCP
Additional Services are also created. The current stable release is served both by the Service without a suffix and by the one with the "primary" suffix. The Service with the "canary" suffix is not yet mapped to any Pods.
This all changes during a canary release rollout. The Deployment manifest is edited and committed to Git; GitOps picks up the change and applies it to Kubernetes. Rather than performing the default rolling update, Flagger creates additional Pods, maps the Services and configures the Linkerd proxies to manage the traffic.
$ kubectl get pod
NAME                                READY   STATUS    RESTARTS   AGE
backend-primary-6cbbc47f74-mkcbb    2/2     Running   0          18m
frontend-6c84db97bc-86ppq           2/2     Running   0          24s
frontend-primary-7db7b6dd56-j8lfm   2/2     Running   0          18m
During the canary rollout the new release runs in a separate Pod without a suffix. The "canary" Services map to this Pod, and Linkerd splits the request traffic between the primary and canary versions.
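Under the hood, Flagger drives Linkerd through an SMI TrafficSplit resource, shifting weight between the primary and canary backends as the analysis progresses. At the first step it looks roughly like this (generated by Flagger, not hand-written):

```yaml
apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: frontend
  namespace: podinfo
spec:
  service: frontend          # the apex Service that clients call
  backends:
  - service: frontend-primary
    weight: 90               # stable release
  - service: frontend-canary
    weight: 10               # new release under analysis
```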
Give it a Try
You can watch a full walkthrough of this example on our YouTube channel. All the tooling used here is free and open source, and any decent workstation is beefy enough to run everything. What are you waiting for? Give it a go today and see how easy it is to achieve DevOps nirvana with GitOps and progressive delivery.