Progressive Application Delivery with Weave GitOps - automation with precision and control

February 08, 2022

Canary deployment, or progressive delivery, is a technique for deploying new features and fixes at speed while minimising risk. We all know that the best way to break any code is to deploy it into production; it worked just fine in staging. Rather than deploying a new release to all of production at once, we can deploy it in such a way that initially only a small percentage of requests get the new release, until we’re sure it’s going to work. This is called progressive delivery or a canary release.
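To make the idea of “a small percentage of requests” concrete, here is a minimal sketch of percentage-based routing. The function names and the bucketing on a numeric request id are assumptions made for illustration only; a real service mesh makes this decision inside its proxies, not in application code.

```python
def route(request_id: int, canary_weight: int) -> str:
    """Pick a backend for one request.

    canary_weight is the percentage (0-100) of requests sent to the new release.
    Bucketing on request_id (a hypothetical numeric id) keeps the demo repeatable.
    """
    if not 0 <= canary_weight <= 100:
        raise ValueError("canary_weight must be between 0 and 100")
    bucket = request_id % 100
    return "canary" if bucket < canary_weight else "primary"


def count_routes(total_requests: int, canary_weight: int) -> dict:
    """Route request ids 0..total_requests-1 and tally where each one went."""
    counts = {"primary": 0, "canary": 0}
    for request_id in range(total_requests):
        counts[route(request_id, canary_weight)] += 1
    return counts


print(count_routes(1000, 10))  # {'primary': 900, 'canary': 100}
```

With a weight of 10, only 100 of 1000 requests ever reach the new release, so a bad release affects a correspondingly small share of users.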

What does software delivery have to do with canaries? Before the invention of electronic gas detectors, miners would take a canary down the mine with them. Being more susceptible to toxic gases, the canary would pass out first. If your canary stopped singing, it was time to get to the surface. In the same way, exposing only a limited percentage of requests to a new software release means that if there are any problems, only a small number of users are affected, with only a small impact on the business. Move quickly and break little things.

Progressive delivery sounds like a great idea, so why do so few organisations use it? Because it can be complex to set up and manage. However, if you use the right tools, it’s actually quite easy. Let’s see how.

Tooling Up

First of all you’ll need a Kubernetes cluster to play with. This can be one on your favourite cloud provider or just an instance of kind running on your workstation. For the GitOps part, Weave GitOps Core is the best tool for the job. The best breed of canary is Flagger, part of the Flux family of projects. To manage the complexity of Kubernetes networking and perform the traffic shifting you’ll need Linkerd, a lightweight service mesh.

Service Mesh

The Linkerd service mesh works by automatically injecting a proxy sidecar into each Pod. All traffic to and from your Pod goes through this proxy, allowing the service mesh to gather metrics and control the flow of requests. Each of these injected proxies is managed by the service mesh control plane.
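The sidecar idea can be sketched in a few lines: a wrapper sits in front of the application, forwards each request, and records the status code and latency as it goes. This is an illustrative model only; `SidecarProxy`, `ProxyMetrics` and `demo_app` are hypothetical names, and the real Linkerd proxy works at the network level rather than wrapping a function.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ProxyMetrics:
    """Metrics the hypothetical proxy collects for every request it forwards."""
    statuses: List[int] = field(default_factory=list)
    latencies_ms: List[float] = field(default_factory=list)

    def success_rate(self) -> float:
        """Percentage of recorded responses that were not 5xx."""
        if not self.statuses:
            return 0.0
        ok = sum(1 for status in self.statuses if status < 500)
        return 100.0 * ok / len(self.statuses)


class SidecarProxy:
    """Stand-in for an injected proxy: it forwards requests and measures them."""

    def __init__(self, app: Callable[[str], int]):
        self.app = app
        self.metrics = ProxyMetrics()

    def handle(self, path: str) -> int:
        start = time.perf_counter()
        status = self.app(path)  # forward the request to the application
        elapsed_ms = (time.perf_counter() - start) * 1000
        self.metrics.statuses.append(status)
        self.metrics.latencies_ms.append(elapsed_ms)
        return status


def demo_app(path: str) -> int:
    """Toy application: every path succeeds except /boom."""
    return 500 if path == "/boom" else 200


proxy = SidecarProxy(demo_app)
for path in ["/", "/", "/boom", "/"]:
    proxy.handle(path)
print(proxy.metrics.success_rate())  # 75.0
```

The application itself never has to know it is being measured, which is why the mesh can supply the success rate and latency figures that a canary analysis relies on.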


Linkerd has a built-in web dashboard for visualising metrics collected from the automatically injected proxies.


The Secret Ingredient

When Flagger is installed it registers a custom resource definition (CRD) called Canary. A Canary resource replaces the usual Service definition that would partner a Deployment. Everything you need for progressive delivery is in this one manifest, and it’s not that complicated.

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: frontend
spec:
  # roll back if the canary Deployment makes no progress within this time
  progressDeadlineSeconds: 360
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  service:
    port: 9898
  analysis:
    # How often the canary is evaluated
    interval: 1m
    threshold: 3
    maxWeight: 50
    stepWeight: 10
    metrics:
    - name: request-success-rate
      thresholdRange:
        min: 99
      # duration for metric query
      interval: 1m
    - name: request-duration
      thresholdRange:
        # P99 request time (ms)
        max: 250
      # duration for metric query
      interval: 1m
    webhooks:
    - name: load-test
      type: rollout
      url: http://flagger-loadtest-loadtester.loadtest/
      metadata:
        cmd: "hey -z 1m -q 10 -c 2 http://frontend-canary.podinfo:9898/"
    - name: events
      type: event
      url: http://webhook.default/webhook
      metadata:
        app: frontend

See, it’s not that scary. Starting from the top of the “spec”, “progressDeadlineSeconds” says that if the canary Deployment has not made progress within 360 seconds (6 minutes), the whole thing gets rolled back. The “targetRef” plays a role similar to the selector in a regular Service manifest: this Canary is targeting a Deployment called “frontend”. The “service.port” is the port number exposed.

Next up is the “analysis” section, which defines the success criteria for progressing the rollout. The canary is evaluated every minute, and up to 3 failed checks are tolerated before the rollout is rolled back. Initially 10% of requests get the new version, increasing in 10% steps up to 50% before going all in and entirely replacing the previous version. For the new version to be considered successful, at least 99% of requests must be good, that is, not 5xx response codes, and the 99th percentile response time must be below 250ms.
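The analysis loop is easier to follow as a small simulation. The sketch below is a simplified model, not Flagger’s implementation: the function names are hypothetical, the percentile uses the nearest-rank method, and it assumes that reaching the failure threshold triggers the rollback and that one further passing check at the maximum weight triggers promotion. The defaults mirror the manifest above.

```python
from typing import Iterable, List, Tuple


def p99(latencies_ms: List[float]) -> float:
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    rank = (99 * len(ordered) + 99) // 100  # integer ceiling of 0.99 * n
    return ordered[rank - 1]


def check_passes(statuses: List[int], latencies_ms: List[float],
                 min_success: float = 99.0, max_p99_ms: float = 250.0) -> bool:
    """Apply the two metric thresholds to one analysis interval."""
    if not statuses or not latencies_ms:
        return False  # no traffic means no metrics, which counts as a failure
    ok = sum(1 for status in statuses if status < 500)
    success_rate = 100.0 * ok / len(statuses)
    return success_rate >= min_success and p99(latencies_ms) <= max_p99_ms


def simulate_rollout(check_results: Iterable[bool], step_weight: int = 10,
                     max_weight: int = 50, threshold: int = 3) -> Tuple[str, List[int]]:
    """Walk the canary weight through a sequence of passed or failed checks."""
    weight = step_weight
    failed = 0
    history = [weight]
    for passed in check_results:
        if not passed:
            failed += 1
            if failed >= threshold:  # assumed rollback rule
                return "rolled back", history
            continue  # hold the current weight and try again next interval
        if weight >= max_weight:
            return "promoted", history
        weight = min(weight + step_weight, max_weight)
        history.append(weight)
    return "in progress", history


print(simulate_rollout([True] * 5))
# ('promoted', [10, 20, 30, 40, 50])
print(simulate_rollout([True, False, False, False]))
# ('rolled back', [10, 20])
```

A healthy release climbs through 10%, 20%, 30%, 40% and 50% before promotion, while three failed checks stop the rollout with most traffic still on the stable version.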

Finally, the “webhooks” section provides some useful integration points. Flagger comes with its own load generation tool. If the canary version does not receive enough traffic to generate metrics for the analysis (remember, in this example it initially gets only 10% of requests), the check is considered a failure (fail safe). Using the webhook to generate some load directly on the canary version (notice the service name) ensures that metrics are available for analysis. The “events” webhook is an example of generic integration: as Flagger manages the state of the canary it issues events, which may be logged, used for alerting, etc.
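It helps to know how much load the “load-test” webhook actually asks for. In the hey command, -z is the test duration, -c is the number of concurrent workers and -q is the rate limit in queries per second per worker. The helper below is a hypothetical sketch that works out the resulting ceiling; it only understands simple durations such as 30s or 1m, and the real rate can be lower if responses are slow.

```python
import shlex

# Seconds per unit for the simple duration strings this sketch supports.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Convert a simple duration such as '30s' or '1m' into seconds."""
    return int(text[:-1]) * _UNIT_SECONDS[text[-1]]


def hey_load(cmd: str) -> dict:
    """Estimate the upper bound on load implied by -z, -q and -c."""
    args = shlex.split(cmd)
    flags = {args[i]: args[i + 1]
             for i in range(len(args) - 1)
             if args[i] in ("-z", "-q", "-c")}
    seconds = parse_duration(flags["-z"])
    per_worker_qps = int(flags["-q"])
    workers = int(flags["-c"])
    total_qps = per_worker_qps * workers
    return {
        "seconds": seconds,
        "requests_per_second": total_qps,
        "max_requests": total_qps * seconds,
    }


print(hey_load("hey -z 1m -q 10 -c 2 http://frontend-canary.podinfo:9898/"))
# {'seconds': 60, 'requests_per_second': 20, 'max_requests': 1200}
```

So the webhook in the manifest sends at most 20 requests per second to the canary Service for one minute, which is enough to populate the success-rate and latency metrics for each one-minute analysis interval.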

Automation in Action

When the application is deployed via GitOps, Flagger works its automation magic.

$ kubectl get pod
backend-primary-6cbbc47f74-mkcbb    2/2     Running   0          98s
frontend-primary-7db7b6dd56-j8lfm   2/2     Running   0          98s

The Pods get created with a “primary” suffix; these are the current stable release.

$ kubectl get svc
backend            ClusterIP   10.96.44.51     <none>        9898/TCP
backend-canary     ClusterIP   10.96.160.186   <none>        9898/TCP
backend-primary    ClusterIP   10.96.164.244   <none>        9898/TCP
frontend           ClusterIP   10.96.162.202   <none>        9898/TCP
frontend-canary    ClusterIP   10.96.224.92    <none>        9898/TCP
frontend-primary   ClusterIP   10.96.66.20     <none>        9898/TCP

Additional Services are also created. The current stable release is served by the Services without a suffix and with the “primary” suffix. The Services with the “canary” suffix are not mapped to anything yet.

This all changes during a canary release rollout. The Deployment manifest is edited and committed to Git; GitOps picks up the change and applies it to Kubernetes. Rather than performing the default rolling update, Flagger creates additional Pods, maps the Services and configures the Linkerd proxies to manage the traffic.

$ kubectl get pod
backend-primary-6cbbc47f74-mkcbb    2/2     Running   0          18m
frontend-6c84db97bc-86ppq           2/2     Running   0          24s
frontend-primary-7db7b6dd56-j8lfm   2/2     Running   0          18m

During the canary rollout the new release runs in a separate Pod without a suffix. The Services with the “canary” suffix map to this Pod, and Linkerd splits the request traffic between the two releases.
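The split itself boils down to a pair of weights that always add up to 100. The sketch below builds a simplified description of that split for a given canary weight. The dictionary layout and the `traffic_split` name are assumptions for illustration and are not the exact resource that Flagger or Linkerd use; only the Service names follow the kubectl output above.

```python
def traffic_split(app: str, canary_weight: int) -> dict:
    """Describe how requests to `app` are divided between primary and canary."""
    if not 0 <= canary_weight <= 100:
        raise ValueError("canary_weight must be between 0 and 100")
    return {
        "service": app,  # the Service that clients call
        "backends": [
            {"service": f"{app}-primary", "weight": 100 - canary_weight},
            {"service": f"{app}-canary", "weight": canary_weight},
        ],
    }


# First step of the rollout from the manifest: 10% of requests go to the canary.
print(traffic_split("frontend", 10))
```

As the analysis passes, the canary weight rises step by step and the primary weight falls by the same amount, so clients keep calling the same “frontend” Service throughout.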


Give it a Try

You can watch a full walkthrough of this example on our YouTube channel. All the tooling used in this example is free and open source, and any decent workstation will be beefy enough to run everything. What are you waiting for? Give it a go today and see how easy it is to achieve DevOps nirvana with GitOps and progressive delivery.

Progressive delivery with Weave GitOps and Flagger

