GitOps and Cluster API: Multi-cluster Manager

By Richard Case
January 23, 2020

In a previous post we introduced the combination of Flux and the Cluster API to enable cluster management with GitOps. But what happens if you are building a large multi-cloud and/or hybrid platform? This post introduces an advanced deployment pattern for managing multiple clusters.

Related posts

Manage Thousands of Clusters with GitOps and the Cluster API

WKSctl - A New OSS Kubernetes Manager using GitOps

Weave GitOps Manager Adds Policy Based Cluster Automation to Kubernetes

In a previous post we introduced the combination of Flux and the Cluster API to enable cluster management with GitOps. The result is you can declaratively define your clusters and perform operations on the clusters all via Git pull request, as you would the workloads on the clusters.

But what happens if you are building a large multi-cloud and/or hybrid platform? In this post we discuss this type of scenario and also introduce an advanced deployment pattern for managing multiple clusters.

Multi-cloud / Hybrid Scenarios with GitOps and Cluster API

Multi-cloud and hybrid are exactly the types of scenario that the Cluster API was designed to excel at. Infrastructure providers, such as the AWS provider (CAPA) or the vSphere provider (CAPV), provision infrastructure in a target environment. Depending on which provider is used, the infrastructure can be in a public or private cloud, and providers like CAPV also allow it to be provisioned on-premise (although CAPV isn’t limited to on-premise).

There are a few things to note in a hybrid or multi-cloud scenario:

  1. The Management Cluster (a.k.a. control plane cluster) must:
    1. Hold credentials for each of the target environments for use by Cluster API infrastructure providers.
    2. Have sufficient network connectivity for the Cluster API providers to provision their infrastructure. The requirements differ for each provider.
  2. Every Workload Cluster (a.k.a. tenant cluster) is managed by the management cluster and the CAPI resources that it contains.

For example, if we are targeting on-premise vSphere, AWS (multiple accounts) and Azure:

[Figure: GitOps and CAPI - Multi-cloud (no MoM)]

There is nothing inherently wrong with the above approach. It is the simplest way to achieve multi-cloud and/or hybrid type scenarios with Cluster API and Flux.
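
The two requirements on the management cluster can be made concrete with a small check. The following is only an illustrative sketch in Python, not part of Cluster API or Flux: the class names, the environment names and the `missing_prerequisites` helper are all assumptions made for this example. It also checks that the matching provider is installed, using the provider abbreviations from this post.

```python
from dataclasses import dataclass, field


@dataclass
class TargetEnvironment:
    name: str      # illustrative name, e.g. "vsphere-onprem"
    provider: str  # infrastructure provider, e.g. "capv", "capa", "capz"


@dataclass
class ManagementCluster:
    providers: set = field(default_factory=set)    # installed infrastructure providers
    credentials: set = field(default_factory=set)  # environments we hold credentials for
    reachable: set = field(default_factory=set)    # environments we have network access to


def missing_prerequisites(mgmt, targets):
    """Return a list of problems; an empty list means every target is covered."""
    problems = []
    for env in targets:
        if env.provider not in mgmt.providers:
            problems.append(f"{env.name}: provider {env.provider} not installed")
        if env.name not in mgmt.credentials:
            problems.append(f"{env.name}: no credentials")
        if env.name not in mgmt.reachable:
            problems.append(f"{env.name}: no network connectivity")
    return problems


# The example from the text: on-premise vSphere, two AWS accounts and Azure.
targets = [
    TargetEnvironment("vsphere-onprem", "capv"),
    TargetEnvironment("aws-account-a", "capa"),
    TargetEnvironment("aws-account-b", "capa"),
    TargetEnvironment("azure", "capz"),
]
mgmt = ManagementCluster(
    providers={"capv", "capa", "capz"},
    credentials={"vsphere-onprem", "aws-account-a", "azure"},  # aws-account-b is missing
    reachable={"vsphere-onprem", "aws-account-a", "aws-account-b", "azure"},
)

for problem in missing_prerequisites(mgmt, targets):
    print(problem)  # prints "aws-account-b: no credentials"
```

In practice the credentials and connectivity checks are provider-specific, so a real check would be more involved than set membership.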

A note about security

Since the management cluster holds the credentials for a number of target environments, it does represent an area of potential risk. If the management cluster is compromised then access to all the tenant clusters is possible. Additionally, every cluster is managed through the same management cluster, and depending on your business, you might see this as a single point of failure or an insufficient separation of concerns with regard to the cluster ownership. For example, perhaps each engineering team has their own AWS account.
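
One way to reason about this risk is to list which tenant clusters become reachable if a given management cluster is compromised. The Python sketch below is purely illustrative: the inventory, the cluster names and the `exposed_tenants` helper are hypothetical, and the "split" layout is shown only for comparison.

```python
def exposed_tenants(compromised, credentials_held, tenants_by_env):
    """List the tenant clusters reachable with one cluster's credentials."""
    exposed = []
    for env in sorted(credentials_held.get(compromised, set())):
        exposed.extend(tenants_by_env.get(env, []))
    return exposed


# Hypothetical inventory: two tenant clusters per target environment.
tenants_by_env = {
    "aws-team-a": ["a-dev", "a-prod"],
    "aws-team-b": ["b-dev", "b-prod"],
    "vsphere-onprem": ["dc-dev", "dc-prod"],
}

# One management cluster holding credentials for every environment.
shared = {"management": set(tenants_by_env)}

# For comparison only: each manager holds credentials for one environment.
split = {f"{env}-manager": {env} for env in tenants_by_env}

print(exposed_tenants("management", shared, tenants_by_env))
print(exposed_tenants("aws-team-a-manager", split, tenants_by_env))
```

With the shared layout all six tenant clusters are listed; with the split layout only the two clusters in that team's account are.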

How do you implement multi-cluster management clusters?

An advanced implementation pattern with the multi-cluster manager in WKP that targets multi-cloud (public and/or private) and hybrid scenarios provides the following functionality:

  • Tenant clusters for a target environment that are only managed from within that target environment.
  • Ability to deploy environment/zone/account level services. For instance, an aggregated Prometheus that collects metrics from every tenant cluster in a particular environment.
  • Greater segregation between cluster definitions for target environments.
  • Target environments that can run detached.

With the multi-cluster manager there is a "master management cluster" in each target environment which looks after the tenant clusters for that target environment. The master management clusters are themselves provisioned by another management cluster:

[Figure: GitOps and CAPI - Multi-cloud (with MoM)]

  1. The multi-cluster management cluster is responsible for provisioning the master management clusters in each of the target environments. It needs the Cluster API and providers installed for the target environments beforehand. The multi-cluster management cluster also requires credentials and network connectivity for the Cluster API providers to do their job.
    An option is to make this cluster ephemeral (perhaps using KIND or k3s) and only create it when a new target environment is needed. In this scenario, the cluster would be spun up for a short period of time, the new master management cluster provisioned, and the cluster then torn down.
  2. The “masters repo” holds the cluster definitions for the master management clusters. These could be raw YAML files or a Helm Release. A Flux instance running in the multi-cluster management cluster will automatically apply either the manifests or the Helm chart.
  3. Credentials are required for each target environment. These credentials can either be permanent or you can use short-lived credentials. To use short-lived credentials, secrets can be injected into the cluster from Vault (using secrets engines) for use by the Cluster API providers.
  4. Each target environment contains a “master management cluster”. It's the master management cluster’s responsibility to provision any tenant clusters within that environment. To do this it needs only the Cluster API and the provider for that environment (e.g. CAPZ or CAPV). There will also be at least one instance of Flux deployed to monitor the “Cloud/DC Master Repo”.
  5. The “Cloud/DC Master Repo” holds the following:
    1. Cluster definitions for the tenant clusters for that target environment.
    2. Application definitions for services that should run centrally in that target environment. For example, you might want to run log or metric aggregation on the management cluster for all tenants within that environment.
  6. The provider for that environment would need credentials for that environment only. You should be able to take advantage of things like IAM Roles for EC2 Instances (iamInstanceProfile property in CAPA) so that explicit credentials aren’t needed.
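
The division of responsibilities in the list above can be summarized as data. This is a minimal Python sketch under stated assumptions: the `plan_topology` function, the dictionary keys and the cluster naming scheme are invented for illustration, while the repo names and provider abbreviations come from this post. It only derives what each cluster watches, which providers it needs and which credentials it must hold.

```python
def plan_topology(environments):
    """Derive per-cluster responsibilities.

    `environments` maps a target environment name to its Cluster API provider.
    """
    plan = {
        # Top level: provisions one master management cluster per environment,
        # so it needs every provider and credentials for every environment.
        "multi-cluster-management": {
            "watches": "masters repo",
            "providers": sorted(set(environments.values())),
            "credentials_for": sorted(environments),
            "provisions": [f"{env}-master-management" for env in sorted(environments)],
        }
    }
    for env, provider in sorted(environments.items()):
        # Per environment: needs only its own provider and its own credentials.
        plan[f"{env}-master-management"] = {
            "watches": f"Cloud/DC Master Repo ({env})",
            "providers": [provider],
            "credentials_for": [env],
            "provisions": [f"tenant clusters in {env}"],
        }
    return plan


plan = plan_topology({"aws": "capa", "azure": "capz", "vsphere": "capv"})
for cluster, duties in plan.items():
    print(cluster, duties)
```

The asymmetry is the point of the pattern: only the top-level cluster, which can be ephemeral, ever needs providers and credentials for every environment.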

Note: The concept of an ephemeral management cluster is new. It assumes that the Cluster API providers are idempotent and can resume from where they left off. This has been tested with the AWS provider (CAPA) but we haven’t tested all the providers yet.
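
To illustrate what "idempotent and can resume" means here, the following toy Python model re-runs a reconcile loop against state that outlives the management cluster. It is not how any Cluster API provider is implemented: the step names, the `reconcile` function and the `interrupt_after` parameter are hypothetical, and a plain set stands in for whatever a provider can observe in the target environment.

```python
STEPS = ["network", "control-plane", "workers"]  # hypothetical provisioning steps


def reconcile(environment_state, steps=STEPS, interrupt_after=None):
    """Apply only the steps not yet recorded in the target environment.

    Returns the steps performed during this run.
    """
    performed = []
    for step in steps:
        if step in environment_state:
            continue  # already done by an earlier run, so skip it
        if interrupt_after is not None and len(performed) >= interrupt_after:
            break  # simulate the management cluster being torn down early
        environment_state.add(step)
        performed.append(step)
    return performed


state = set()                                # lives in the target environment
first = reconcile(state, interrupt_after=1)  # torn down after one step
second = reconcile(state)                    # a fresh management cluster resumes
third = reconcile(state)                     # nothing left to do
print(first, second, third)
# ['network'] ['control-plane', 'workers'] []
```

The second run picks up where the first stopped because it inspects the environment rather than relying on anything remembered by the cluster that was torn down.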

Conclusion

In this post we introduced the multi-cluster management pattern for Flux and Cluster API to aid various multi-cloud and hybrid scenarios when using GitOps for cluster management. The pattern demonstrates additional architectures that you can take advantage of when adopting GitOps for cluster management. One size never fits all and the above patterns can have many variations to meet your specific requirements.



