Modeling Kubernetes at Enterprise Scale
One of the biggest challenges for enterprises transitioning to cloud native development is delivering Kubernetes platforms and environments wherever they are needed. Read Steve George’s post on how a model-based approach helps solve this.

One of the biggest struggles with a Cloud Native approach is that multiple teams need to operate across both the public cloud and on-premise environments:
- Plurality of cloud platforms: In financial services, regulations now require teams to be able to migrate from one public cloud to another. For many users, the unique capabilities of each public cloud mean it makes business sense to run some workloads in one cloud environment rather than another.
- On-premise is part of the mix: For most enterprises the reality is that, whether for business or technical reasons, on-premise IT is part of the mix now and will be in the future. IDC found that 55% of containers run on-premises.
IT teams have a variety of existing technology investments, from bare metal to virtualization to the cloud - how do we make use of them? As we scale out environments, the requirements of different teams and applications vary dramatically - how do we deal with this? Containers and Kubernetes promise a standard deployment and operations layer, but how do we handle these sources of complexity?
In this post, I'm going to cover how Weaveworks uses a declarative model-based system for deploying clusters and then I’ll show how our team is building this into the Weave Kubernetes Platform to solve these kinds of problems.
Running Kubernetes everywhere
If Kubernetes is to be the new base layer, it has to run everywhere. That's a difficult requirement given that Kubernetes is still quite young. A usable platform needs capabilities that aren't universal across different backends: a good example is ingress - every cluster needs it, but a cloud installation and an on-premise installation will use different ingress set-ups.
With Weave Kubernetes Platform (WKP), we're addressing those points by providing the capability to deploy and operate clusters on multiple compute substrates. You can use the installer to create clusters on bare metal, virtualized instances and managed Kubernetes services (e.g. EKS). It's a completely GitOps-enabled capability: you define a 'Model' for the cluster, and the system translates that definition into the specific implementation for each backend.
The core technology comes from the Kubernetes Cluster API project (part of SIG Cluster Lifecycle), which is worth learning about. It provides a standard API for the installer system, which lets us delegate the mechanics of installation to a lower layer. With WKP we have a higher-level model that the lower layer translates into installation actions.
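As an illustration of the kind of object that lower layer works with, here is a minimal Cluster API Cluster manifest. The names are illustrative and the provider API version varies by backend; WKP's own cluster definition sits a level above this:

```yaml
# A minimal, illustrative Cluster API 'Cluster' object. The infrastructureRef
# points at a provider-specific resource (AWS here); on another backend the
# same pattern points at a bare-metal or vSphere provider instead.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: team-dev
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: team-dev-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1   # version varies by provider
    kind: AWSCluster
    name: team-dev
```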
How cluster models keep environments consistent
For an enterprise Kubernetes deployment, we need a scalable approach if we're going to run either a pipeline of clusters (e.g. Dev, Staging, Prod) or a large number of clusters (e.g. for multiple teams or departments). Manual operations reduce consistency and repeatability.
Everything we do with WKP is focused on a declarative approach that is Kubernetes-centric.
We begin with a standard cluster definition that can build clusters on multiple backends. We use GitOps to do this with a definition file that declares all the capabilities the cluster must have. If the user installs the cluster on AWS, then the system translates these requirements into EKS capabilities; and if the user installs the cluster on-premise, it will install a different set of components.
That covers the items below the Kubernetes API. But operators commonly want every cluster to be installed with some standard components: for example, a specific authorization set-up, or security tooling on every cluster. At Weaveworks we call these 'cluster components', and in WKP we supply a standard, curated set of them.
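To make this concrete, here is a minimal sketch of what such a definition might look like. This is an illustrative, made-up format rather than WKP's actual schema; every field name here (provider, clusterComponents and so on) is an assumption:

```yaml
# Hypothetical 'standard cluster' definition, for illustration only.
# The same file could be translated into EKS resources on AWS or into
# kubeadm-managed machines on-premise.
apiVersion: example.weave.works/v1alpha1   # made-up API group
kind: ClusterDefinition
metadata:
  name: standard-cluster
spec:
  provider: eks            # or 'baremetal', 'vsphere', ...
  kubernetesVersion: "1.16"
  nodes: 3
  clusterComponents:       # installed on every cluster, whatever the backend
    - name: ingress
    - name: policy-agent
    - name: oidc-auth
```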
GitOps-enabled automatic cluster management
Each cluster component is declared within the cluster definition file and is automatically installed, no matter which backend is used. Users can also define their own cluster components (anything that can be installed on Kubernetes), so that a particular organization or team can have its own 'standard cluster' definitions.
The automation comes from an agent running in the cluster: it checks the configuration files regularly and, whenever there's a change, updates the cluster accordingly. We use the Kubernetes reconciliation loop to bring the cluster into alignment with the declarative configuration. This is otherwise known as GitOps.
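The pattern will be familiar from Weaveworks' Flux project. As a generic illustration of the idea (using the current upstream Flux API rather than WKP's exact configuration, with an example repository URL), a cluster can be pointed at a Git repository and told to reconcile a path from it:

```yaml
# Generic GitOps sync using the Flux API: watch a Git repository and
# continuously reconcile the cluster against the manifests at a given path.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: cluster-config
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/example-org/cluster-config   # illustrative URL
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cluster-config
  namespace: flux-system
spec:
  interval: 10m
  prune: true               # remove resources that disappear from Git
  sourceRef:
    kind: GitRepository
    name: cluster-config
  path: ./clusters/dev
```

Commit a change to the files under that path and the agent applies it; delete something from Git and pruning removes it from the cluster.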
We can imagine now that: every cluster we build will use this 'standard cluster' definition - no matter where it runs.
Centralized cluster management or team-based?
Let's say that you're a complex organization with a central team that provides IT and also many separate teams that run their own clusters. Practically, organizations either have the central team (the 'platform team') run and manage all the clusters, or they delegate that responsibility out to each separate team (application teams or business unit teams).
The best option we've found is for individual teams to have the ability to create and manage clusters, but with a level of oversight from the central platform team. Since we have a 'standard cluster' definition, each cluster has consistency. In WKP we use a rule-based system to stop users from accidentally (or deliberately) removing standard cluster components, or from changing them if they don't have permission to do so.
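WKP's rule system is part of the product, but the underlying idea can be sketched with plain Kubernetes RBAC: application teams get full rights only inside their own namespace, while the namespaces holding standard cluster components stay writable only by the platform team. The group and namespace names below are assumptions:

```yaml
# Application team members can manage workloads in their own namespace only;
# they have no rights in the platform-managed namespaces where the standard
# cluster components live.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: team-admin
  namespace: team-payments          # the team's own namespace
rules:
  - apiGroups: ["", "apps", "batch"]
    resources: ["*"]
    verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-admin
  namespace: team-payments
subjects:
  - kind: Group
    name: payments-developers       # assumed identity-provider group
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: team-admin
  apiGroup: rbac.authorization.k8s.io
```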
We can imagine now that: every cluster, no matter who builds it, will use this 'standard cluster' definition - wherever it runs.
Application and team workspaces
The last element to address is the applications (and any dependent services) that need to be run on the cluster. In the simplest form, an application consists of the containers and their configuration installed into a namespace. If there are multiple teams using a single cluster then you may want to install each team’s applications into their own namespace so that multi-tenancy is enforced.
In WKP we have 'team workspaces': a team workspace is a set of users mapped to a namespace with its own GitOps definition. This enables the team to easily add any application or service using the same declarative configuration approach. Anything defined in the team workspace is automatically installed into the namespace on the cluster. The impact is that development teams get delegated multi-tenancy while operations teams retain control over the platform's capabilities.
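A team workspace can be pictured as a namespace plus a GitOps sync scoped to that namespace. Again using the upstream Flux API as a stand-in for WKP's own machinery (the names, path and service account below are illustrative):

```yaml
# The team's namespace, plus a GitOps sync that applies everything under the
# team's path in Git into that namespace, using a service account whose
# permissions are limited to it (multi-tenancy).
apiVersion: v1
kind: Namespace
metadata:
  name: team-payments
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: team-payments-apps
  namespace: team-payments
spec:
  interval: 5m
  prune: true
  sourceRef:
    kind: GitRepository
    name: cluster-config
    namespace: flux-system           # the shared config repository defined earlier
  path: ./workspaces/team-payments
  targetNamespace: team-payments     # everything lands in the team's namespace
  serviceAccountName: team-payments-gitops   # assumed SA scoped to this namespace
```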
We can imagine now that: every cluster, no matter who builds it and wherever it runs, will use a complete 'standard cluster' definition.
Model-based configuration management
So far we've shown an approach where every element uses declarative configuration: the cluster, each component within it and the applications that are running on it. Using GitOps this definition is stored in Git and an agent running in the cluster ensures that we're always up to date: every time a change is made to the definition file, the cluster updates itself.
In complex environments the needs start to expand beyond a single 'standard cluster' definition. We might have a need for clusters to be sized differently, or use different server definitions (imagine several machine learning clusters versus a web app cluster). We might have a need for different applications to run on a subset of clusters.
With WKP, declarative configuration is extended with the idea of 'models' to deal with variants. We can define a model to consist of a cluster definition, a set of components and team workspaces. Users then deploy a model and specialize the configuration at build time to deal with their particular needs. Commonly, the components support overrides so that configuration data is provided at runtime. But if that's not enough, the user can build their own components and include them in their models.
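One way to picture build-time specialization is a shared base definition plus a small overlay of overrides, in the style of Kustomize. This is a sketch of the idea rather than WKP's model format; the directory names and patch target are assumptions:

```yaml
# clusters/ml-cluster/kustomization.yaml: specialize the shared base model
# for a machine-learning cluster by overriding the worker pool size.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base-model          # the shared 'standard cluster' definition
patches:
  - target:
      kind: MachineDeployment
      name: workers            # assumed name of the worker machine pool
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 8
```

The base stays identical across the estate; each variant carries only the overrides that make it different.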
Manage cluster fleets with model-based configuration
With models, WKP adds the ability to handle fleets of clusters that are specialized for a particular category or situation. It means we can keep standardization on the most common elements but allow variance across an estate. It also means that clusters can access the unique capabilities of each environment while still following a model.
“We can now imagine that: every cluster, whoever builds it and wherever it runs, will use a complete GitOps model to consistently define, deploy and operate the entire platform.” --Steve George, COO Weaveworks
Benefits of model-based configuration
At the beginning of this post, I said Kubernetes has to run everywhere and also needs to access the unique capabilities of each environment. With model-based configuration management and GitOps automation we have those capabilities. We can model a variety of requirements while ensuring that the system is highly consistent and automated. With this system, it's possible for teams to operate 20, 200 or 2,000 clusters, providing a consistent operating model that reduces costs for platform operators and increases the capabilities available to application teams.
Contact me if you’d like more information or would like to see a WKP demo.