Liquid Metal: multi-cluster Kubernetes on bare metal with microVMs
The Das Schiff platform from Deutsche Telekom shows the benefits of a GitOps-driven, fleet-managed platform for telecom operators. A key enabler for rolling out Das Schiff to the edge is the ability to run Kubernetes on microVMs. This blog describes the core technology, jointly developed by Deutsche Telekom, Weaveworks and AWS, which significantly improves the efficiency of deploying Kubernetes across bare-metal pools.
This post was authored by Richard Case, Tech Lead on Bare Metal Kubernetes, Weaveworks, and Vuk Gojnic, Squad Lead for Cloud Native / Kubernetes Platform at Deutsche Telekom Technik
Increasingly, Kubernetes is being used at the resource-constrained edge and far edge: examples of this are the 5G Core Remote User Plane Function (UPF) and the O-RAN Virtualized Base Band Unit (vBBU). As telecom operators such as Deutsche Telekom roll out their 5G networks, they are going to deploy thousands of Kubernetes edge clusters running different 5G and traditional workloads in a cloud native environment. An example of this is Deutsche Telekom’s Das Schiff platform.
Traditionally, virtualizing multiple Kubernetes clusters has relied on heavyweight and expensive virtualization approaches. Instead, Weaveworks has demonstrated an approach based on KVM and Firecracker virtual machines (known as microVMs), which provides lightweight virtualization and significant benefits for running multiple virtualized Kubernetes clusters in many environments, including the edge.
In his keynote at KubeCon Europe earlier today, Vuk Gojnic described the Das Schiff platform and the benefits of a GitOps-driven, fleet-managed platform for telecom operators. Weaveworks is a leading partner in this initiative. A key enabler for rolling out Das Schiff to the edge is the ability to run Kubernetes on microVMs. This blog describes the core technology, jointly developed by Deutsche Telekom, Weaveworks and AWS, which significantly improves the efficiency of deployment of Kubernetes across bare metal pools. This approach is called Liquid Metal in the Das Schiff platform.
Today we can successfully run multiple microVM-based Kubernetes clusters, with both virtualized and direct (bare metal) nodes. This demonstrates that AWS Firecracker is a game-changing approach to creating lighter-weight virtualization for Kubernetes clusters at scale.
Demo: creation of 3 EKS-D clusters where control plane and worker nodes are Firecracker microVM based and are split across 2 physical hosts.
The diagram below shows a simple deployment of two separate Kubernetes clusters running side-by-side in microVMs across two bare metal servers.
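To make the layout concrete, here is a minimal sketch of how the nodes of two clusters could be spread across two hosts. It is an illustration only, not how Liquid Metal schedules microVMs: the function `place_round_robin`, the cluster and node names, and the round-robin policy are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import cycle


def place_round_robin(clusters, hosts):
    """Spread the nodes of every cluster across hosts in round-robin order.

    `clusters` maps a cluster name to its list of node names.
    All names here are hypothetical and used for illustration only.
    """
    placement = defaultdict(list)
    host_cycle = cycle(hosts)
    for cluster, nodes in clusters.items():
        for node in nodes:
            # Each node becomes one microVM on the next host in the cycle.
            placement[next(host_cycle)].append(f"{cluster}/{node}")
    return dict(placement)


clusters = {
    "cluster-a": ["cp-0", "worker-0"],
    "cluster-b": ["cp-0", "worker-0"],
}
layout = place_round_robin(clusters, ["host-1", "host-2"])
for host, microvms in layout.items():
    print(host, microvms)
```

With this input, each cluster ends up with one microVM on each host, so both clusters span both servers. A real scheduler would also account for host capacity and failure domains, which this sketch ignores.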
Because Firecracker / KVM utilizes hardware virtualization capabilities, security and isolation between virtualized clusters are notably improved, and so is the control of resources. This use case opens the possibility of creating multiple fully standalone, CNCF-conformant clusters even on a very limited hardware footprint.
A key challenge of telco and edge workloads is that applications need access to specific hardware (e.g. radio or network adapters and hardware accelerators) on specific physical machines, and this hardware is often complex to virtualize and cumbersome to use in a virtualized environment. Running Kubernetes control plane nodes in microVMs together with worker nodes on non-virtualized bare metal is an approach we call “mixed mode” clusters. The diagram below shows a simple “mixed mode” deployment with two control-plane nodes and a physical-host worker node. Besides being a good fit for the edge scenario, mixed mode enables significantly greater efficiency in managing bare metal Kubernetes clusters: moving control plane nodes from dedicated bare metal servers into microVMs significantly reduces the overall number of nodes required in a bare metal pool.
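The saving in bare metal nodes can be illustrated with some simple arithmetic. The sketch below is a back-of-the-envelope model, not a sizing tool: the function name, the default of three control plane nodes per cluster, and the `microvms_per_host` packing density are assumptions chosen for illustration, and it assumes control plane microVMs are packed onto their own hosts rather than sharing a worker's server.

```python
import math


def bare_metal_servers(clusters, workers_per_cluster,
                       control_planes_per_cluster=3,
                       microvms_per_host=None):
    """Estimate the servers a bare metal pool needs.

    With microvms_per_host=None every control plane node gets a dedicated
    server. Otherwise control plane nodes run as microVMs, packed
    `microvms_per_host` to a server (a hypothetical density).
    Worker nodes always run directly on bare metal ("mixed mode").
    """
    workers = clusters * workers_per_cluster
    control_planes = clusters * control_planes_per_cluster
    if microvms_per_host is None:
        return workers + control_planes
    return workers + math.ceil(control_planes / microvms_per_host)


dedicated = bare_metal_servers(clusters=4, workers_per_cluster=2)
mixed = bare_metal_servers(clusters=4, workers_per_cluster=2,
                           microvms_per_host=6)
print(dedicated, mixed)  # 20 10
```

Under these assumed numbers, four clusters need 20 servers with dedicated control planes but only 10 in mixed mode. The actual ratio depends on how many control plane microVMs a given host can carry.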
As part of the demonstration, we have shown that we can deploy both mixed mode clusters and fully virtualized clusters together in the same bare metal environment. This has significant benefits for future 5G applications, where network functions (5G workloads) can run alongside signaling functions, management functions, and web and customer applications on the same hardware with strong isolation and resource control.
While we are initially focused on edge scenarios and telco workloads, we routinely hear from cloud providers and on-premise users that the ability to virtualize multiple Kubernetes clusters on the same hardware is going to revolutionize the management of Kubernetes in traditional data centers, and not only for efficiency. An additional reason is that virtualized clusters can be spun up and down quickly with low overhead, which is key to testing strategies as well as progressive deployment.
This demonstration also utilizes the open source project Amazon EKS Distro (EKS-D), which enables the same version of Kubernetes that AWS runs in its cloud to run anywhere. As a result, users and telecom operators can have a uniform Kubernetes experience in the AWS cloud, on-premise and in an edge deployment.
Let's summarize the key benefits of this approach:
- Mixed mode: enabling clusters with both virtual and physical control planes and worker nodes, which simplifies access to specialized hardware such as SmartNICs, GPUs and crypto-accelerators.
- ARM support: the Weaveworks and AWS Firecracker approach fully supports ARM, enabling multiple bare metal clusters to run on ARM hardware, which is increasingly important for both edge and traditional data centers.
- Security: while Kubernetes has some support for multi-tenancy, the ability to create lightweight clusters inside KVM and use hardware-based virtualization controls to protect clusters from other workloads in the same physical environment is a key benefit. We are currently researching how this model can be extended to take advantage of enclave support such as Intel SGX and ARM TrustZone.
- Fast startup time: AWS Firecracker virtual machines take less than 200ms to bootstrap, which lets us spin up Kubernetes clusters with massively reduced startup time. This is vital for effective conformance testing strategies.
- Enabling a fully declarative approach with GitOps: traditional provisioning of virtual machines is an imperative process that is complex and error-prone. At Weaveworks, we strongly believe that declarative approaches together with version control and reconciliation (GitOps) build robust, repeatable infrastructure.
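The declarative, reconciliation-based idea in the last point can be sketched in a few lines. This is a simplified illustration of the general pattern, not the Liquid Metal or Weave GitOps implementation: `plan_reconcile`, the microVM names and the `vcpu` / `memory_mb` fields are hypothetical. The desired state would live in Git, and a controller would repeatedly compare it with what is actually running.

```python
def plan_reconcile(desired, actual):
    """Compute the actions that move `actual` towards `desired`.

    Both arguments map a microVM name to its spec (a plain dict).
    The names and spec fields are hypothetical, for illustration only.
    """
    desired_names, actual_names = set(desired), set(actual)
    return {
        # Declared in Git but not running yet.
        "create": sorted(desired_names - actual_names),
        # Running but no longer declared.
        "delete": sorted(actual_names - desired_names),
        # Running with a spec that has drifted from the declaration.
        "update": sorted(
            name for name in desired_names & actual_names
            if desired[name] != actual[name]
        ),
    }


desired = {
    "cp-0": {"vcpu": 2, "memory_mb": 2048},
    "cp-1": {"vcpu": 2, "memory_mb": 2048},
}
actual = {
    "cp-0": {"vcpu": 1, "memory_mb": 2048},
    "old-worker": {"vcpu": 2, "memory_mb": 4096},
}
plan = plan_reconcile(desired, actual)
print(plan)
```

Here the plan creates `cp-1`, deletes `old-worker` and updates `cp-0`. Because the plan is derived from the declared state each time, running the loop again after the actions are applied yields an empty plan, which is what makes the process repeatable.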
This approach is part of Weaveworks’ ongoing support for “bare metal” Kubernetes. The core capabilities including installation of EKS-D onto existing infrastructure are available in Weave Kubernetes Platform. Contact email@example.com if you are interested in seeing a demo of this capability.