Kubernetes is complex to manage because it is made up of many parts. Yet it is becoming imperative for IT and DevOps teams to understand the different components of Kubernetes and how they work together. In this post, we look at the core Kubernetes components and their role in the functioning of a Kubernetes system.
Containers are the building blocks for cloud-native applications, and Kubernetes itself exists to better manage these building blocks. Kubernetes works with several container runtimes, such as containerd and CRI-O, and the most common tool used to build container images is Docker.
A container packages an application's code and its runtimes and libraries - everything needed to run that service or application within a single unit. The biggest advantage of a container is that it makes applications portable. These containers can be shared between teams or externally with confidence that if they work on one machine, they will work on another one as well.
The Kubernetes architecture is built to operate containers at scale. While it is easy to spin up a 'Hello world' app on a laptop in a container, it is more challenging to containerize enterprise applications. This is where Kubernetes, with its container-specific design, simplifies the entire process.
A pod is a group of one or more containers. The containers in a pod share common resources such as a network namespace and storage. Pods are ephemeral, and they are terminated once their tasks are completed. If the node running a pod fails, the pod is lost as well. This means that pods need to be managed across their lifecycle and replaced as needed.
A pod goes through multiple phases across its lifetime: Pending, Running, Succeeded, Failed, and Unknown. Containers within a pod, similarly, have their own states: Waiting, Running, and Terminated.
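To make the phases concrete, here is a minimal Python sketch that models them as an enum and checks whether a pod has finished for good. The names `PodPhase` and `is_terminal` are our own, chosen for illustration; they are not part of any Kubernetes client library.

```python
from enum import Enum


class PodPhase(Enum):
    """The five pod phases reported in a pod's status."""
    PENDING = "Pending"
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"
    UNKNOWN = "Unknown"


# Phases after which the pod itself will not run again.
TERMINAL_PHASES = {PodPhase.SUCCEEDED, PodPhase.FAILED}


def is_terminal(phase: str) -> bool:
    """Return True if a pod in this phase has finished for good."""
    return PodPhase(phase) in TERMINAL_PHASES
```

A check like this is useful when deciding whether a pod needs to be replaced or is simply still starting up.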
Pods are created from a pod specification, which controllers embed as a pod template. However, pods are usually not created individually by the administrator. They are created automatically by a controller, such as a Deployment, based on the specifications defined by the administrator.
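As a rough illustration of how a controller carries a pod template, the sketch below builds a minimal Deployment manifest as a Python dictionary and prints it as JSON, which kubectl accepts alongside YAML. The helper name `deployment_manifest`, the `web` name, and the `nginx:1.25` image are placeholder assumptions, not values from this article.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int) -> dict:
    """Build a minimal apps/v1 Deployment with an embedded pod template."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": labels},
            # The pod template: the controller creates pods from this.
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(json.dumps(deployment_manifest("web", "nginx:1.25", 3), indent=2))
```

The administrator only states the template and the replica count; the controller takes care of creating the individual pods.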
A node is a physical or virtual machine that runs pods. Nodes make up the infrastructure in a Kubernetes system. Each node runs components such as the kubelet and kube-proxy.
The kubelet runs as an agent on every node and registers the node with the API server. It periodically reports the status of the node and of the pods running on it.
Kube-proxy is a network proxy that runs on every node. It maintains the network rules that route traffic addressed to a Service to the pods behind it, load balancing across those pods, and so enables communication with pods.
A Kubernetes cluster contains multiple nodes and the control plane components, which are further explained below. A cluster has at least one control plane node (historically called the master node) and multiple worker nodes. The control plane handles scheduling, changes in state, and updates. The nodes within a cluster are routinely replaced.
An entire Kubernetes system can contain multiple clusters. These clusters can be run in the cloud or on premises. When running Kubernetes in a managed K8s service like EKS or GKE, the pricing is based on the number of clusters or the number of nodes being used. In this case, the amount of resources used matters. Organizations can run multiple applications within a single cluster and optimize costs by using lower-cost instances where possible.
The cloud vendors also offer fully managed options, such as AWS Fargate for EKS, that do not require you to manage the underlying nodes. This is a great option for organizations looking to simplify their Kubernetes operations.
The control plane is where a cluster is managed from. Several components make up the control plane, including the API server, etcd, the controller manager, and the scheduler.
The most important part of the control plane is the API server, which acts as the interface that enables communication between all other components within the cluster. It is also the entry point for external requests to the cluster, which must go through its HTTP API. Through the API server you can query and change the state of other components within the Kubernetes system. You can interact with the API server using kubectl commands or REST API calls.
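To give a feel for the REST side, the sketch below builds the URL paths the API server uses for pods and Deployments in a namespace. The helper names are our own; the sketch makes no network calls and leaves out authentication, which any real request would need.

```python
from typing import Optional


def pods_path(namespace: str, name: Optional[str] = None) -> str:
    """Path for all pods in a namespace, or for one pod if a name is given."""
    # Pods live in the core API group, served under /api/v1.
    base = f"/api/v1/namespaces/{namespace}/pods"
    return base if name is None else f"{base}/{name}"


def deployments_path(namespace: str) -> str:
    """Path for all Deployments in a namespace."""
    # Deployments live in the named group "apps", served under /apis/apps/v1.
    return f"/apis/apps/v1/namespaces/{namespace}/deployments"
```

A GET on `pods_path("default")` returns the same information that `kubectl get pods -n default` shows, since kubectl is itself a client of this API.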
The Kubernetes API allows internal or third-party developers to create custom extensions that add functionality to the core Kubernetes platform. These extensions are defined through Custom Resource Definitions (CRDs). A CRD registers a new resource type that the API server then serves alongside Kubernetes' own built-in object types. This allows users to customize how Kubernetes works without having to modify the source code itself.
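As a hedged example of what such an extension looks like, the sketch below builds a minimal CRD manifest as a Python dictionary. The group `example.com`, the `Backup` kind, and the `schedule` field are hypothetical and chosen purely for illustration.

```python
import json


def crd_manifest(group: str, kind: str, plural: str) -> dict:
    """Build a minimal apiextensions.k8s.io/v1 CustomResourceDefinition."""
    return {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        # The CRD name must take the form <plural>.<group>.
        "metadata": {"name": f"{plural}.{group}"},
        "spec": {
            "group": group,
            "scope": "Namespaced",
            "names": {"plural": plural, "singular": kind.lower(), "kind": kind},
            "versions": [
                {
                    "name": "v1",
                    "served": True,
                    "storage": True,
                    # Hypothetical schema: one string field under spec.
                    "schema": {
                        "openAPIV3Schema": {
                            "type": "object",
                            "properties": {
                                "spec": {
                                    "type": "object",
                                    "properties": {"schedule": {"type": "string"}},
                                }
                            },
                        }
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(crd_manifest("example.com", "Backup", "backups"), indent=2))
```

Once a CRD like this is applied, objects of the new kind can be created and queried through the API server like any built-in resource.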
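etcd is a distributed key-value store where the state of every component in a cluster is stored. Its name combines the Unix /etc directory, where configuration is traditionally kept, with 'd' for distributed. Rather than being eventually consistent, etcd is strongly consistent and uses the Raft consensus algorithm to replicate data across its members. This makes it a highly fault-tolerant data store for Kubernetes, with no single point of failure when it is run with multiple members. The data stored in etcd, though, still requires backup.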
etcd can be run on the same nodes as the rest of the control plane, or on separate infrastructure. Running it separately provides additional resilience and more control over etcd. However, it also involves additional management and added nodes to run etcd.
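How many failures an etcd cluster can survive comes down to simple arithmetic. etcd replicates data with the Raft consensus algorithm, which needs a majority of members (a quorum) to agree on every write. The sketch below works that out; the function names are our own.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with the given number of members."""
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can fail while a majority is still available."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(n, "members: quorum", quorum(n), "tolerates", failures_tolerated(n))
```

A three-member cluster tolerates one failure, and adding a fourth member does not improve on that, which is why etcd clusters are usually sized with an odd number of members.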
The controller manager runs the controllers that move the cluster from its current state to the desired state. There are many controllers in Kubernetes, such as the node controller, replication controller, namespace controller, and endpoints controller.
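The idea of moving from current to desired state can be sketched as a single reconciliation step. This is a simplified illustration and not how any Kubernetes controller is actually implemented; the `reconcile` function and the action strings are our own invention.

```python
from typing import List


def reconcile(desired_replicas: int, running_pods: List[str]) -> List[str]:
    """Return the actions needed to match the desired replica count."""
    diff = desired_replicas - len(running_pods)
    if diff > 0:
        # Too few pods: create the missing ones.
        return ["create pod" for _ in range(diff)]
    if diff < 0:
        # Too many pods: delete the surplus, taken from the end of the list.
        return [f"delete {name}" for name in running_pods[diff:]]
    # Current state already matches the desired state.
    return []
```

A real controller runs a step like this in a loop, re-reading the current state from the API server each time, so that it keeps correcting drift as pods and nodes come and go.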
The scheduler assigns new pods to nodes within the cluster. It does this by considering current resource availability, pod affinity and anti-affinity, and a few other criteria. Node affinity attracts pods to particular nodes. A taint does the opposite: it is applied to a node and repels pods from it. A toleration is set on a pod and allows that pod to be scheduled onto a node with a matching taint. By being aware of taints and tolerations you can use them to steer workloads toward or away from specific nodes.
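The interplay between taints and tolerations can be sketched in a few lines. This is a simplified model of the matching rules, written for illustration; the class and function names are our own, and the real scheduler handles more cases (for example, tolerations with an empty key).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Taint:
    """Set on a node to repel pods."""
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    """Set on a pod to allow scheduling onto a tainted node."""
    key: str
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: Optional[str] = None  # None tolerates any effect


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Simplified check of whether one toleration matches one taint."""
    if toleration.key != taint.key:
        return False
    if toleration.effect is not None and toleration.effect != taint.effect:
        return False
    return toleration.operator == "Exists" or toleration.value == taint.value


def can_schedule(taints: List[Taint], tolerations: List[Toleration]) -> bool:
    """A pod fits if every hard taint on the node is tolerated."""
    for taint in taints:
        if taint.effect == "PreferNoSchedule":
            continue  # Soft preference only; it never blocks scheduling.
        if not any(tolerates(t, taint) for t in tolerations):
            return False
    return True
```

For example, a node tainted with the hypothetical key `gpu` rejects ordinary pods, while a pod that carries `Toleration("gpu", operator="Exists")` is allowed onto it.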
Monitoring the control plane
As the control plane plays a key role in the functioning of nodes, namespaces, service accounts, and pretty much anything else within a Kubernetes system, it is essential to monitor every part of the control plane. The control plane components expose metrics in the Prometheus format by default. Prometheus is an open source monitoring tool and is one of the most widely used monitoring tools for Kubernetes.
A key part of monitoring is to be aware of the current and desired state of every component of the system. While control plane components like the API server, etcd, and the controller manager expose these metrics, they need to be collected and analyzed in a dedicated monitoring tool like Prometheus.
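To show what these metrics look like on the wire, here is a minimal sketch that parses a few lines in the Prometheus text format. The sample values are made up, and the parser is deliberately naive: it assumes no timestamps follow the value and treats the whole series, labels included, as the key. In practice you would let Prometheus scrape the endpoint rather than parse it by hand.

```python
from typing import Dict

# Made-up sample in the Prometheus text exposition format.
SAMPLE = """\
# TYPE etcd_server_has_leader gauge
etcd_server_has_leader 1
# TYPE apiserver_request_total counter
apiserver_request_total{code="200",verb="GET"} 1523
"""


def parse_metrics(text: str) -> Dict[str, float]:
    """Map each series (name plus labels) to its sample value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # Assumption: the value is the last whitespace-separated token.
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples


if __name__ == "__main__":
    for series, value in parse_metrics(SAMPLE).items():
        print(series, value)
```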
Storage volumes are used in Kubernetes to store data that would otherwise be lost when a container or pod fails or is killed. Ephemeral volumes are an option, but persistent volumes are more widely used because they ensure data lives beyond the life span of a container or pod.
There are options to have external storage volumes for Kubernetes data such as AWS EBS, CephFS, and AzureDisk depending on where you run your cluster. These volumes can be provisioned statically by you, or dynamically by the system.
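With dynamic provisioning, you typically request storage through a PersistentVolumeClaim and let the cluster create the volume for you. The sketch below builds a minimal claim as a Python dictionary; the claim name `data`, the `10Gi` size, and the storage class name `standard` are placeholder assumptions that depend on your cluster.

```python
import json


def pvc_manifest(name: str, size: str, storage_class: str) -> dict:
    """Build a minimal v1 PersistentVolumeClaim."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteOnce: the volume is mounted read-write by a single node.
            "accessModes": ["ReadWriteOnce"],
            # The storage class decides which provisioner creates the volume.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(json.dumps(pvc_manifest("data", "10Gi", "standard"), indent=2))
```

A pod then refers to the claim by name, so the data survives even when the pod itself is replaced.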
This has been a concise overview of the core components of Kubernetes and the role each one plays. You can learn more about how to use each of these components and how they work with each other in the Kubernetes documentation. If you would like to learn more about what Kubernetes is and how it allows fast-growing applications to scale quickly, we have an overview here.