As mentioned in a prior post, Kubernetes is at its best when managing single-process containers. Not every application can be reduced to a single process, however, which is where pods come in. Kubernetes is not designed only for single-process cloud native applications: to allow both new and legacy workloads to take advantage of multiple cooperating processes, its authors created a logical unit that maps very closely to traditional cloud deployments of multi-process applications.
A pod is a set of containers that are deployed on the same node, share an IP address, run inside the same kernel security group, and can use traditional interprocess communication techniques. Containers within a pod share the same lifecycle and are managed by Kubernetes as a unit.
There is a lot to cover when it comes to pod design. Rather than try to cover everything in a single post, we’ll cover the basics and link to more detailed discussions where possible, with follow-up posts on subjects for which we have not yet found good detailed resources.
In this post, we’ll cover pod scheduling and how to maintain quality of service by properly specifying resource limitations.
Pod Scheduling: Priority and Quality of Service
Kubernetes uses several pieces of information when scheduling and evicting workloads on the cluster. Failure to set these parameters properly can result in performance problems, workload downtime, and significant overall cluster health issues.
Resource requests and limits are the most obvious of these settings, but they have subtle implications that are not always understood up front.
All four of the standard workload types (Deployments, StatefulSets, Jobs, and DaemonSets) accept a `request` and a `limit` for three different types of compute resources: `cpu`, `memory`, and `ephemeral-storage`. Additionally, Kubernetes allows a pod priority field to be set.
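To make these fields concrete, here is a small sketch of a pod spec fragment written as a Python dictionary that mirrors the manifest structure (`resources.requests`, `resources.limits`, and `priorityClassName`). The container name, image, priority class name, and all of the quantities are illustrative assumptions, and `missing_resource_settings` is a hypothetical helper of ours, not part of any Kubernetes client.

```python
import json

RESOURCES = ("cpu", "memory", "ephemeral-storage")

# Illustrative pod spec fragment; every name and quantity here is an assumption.
pod_spec = {
    "priorityClassName": "high-priority",  # must refer to a PriorityClass that exists in the cluster
    "containers": [
        {
            "name": "web",
            "image": "nginx",
            "resources": {
                # What the scheduler reserves on the node for this container.
                "requests": {"cpu": "250m", "memory": "256Mi", "ephemeral-storage": "1Gi"},
                # The ceiling the container is allowed to reach at runtime.
                "limits": {"cpu": "500m", "memory": "512Mi", "ephemeral-storage": "2Gi"},
            },
        }
    ],
}


def missing_resource_settings(spec):
    """Return (container, section, resource) triples that are not set explicitly."""
    missing = []
    for container in spec["containers"]:
        resources = container.get("resources", {})
        for section in ("requests", "limits"):
            for resource in RESOURCES:
                if resource not in resources.get(section, {}):
                    missing.append((container["name"], section, resource))
    return missing


if __name__ == "__main__":
    print(json.dumps(pod_spec, indent=2))
    print(missing_resource_settings(pod_spec))
```

A check like this could run in a CI pipeline to catch containers that would otherwise be deployed without explicit settings.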
Lifecycle of a Pod
At a very high level, the scheduler maintains a queue of pods to be deployed to the cluster. For each pod in the queue, it looks for a node with enough available compute resources to fulfill the `request` for that pod and assigns the pod to that node. Limits are ignored during scheduling.
Once a pod is scheduled to a node, the kubelet on that node picks up the change and starts the pod.
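The request-based placement described above can be sketched as a simple first-fit loop. This is a deliberately simplified model, not the real scheduler, which also filters and scores nodes on many other criteria. The `Node` and `Pod` classes and the millicore/MiB units are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    cpu_millicores: int          # allocatable CPU
    memory_mib: int              # allocatable memory
    requested_cpu: int = 0       # sum of requests already placed here
    requested_memory: int = 0


@dataclass
class Pod:
    name: str
    cpu_request: int
    memory_request: int
    cpu_limit: int               # carried along but never consulted below
    memory_limit: int


def fits(node: Node, pod: Pod) -> bool:
    """A pod fits if its requests (not its limits) fit in what is left."""
    return (
        node.requested_cpu + pod.cpu_request <= node.cpu_millicores
        and node.requested_memory + pod.memory_request <= node.memory_mib
    )


def schedule(queue: List[Pod], nodes: List[Node]) -> Dict[str, Optional[str]]:
    """Assign each queued pod to the first node that fits; None means unschedulable."""
    placements: Dict[str, Optional[str]] = {}
    for pod in queue:
        target = next((node for node in nodes if fits(node, pod)), None)
        if target is not None:
            target.requested_cpu += pod.cpu_request
            target.requested_memory += pod.memory_request
        placements[pod.name] = target.name if target else None
    return placements
```

Note that `cpu_limit` and `memory_limit` never appear in `fits`, which is exactly why a node can end up with more promised headroom than it physically has.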
In Kubernetes versions before 1.8, pod priority is ignored by the scheduler. In 1.8 through 1.10 the feature was in alpha and had to be explicitly enabled in the Kubernetes configuration. From 1.11 on, the story above is modified so that pods are scheduled in priority order.
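Scheduling in priority order can be pictured as a priority queue of pending pods. The sketch below is an illustration only: the `PendingQueue` class is hypothetical, and breaking ties by arrival order is an assumption we make for the example.

```python
import heapq
from itertools import count


class PendingQueue:
    """Pending pods, popped highest priority first, then in arrival order."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # monotonically increasing tie-breaker

    def add(self, pod_name, priority=0):
        # heapq is a min-heap, so negate the priority to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._arrival), pod_name))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    pending = PendingQueue()
    pending.add("batch-job", priority=0)
    pending.add("web", priority=1000)
    pending.add("cache", priority=1000)
    while pending:
        print(pending.pop())
```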
Resource Limitations and Pod Priority
However, this story is complicated by the fact that pods are allowed to have limits higher than their requests, so a node can be overcommitted and resource starvation is possible.
When a node runs out of memory or disk, pods can be evicted and sent back to the scheduler for redeployment on another node. CPU, on the other hand, is throttled by the Linux kernel when limits are reached.
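A quick worked example shows how a node can be fully schedulable by requests and still be overcommitted by limits. The node size and pod figures are made up for illustration, and `summarize` is a hypothetical helper.

```python
def summarize(capacity_mib, pods):
    """pods is a list of (memory_request_mib, memory_limit_mib) pairs."""
    total_requests = sum(request for request, _ in pods)
    total_limits = sum(limit for _, limit in pods)
    return {
        # The scheduler only compares requests against capacity.
        "schedulable": total_requests <= capacity_mib,
        # How far the pods could exceed capacity if all of them hit their limits.
        "worst_case_overcommit_mib": max(0, total_limits - capacity_mib),
    }


if __name__ == "__main__":
    # Three pods, each requesting 1024 MiB with a 2048 MiB limit, on a 4096 MiB node.
    print(summarize(4096, [(1024, 2048)] * 3))
```

Here the requests add up to 3072 MiB, so all three pods are placed, but their limits add up to 6144 MiB, which is 2048 MiB more than the node has.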
Quality of Service classes and what they mean
In order to manage this process, Kubernetes defines three different Quality of Service Classes which allow you to control which pods will be evicted in case of resource starvation.
- A pod is Guaranteed if every container in that pod has:
a. explicit requests and limits for CPU and memory
b. limits that exactly match the memory/CPU requests for that container.
- A pod is Burstable if it does not meet the Guaranteed criteria, but at least one container in that pod has an explicit request or limit for CPU or memory. The common case is a limit that is greater than the requested amount for at least one container in the pod.
- A pod is Best Effort if no container in that pod has an explicit memory/CPU request or limit.
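The classification can be sketched as a small function. It follows the rules as we understand them from the upstream documentation: Guaranteed when every container's CPU and memory limits equal its requests, Best Effort when no container sets any CPU or memory request or limit, and Burstable otherwise. The `qos_class` name and the dictionary layout are our own, and the sketch compares quantity strings literally, so it would treat `"500m"` and `"0.5"` as different values even though Kubernetes considers them equal.

```python
def qos_class(containers):
    """Classify a pod from its containers' CPU and memory settings."""
    any_set = False
    guaranteed = True
    for container in containers:
        resources = container.get("resources", {})
        requests = resources.get("requests", {})
        limits = resources.get("limits", {})
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # When only a limit is given, the request defaults to the limit.
            request = requests.get(resource, limit)
            if request is not None or limit is not None:
                any_set = True
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    same = {"cpu": "500m", "memory": "256Mi"}
    print(qos_class([{"resources": {"requests": same, "limits": same}}]))
    print(qos_class([{"resources": {"requests": {"cpu": "250m"}}}]))
    print(qos_class([{"name": "no-settings"}]))
```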
Under resource pressure, pods that are using more of the starved resource than they requested are evicted first. This means guaranteed pods are the last candidates for eviction as long as they stay within their requests, burstable pods are evicted only when they are using more of the starved resource than they requested, and best effort pods can be evicted at any time.
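As a rough illustration of this rule, the sketch below picks eviction candidates under memory pressure by comparing usage against requests. It is a simplification: the real kubelet also takes pod priority into account, and the `eviction_order` function and the sample figures are assumptions of ours.

```python
def eviction_order(pods):
    """pods is a list of (name, memory_request_mib, memory_usage_mib) tuples.

    Returns the names of pods using more than they requested, largest excess first.
    Pods at or below their request are not candidates in this simplified model.
    """
    over_request = [
        (usage - request, name)
        for name, request, usage in pods
        if usage > request
    ]
    return [name for _, name in sorted(over_request, reverse=True)]


if __name__ == "__main__":
    pods = [
        ("guaranteed", 512, 400),   # within its request
        ("burstable", 256, 300),    # 44 MiB over its request
        ("best-effort", 0, 120),    # no request, so any usage counts as excess
    ]
    print(eviction_order(pods))
```

A best effort pod has no request at all, so in this model it always qualifies as soon as it uses any memory.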
Best practices for resource requests and limitations
There are a few important implications of this:
- Setting matching requests/limits for every container in a deployment spec gives its pods guaranteed status and makes eviction very unlikely.
- Since DaemonSet pods are always redeployed to the same node, it is generally recommended to give them guaranteed status to prevent thrashing as an evicted pod is continually rescheduled to that node.
- Applications that can behave badly when terminated should also get guaranteed status wherever feasible.
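One way to apply the first recommendation is to normalize a spec so that limits mirror requests. The `make_guaranteed` helper below is a hypothetical sketch that works on plain dictionaries shaped like a manifest's container list. Copying requests into limits, rather than raising requests to the limits, is a choice made for the example: it keeps the scheduler's view of the pod unchanged but lowers the ceiling the container may reach.

```python
import copy


def make_guaranteed(containers):
    """Return a copy of the containers with limits set equal to requests."""
    result = copy.deepcopy(containers)  # leave the caller's spec untouched
    for container in result:
        requests = container.setdefault("resources", {}).setdefault("requests", {})
        for resource in ("cpu", "memory"):
            if resource not in requests:
                raise ValueError(
                    f"container {container.get('name')!r} has no {resource} request"
                )
        # Mirror every request (including any storage request) as the limit.
        container["resources"]["limits"] = dict(requests)
    return result


if __name__ == "__main__":
    containers = [
        {
            "name": "web",
            "resources": {
                "requests": {"cpu": "250m", "memory": "256Mi"},
                "limits": {"cpu": "500m"},
            },
        }
    ]
    print(make_guaranteed(containers))
```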
In this post we discussed why defining requests and limits is important for your deployments. We followed up with an explanation of the different Quality of Service classes, which also need attention when running production workloads.