The pod is the most fundamental element of orchestration in Kubernetes. Understanding how to design and optimize pods is the first step towards creating a secure, stable, and efficient service in Kubernetes. So, we thought it would be a good idea to bring together all of the best pod design and optimization advice that we can find into one place.

Understanding Containers - The Kubernetes Way

The fundamental unit of a pod is the container, which encapsulates the software you want to run.

What Exactly is a Container?

By now you are probably at least passingly familiar with containers as a lightweight alternative to virtual machines, one that avoids the overhead of running a separate kernel. That is true, but from the perspective of Kubernetes, containers have a couple of additional properties that must be respected.

Before we dive into those, here’s a very brief, very high-level overview of container technology in general. In practice, containers consist of two parts:

  1. A disk image holding the software you want to run
  2. An “engine” or container runtime, designed to configure the kernel to run that image in an isolated kernel namespace.

Containers have a long and illustrious history. They came to the forefront of the IT world after the first public demo of Docker containers by a little-known platform-as-a-service company called dotCloud (since renamed Docker Inc) in a lightning talk at PyCon 2013.

Docker has come to define the container landscape and eclipse many other container technologies. For good or ill, Docker images (built from Dockerfiles) and the Docker runtime (Docker Engine) are now almost always what people mean when they say container.

Containers from the Kubernetes perspective - Borg, cgroups, and Immutability

Kubernetes has experimental support for other container runtimes, but Docker is the default and is used almost everywhere. To understand why Kubernetes chose Docker rather than one of the other, more feature-rich container technologies available in 2014, we need to dig into what containers provide to Kubernetes.

For Kubernetes, containers are much more than kernel isolation. They are a design pattern for immutable infrastructure.

To understand that, we need to look back to 2006, inside Google, at the genealogy of Kubernetes. It would be fair to characterize Kubernetes as a new open source iteration of Borg, Google’s homegrown platform for orchestrating its own internal infrastructure.

Google created Borg, and the kernel resource controls (cgroups) on which it depends for container isolation, many years prior to Docker’s popularization of container technology. In fact, cgroups were merged into Linux and released in 2.6.24 in January 2008, and together with kernel namespaces they provide the kernel-side functionality of Docker containers.

Google has vast data centers, constantly at work crawling through every page on the Internet, indexing them, handling the world's largest hosted email service, dynamically auctioning off ad space, and serving billions of search results per day. Maintaining this infrastructure required new ways of thinking about operating servers, and their approach was built around one critical simplifying assumption: immutability of containers.

Mutability and Failure - A Sad Story

If workloads are to be packaged into a format that can run anywhere with the necessary hardware requirements, and automatically rescheduled to another machine if any given node fails, they must not change over time. Otherwise, every change must be tracked and recreated on failure.

And if you are dealing with millions of servers, machines are going to fail all the time.

One approach is to try to automate the attention those failures demand. My previous experience bringing a more traditional approach to cloud orchestration made clear to me that this is a Herculean, if not Sisyphean, task.

Mutating the instance goes wrong all the time:

  • Installs fail
  • Upgrades go wrong
  • Configuration gets corrupted
  • Etc.

The result is that writing orchestration scripts that properly handle all the known failure cases is complicated, to say the least. Add to that the need to handle backup and restore operations, correctly configure connections between services, handle node failures, and maintain local state through it all, and the task becomes extremely difficult. We wrote a framework in which to fit all these things, and left it up to the open source world to do that work. In retrospect, what we did was handle the easiest part of the work and leave the hardest part to those writing the actual orchestration scripts.

Google’s approach was different: separate data from code. Store state in a non-local, persistent way, and keep individual instances in a known clean - immutable - state.

Kubernetes could have chosen one of several competing container technologies, such as Solaris Zones (the groundbreaking container system released in Solaris in 2005), BSD Jails, or the competing LXC container framework for Linux. These technologies were arguably more feature-rich and mature. But Docker containers were immutable, and Google knew that immutable containers were much easier to orchestrate.

So, while these are not intrinsic properties of containers, to do Kubernetes properly requires accepting as much as possible that your containers are:

  • Immutable - they can be deployed to any suitable node at any time, with as few potential failure points as possible, and as little opportunity for node-specific failures as possible.
  • Transient - failure of any specific instance is not a critical event requiring specific attention.

Updates and upgrades are performed in a safe manner by completely replacing old versions with new images. Failures are handled not by human intervention, but by deploying the image on another node. Every container instance in the system is in a known state at all times.

Implications for Container Design

There are two major implications of all of this for container design.

Configuration must be separated from code, and state must be managed separately in some failure resilient manner.

Furthermore, there is a secondary but important implication for performance: containers should be as small as possible. Large containers take up significant cluster network, storage, and memory resources. Because containers are transient, their images will be copied around multiple times, take up space on multiple nodes, and take longer to start up.

Separating Config from Code: Kubernetes Configuration Management

The first, and easiest to achieve, of these is that configuration must be separated from the container image, so that it can be persisted even when containers are restarted, scaled up, or scheduled to new nodes.

Kubernetes provides three mechanisms for providing dynamic configuration information to containers in the cluster: ConfigMaps, Secrets, and environment variables.

Environment Variables and Why Not to Use Them

The 12-factor app manifesto explicitly picks a technology here and requires “config stored in the environment”, and Kubernetes supports this. However, there are some limitations to this practice.

  1. Environment variables are global variables
  2. Because of 1, tokens and other “secrets” are not access controlled when stored in environment variables
  3. Some process managers like cron and Monit (which you probably shouldn’t use in Kubernetes anyway, as we will discuss later) scrub environment variables before starting your process, leading to confusion as to why your app stops working when a process manager is used
  4. Running processes can’t effectively pick up changes to config in the environment
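Points 1 and 4 are easy to see outside of Kubernetes. The following is a minimal sketch in plain Python, not anything Kubernetes-specific: the helper names and the DEMO_* variable names are made up for illustration. It shows that a child process inherits whatever is in the environment, and that a process which is already running keeps the values it started with.

```python
import os
import subprocess
import sys


def child_sees(name):
    """Start a fresh child process and return the value it reads for `name`."""
    code = "import os; print(os.environ.get(%r, ''))" % name
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def running_process_sees(name, new_value):
    """Change `name` in the parent after a child has started; return the child's view."""
    # The child blocks on stdin, so it only reads the variable after we changed it here.
    code = "import os, sys; sys.stdin.readline(); print(os.environ.get(%r, ''))" % name
    proc = subprocess.Popen(
        [sys.executable, "-c", code],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )
    os.environ[name] = new_value  # only affects processes started from now on
    out, _ = proc.communicate("go\n")
    return out.strip()


if __name__ == "__main__":
    # Hypothetical variable names, used only for this demonstration.
    os.environ["DEMO_API_TOKEN"] = "s3cret"
    print("child inherited token:", child_sees("DEMO_API_TOKEN"))

    os.environ["DEMO_LOG_LEVEL"] = "info"
    print("running child still sees:", running_process_sees("DEMO_LOG_LEVEL", "debug"))
```

The second helper is the interesting one: the environment is copied into the child when it starts, so the only way to deliver a changed value is to restart the process.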

So, unless you are using an app that is already designed to work with config in the environment, my recommendation is to use ConfigMaps.

ConfigMaps

Config files have a number of advantages: they can be version controlled (separately from the code), they can use file system access controls, processes can re-read them to pick up configuration changes while running, and they are already the default for many projects.
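As an illustration of the "re-read while running" advantage, here is a minimal sketch of a config reader that reloads its file whenever the modification time changes. The class name, the JSON format, and the log_level key are assumptions made for this example; your application may well use another format or a file-watching library instead.

```python
import json
import os


class ReloadingConfig:
    """Re-reads a JSON config file whenever its modification time changes."""

    def __init__(self, path):
        self.path = path
        self._stamp = None  # modification time of the version currently loaded
        self._data = {}

    def get(self, key, default=None):
        stamp = os.stat(self.path).st_mtime_ns
        if stamp != self._stamp:
            # The file changed since the last read (or was never read): reload it.
            with open(self.path, encoding="utf-8") as fh:
                self._data = json.load(fh)
            self._stamp = stamp
        return self._data.get(key, default)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "app.json")  # hypothetical file name
        with open(path, "w", encoding="utf-8") as fh:
            json.dump({"log_level": "info"}, fh)

        config = ReloadingConfig(path)
        print(config.get("log_level"))
```

Checking the modification time on every lookup keeps the sketch short. A long-running service would more likely check on a timer or on a signal.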

Kubernetes ConfigMaps make it easy to mount volumes with your configuration files in your container. ConfigMaps are separate Kubernetes objects with a name and a set of key-value pairs. Each key represents a filename, and each value is the contents of that file.
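To make the key-to-filename mapping concrete, here is a small sketch that builds a ConfigMap manifest as a Python dictionary and then writes each key out as a file, roughly the way the mounted volume presents it. The ConfigMap name, file names, and file contents are invented for the example, and the materialize helper is only a local illustration, not how Kubernetes itself populates the volume.

```python
import json
import os


def config_map_manifest(name, files):
    """Build a ConfigMap manifest: one key per filename, one value per file body."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": dict(files),
    }


def materialize(data, directory):
    """Write each key/value pair as a file, roughly what a mounted volume exposes."""
    paths = []
    for filename, contents in sorted(data.items()):
        path = os.path.join(directory, filename)
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(contents)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Hypothetical ConfigMap name and file contents.
    manifest = config_map_manifest(
        "app-config",
        {
            "app.conf": "log_level = info\n",
            "features.conf": "new_ui = false\n",
        },
    )
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(manifest, indent=2))
```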

When you define a pod, you can mount any number of ConfigMap-based volumes at whatever path you like, and Kubernetes will ensure that those files are published at that location and available for the code in your container to use as needed.

The details of creating ConfigMaps and mounting them as a volume in a pod can be found in the Kubernetes documentation.
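As a rough sketch of what such a pod definition looks like, the function below assembles a Pod manifest with one volume per ConfigMap and a matching mount in the container. The function name, pod name, image, ConfigMap names, and paths are all placeholders chosen for illustration, and the "-volume" suffix is just a naming choice of this example.

```python
import json


def pod_with_config_volumes(pod_name, image, mounts):
    """Build a Pod manifest; `mounts` maps a ConfigMap name to its mount path."""
    volumes = []
    volume_mounts = []
    for config_map_name, mount_path in mounts.items():
        volume_name = config_map_name + "-volume"
        # The pod-level volume points at the ConfigMap by name...
        volumes.append({"name": volume_name, "configMap": {"name": config_map_name}})
        # ...and the container-level mount decides where its files appear.
        volume_mounts.append(
            {"name": volume_name, "mountPath": mount_path, "readOnly": True}
        )
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "containers": [
                {"name": pod_name, "image": image, "volumeMounts": volume_mounts}
            ],
            "volumes": volumes,
        },
    }


if __name__ == "__main__":
    # Placeholder names: two ConfigMaps mounted at two different paths.
    pod = pod_with_config_volumes(
        "web",
        "example.com/web:1.0",
        {"app-config": "/etc/app", "nginx-config": "/etc/nginx/conf.d"},
    )
    print(json.dumps(pod, indent=2))
```

Each file in the ConfigMap then shows up under the chosen path, for example /etc/app/app.conf if the ConfigMap has a key named app.conf.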

In part 2 of this post, we'll examine how to manage secrets in Kubernetes.