Kubernetes Storage FAQ: How to Configure Storage for a Bare Metal Kubernetes Cluster
Managing your application state can be tricky! Learn how to easily manage storage on bare metal Kubernetes installations
In this post we explain some common issues you may run into with Kubernetes storage, including why managing your application state can be tricky and how persistent volume claims let you easily manage storage on bare metal Kubernetes installations.
What is Kubernetes Storage?
Kubernetes is an open-source platform for orchestrating containers and containerized applications - from creation to deployment to iteration to day-to-day operations. Because containers themselves are ephemeral, Kubernetes provides storage abstractions that let applications persist data beyond the life of any single container.
Application data in a Kubernetes cluster is typically stored in volumes, which provide persistent storage for the information an application needs to operate. A persistent volume claim requests storage and binds a Pod to a matching persistent volume, which acts as the storage unit for workloads in the cluster. Volumes can be backed by both on-disk (local) and cloud storage providers built with Kubernetes principles in mind.
Kubernetes also features resources used to manage volumes and other storage resources. Storage classes can be used to maintain persistent storage volumes, while volume plugins can be used to expand Kubernetes storage infrastructure using various service providers.
FAQ: How to Configure Storage for Bare Metal Installations
In our ongoing series on the most frequently asked questions from the Kubernetes community meetings, we are going to look at Kubernetes storage - specifically how to configure storage for bare metal installations. Much like the problems with defining ingress and routing traffic for bare metal, you obviously can’t rely on the convenient services that are available from the major cloud providers to provide persistent storage volumes for your stateful applications.
On the other hand, you don’t want to fall into the trap of having to look after your persistent volumes like pets. But let’s back up a little bit and explain why state can be a problem with Kubernetes and why you even need to consider storage management for your application.
Why is state so tricky?
Everyone working with Kubernetes knows that containers should be stateless and immutable. But in reality there is no such thing as a fully stateless architecture: if you want to do something useful with your applications, data needs to be stored somewhere and be accessible to some services.
This means you need a storage solution that keeps that data available after a Pod is restarted or rescheduled. The basic idea behind storage management is to move the data outside of the Pod so that it can exist independently.
In Kubernetes, data is kept in a volume that allows the state of a service to persist across multiple Pods. As the Kubernetes documentation on Volumes explains, on-disk files in a container are ephemeral unless they are abstracted through a volume.
Kubernetes exposes multiple kinds of volumes, the most basic of which is the empty volume, `emptyDir`. With this type of volume, the node stores the Pod's data in an `emptyDir` backed either by RAM or by persistent storage such as an SSD. This type of Kubernetes storage lives right on the node, which means it only persists while the Pod is running there. If the Pod is removed from the node, the contents of the `emptyDir` are erased.
The YAML for this type of definition (and any other volume definition, for that matter) looks as follows:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pd
spec:
  containers:
  - image: k8s.gcr.io/test-webserver
    name: test-container
    volumeMounts:
    - mountPath: /cache
      name: cache-volume
  volumes:
  - name: cache-volume
    emptyDir: {}
```

(from Kubernetes Docs: Volumes)
If you don’t want your directory to start out empty, you can use a `hostPath` instead. A `hostPath` volume is defined in the YAML in essentially the same way, but instead of node-local scratch space it mounts a file or directory from the node’s filesystem directly into the Pod. This means that if the Pod goes down, its data is still preserved on the node.
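A minimal `hostPath` Pod definition, following the same pattern as the `emptyDir` example above, might look like this (the Pod name and host path are illustrative placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-demo
spec:
  containers:
  - image: k8s.gcr.io/test-webserver
    name: test-container
    volumeMounts:
    - mountPath: /test-pd
      name: test-volume
  volumes:
  - name: test-volume
    hostPath:
      path: /data          # directory on the host node (placeholder)
      type: Directory      # must already exist on the node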
Public Cloud Volumes
If you are using one of the public clouds, you can take advantage of the many services such as awsElasticBlockStore or GCEPersistentDisk (or something similar) as your Kubernetes storage volumes. And most teams running Kubernetes in the public cloud do it this way. With most of these cloud volume services, all that is necessary is a YAML definition file that tells the Pod which provider and service to connect to.
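As a sketch, an awsElasticBlockStore volume can be referenced directly in a Pod spec; the volume ID below is a placeholder you would replace with your own:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ebs-demo
spec:
  containers:
  - image: k8s.gcr.io/test-webserver
    name: test-container
    volumeMounts:
    - mountPath: /data
      name: ebs-volume
  volumes:
  - name: ebs-volume
    awsElasticBlockStore:
      volumeID: "<volume-id>"   # placeholder: your EBS volume ID
      fsType: ext4
```

Note that the Pod spec now carries backend-specific details like the volume ID - exactly the kind of low-level knowledge that Persistent Volume Claims are designed to hide.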
The problem with connecting directly to the volume in this way is that developers must know the specific volume ID and the filesystem type before they can connect to it. That is a lot of low-level detail for developers to keep track of, and with a large development team it can create a bit of a management mess, not to mention a possible security risk. This is where a Persistent Volume Claim (or PVC) comes in: it provides an abstraction layer on top of those details. But before we get to that, let’s first have a look at how you can use an NFS mount for your data.
NFS, the Network File System, is a distributed file system protocol that allows you to mount a remote file system over the network. The NFS share can be referenced in a YAML file and then connected to and mounted as your volume.
If a Pod goes down or is removed, an NFS volume is simply unmounted, but the data is still available; unlike an `emptyDir`, it is not erased. However, if you take a look at the NFS example in the documentation, it recommends creating a Persistent Volume and Persistent Volume Claim first rather than mounting the NFS share directly in the Pod.
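For reference, a direct NFS mount in a Pod spec looks like the sketch below; the server address and export path are placeholder assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nfs-demo
spec:
  containers:
  - image: k8s.gcr.io/test-webserver
    name: test-container
    volumeMounts:
    - mountPath: /mnt/nfs
      name: nfs-volume
  volumes:
  - name: nfs-volume
    nfs:
      server: nfs.example.com   # placeholder: your NFS server
      path: /exports/data       # placeholder: exported directory
      readOnly: false
```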
Persistent Volume Claims
With a Persistent Volume Claim, the Pod can connect to volumes wherever they are through a series of abstractions. The abstractions can provide access to underlying cloud-provided back-end storage volumes or, in the case of bare metal Kubernetes, on-prem storage volumes.
An advantage of doing it this way is that an administrator can define the abstraction layer. Developers simply request storage through the API without ever needing the volume ID or filesystem details. This additional abstraction layer on top of the physical storage is a convenient way to separate Ops from Dev: developers use a PVC to access the storage they need while developing their services.
These are the parts to a persistent volume claim:
Persistent Volume Claim - a request for storage that can be mounted to a Pod dynamically, without having to know the backend provider of the volume.
Persistent Volume - the specific volume being called as outlined in the claim as provisioned by an Administrator. These are not tied to a particular Pod and are managed by Kubernetes.
Storage class - allows dynamic storage allocation which is the preferred ‘self serve’ method for developers. Classes are defined by administrators.
Physical storage - the actual volume that is being connected to and mounted.
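Put together, the pieces above might look like the following sketch; the names, sizes, and provisioner are illustrative assumptions, not a definitive configuration:

```yaml
# Storage class: defined by an administrator for dynamic allocation
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-local
provisioner: kubernetes.io/no-provisioner   # local volumes are provisioned manually
volumeBindingMode: WaitForFirstConsumer
---
# Persistent Volume Claim: the developer's request for storage
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: fast-local
  resources:
    requests:
      storage: 10Gi
---
# Pod: mounts the claim without knowing anything about the backend
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: k8s.gcr.io/test-webserver
    volumeMounts:
    - mountPath: /var/lib/app
      name: data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: app-data
```

The Pod references only the claim name; Kubernetes binds the claim to a matching persistent volume behind the scenes.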
So how do I configure storage for bare metal Kubernetes?
Configuring local storage PVCs for StatefulSets went beta as of Kubernetes 1.10. Stefan Prodan (https://twitter.com/stefanprodan), a Weaveworks community engineer, has written this useful step-by-step tutorial on how to use kubeadm to spin up your cluster and then configure it to use local SSD persistent volumes for StatefulSets. The diagram below shows the basic architecture of this type of configuration on bare metal servers.
According to Stefan, using local SSD storage works well for HA-capable databases like Mongo and Elasticsearch, and he further clarifies that “if your statefulsets are not HA you could also use Rook for your volumes.”
There are also many other third-party plugins that you can explore in the Kubernetes docs, or take a look at this list of storage resources from mhausenblas.
In this post, I went over some of the problems with stateful applications running on Kubernetes. I then provided a brief sampling of the different types of volumes that are available and how they operate. I described how Persistent Volume Claims let you more easily manage persistent data with a large development team. And lastly, I linked to a tutorial by Stefan Prodan that describes how to configure persistent volumes for bare metal servers running Kubernetes.
For any organization looking to easily deploy and manage Kubernetes clusters and applications at scale in any environment, Weave GitOps is the way to go. Weave GitOps, a state-of-the-art GitOps platform powered by Flux, enables declarative infrastructure, complete pipeline automation, and built-in security guardrails, as every change to the system is versioned and audit-ready. Try the forever free version of Weave GitOps or contact us for a demo of Weave GitOps Enterprise.