The success of Kubernetes is due in no small part to its flexibility and power as a container orchestration system. It can scale virtually indefinitely, which has enabled it to provide the backbone for many of the world’s most popular online services. And as a proven open source solution with a rich ecosystem, it is accessible and easy to set up, whether for personal learning, development or testing.

If you want to learn more about why companies are adopting Kubernetes, we have the whys covered here.

When it comes to using Kubernetes in production, however, things get a little more complex. There are numerous issues you need to consider, covering the full range of critical success factors for an online application, including stability, security and the way in which you will manage and monitor your application, once it is up and running. Getting any of these wrong could prove costly – which is why we’ve written what we consider the definitive introduction to Kubernetes in Production. Please note, however, that this is an introduction only. Kubernetes is a complicated subject – consider this a first step on the road to production.

If you’re already preparing to go live, check out our production-ready checklists.

How is Kubernetes used in production?

Kubernetes is designed for the deployment, scaling and management of containerized applications. It schedules the containers themselves as well as managing the workloads that run on them. This capability to manage both the applications and their underlying infrastructure allows software development and platform management to be integrated, which in turn improves both manageability and application portability. With the right architecture and processes in place, it is possible to make frequent changes to a high-availability application without taking it offline. You can even move an application from one cluster to another. This is the reality of using Kubernetes in production. However, to realize its potential in this way, you need to configure it correctly from the outset.

Kubernetes in production: key considerations

Production-ready is a loaded term, in that what constitutes production readiness will vary according to your use case. While you could argue that a Kubernetes cluster is production-ready the moment it is ready to serve traffic, there is a commonly agreed set of minimum requirements. We’ll explore these below.

1. Security

Naturally, you need to prioritize security, ensuring attack surfaces are kept to a minimum. That means making sure everything, including your apps and the data they use, is locked down. Kubernetes itself is a rapidly evolving product, with updates being issued frequently. Make sure you keep on top of them.

2. Portability and scalability

These two can be grouped together as they ultimately come down to the same thing: maximizing cluster performance. The aim is to ensure that you continue to benefit from Kubernetes’ ability to self-heal nodes, autoscale any infrastructure and adapt to your expanding business, without taking a performance hit.

3. Deployment velocity

Kubernetes was designed to make management of the infrastructure and applications easier. Its automated features, coupled with an appropriate operating model such as GitOps, enable you to increase deployment velocity through automated CI/CD pipelines.

4. Disaster recovery

Using Kubernetes in conjunction with GitOps can help enormously with disaster recovery, bringing MTTR down from hours to minutes. Even in the case of complete cluster meltdown, GitOps enables you to recreate your container infrastructure quickly and easily.

5. Observability

Observability is about gaining a high-level view into your running services, so you can make informed decisions before you deploy, mitigating risk. By adopting GitOps processes for your Kubernetes deployments, you can monitor your applications continuously and act immediately in the case of a problem.

Kubernetes in production: best practice

In production, best practice largely concerns the way you will operate Kubernetes at scale. This covers a number of issues surrounding the management of a large application, potentially distributed across time zones and clouds, with multiple development teams working on it simultaneously.

1. Team application development

Namespaces

Namespaces allow different services to run alongside one another on a single cluster. Subdividing the cluster in this way means multiple teams can work on it simultaneously. Easy to create and delete, namespaces can reduce server costs while increasing quality, by providing a convenient environment for testing prior to deployment.
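As a minimal sketch of how a per-team test namespace might be declared, the snippet below builds a Namespace manifest as a plain dictionary and prints it as JSON, which kubectl accepts alongside YAML. The names "payments-test" and "payments" and the "team" label are assumptions made for illustration only.

```python
import json


def namespace_manifest(name: str, team: str) -> dict:
    """Build a Namespace manifest; the 'team' label is an illustrative convention."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": name, "labels": {"team": team}},
    }


if __name__ == "__main__":
    # Hypothetical names, not taken from any real cluster.
    print(json.dumps(namespace_manifest("payments-test", "payments"), indent=2))
```

The output could be piped to kubectl apply -f - to create the namespace, and deleting the namespace later removes everything inside it.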

Role-Based Access Control (RBAC)

For an extra layer of security, RBAC gives administrators control over who can see what is operational in a cluster. It allows for the creation of role groups (e.g. ‘developer’ or ‘admin’) to which group permissions can be assigned. Administrators can then specify which roles can access which resources, including details such as whether they have read or write access to them. Roles can also be applied to an entire namespace, so you can specify who can create, read or write to Kubernetes resources within it.
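To make the idea of namespace-scoped roles concrete, here is a hedged sketch that builds a read-only Role and a RoleBinding granting it to a group. The namespace, role and group names are hypothetical, and the list of resources is only an example of what a 'developer' role might be allowed to view.

```python
import json

RBAC_GROUP = "rbac.authorization.k8s.io"


def read_only_role(namespace: str, role_name: str) -> dict:
    """A Role that can view, but not change, common workload resources."""
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "Role",
        "metadata": {"namespace": namespace, "name": role_name},
        "rules": [{
            "apiGroups": ["", "apps"],  # "" is the core API group
            "resources": ["pods", "services", "deployments"],
            "verbs": ["get", "list", "watch"],
        }],
    }


def bind_group(namespace: str, role_name: str, group: str) -> dict:
    """A RoleBinding that grants the Role to every member of a group."""
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"namespace": namespace, "name": f"{role_name}-{group}"},
        "subjects": [{"kind": "Group", "name": group, "apiGroup": RBAC_GROUP}],
        "roleRef": {"kind": "Role", "name": role_name, "apiGroup": RBAC_GROUP},
    }


if __name__ == "__main__":
    # Hypothetical namespace, role and group names.
    items = [
        read_only_role("payments-test", "viewer"),
        bind_group("payments-test", "viewer", "developer"),
    ]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Write access would be expressed by adding verbs such as create, update or delete to the rule, ideally in a separate role so that read and write permissions can be granted independently.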

Network policies

Remember that a namespace is only a virtual separation. Applications can still easily communicate with each other. To restrict which services (or even which namespaces) can communicate with one another, you should implement a network policy.
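One possible shape for such a policy is sketched below: it selects every pod in a namespace and admits ingress traffic only from namespaces carrying a given label. The namespace name and the "team" label are assumptions for illustration. Note that a network policy is only enforced if the cluster's network plugin supports it.

```python
import json


def allow_ingress_from_team(namespace: str, team: str) -> dict:
    """NetworkPolicy admitting ingress only from namespaces labelled team=<team>."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"namespace": namespace, "name": f"allow-from-{team}"},
        "spec": {
            "podSelector": {},  # an empty selector matches every pod in the namespace
            "policyTypes": ["Ingress"],
            "ingress": [{
                "from": [{"namespaceSelector": {"matchLabels": {"team": team}}}],
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical namespace and label value.
    print(json.dumps(allow_ingress_from_team("payments-test", "payments"), indent=2))
```

Because the policy selects all pods and lists Ingress as a policy type, any ingress traffic not matched by the rule is dropped for those pods.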

2. Advanced protection in production

Pod-to-pod SSL traffic

SSL is not native in Kubernetes. However, it can be implemented via a secure service mesh like Istio.

Access to the cluster API

TLS should be implemented for all API traffic, alongside API authentication and authorization for all calls.

Access to the kubelet

The kubelet exposes HTTPS endpoints that grant control over pods and nodes. All production clusters should therefore enable authentication and authorization for the kubelet.
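As a rough illustration, the relevant part of a kubelet configuration file might look like the fragment generated below: anonymous requests are disabled, and authentication and authorization are delegated to the API server via webhook. The CA file path is a hypothetical placeholder, and how the file reaches each node depends on how your cluster is provisioned.

```python
import json


def kubelet_auth_config(client_ca_file: str) -> dict:
    """KubeletConfiguration fragment that rejects anonymous and unauthorized requests."""
    return {
        "apiVersion": "kubelet.config.k8s.io/v1beta1",
        "kind": "KubeletConfiguration",
        "authentication": {
            "anonymous": {"enabled": False},       # no unauthenticated access
            "webhook": {"enabled": True},          # validate bearer tokens with the API server
            "x509": {"clientCAFile": client_ca_file},
        },
        "authorization": {"mode": "Webhook"},      # ask the API server whether a call is allowed
    }


if __name__ == "__main__":
    # Hypothetical path; use the CA bundle your distribution provides.
    print(json.dumps(kubelet_auth_config("/path/to/kubelet-client-ca.crt"), indent=2))
```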

User and workload capabilities

Limit resource usage by setting quotas and limits, as well as specifying container privileges and restricting network access, API access and node access to pods.
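The sketch below shows two of these controls side by side: a ResourceQuota capping what a namespace may consume in total, and a container definition with its own requests, limits and a restrictive security context. All names, the image reference and the numeric values are illustrative assumptions, not recommendations.

```python
import json


def namespace_quota(namespace: str) -> dict:
    """ResourceQuota capping total CPU, memory and pod count for one namespace."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"namespace": namespace, "name": "team-quota"},
        "spec": {"hard": {
            "requests.cpu": "4", "requests.memory": "8Gi",
            "limits.cpu": "8", "limits.memory": "16Gi",
            "pods": "20",
        }},
    }


def locked_down_container(name: str, image: str) -> dict:
    """Container spec with resource limits and reduced privileges."""
    return {
        "name": name,
        "image": image,
        "resources": {
            "requests": {"cpu": "250m", "memory": "256Mi"},
            "limits": {"cpu": "500m", "memory": "512Mi"},
        },
        "securityContext": {
            "runAsNonRoot": True,
            "allowPrivilegeEscalation": False,
            "readOnlyRootFilesystem": True,
            "capabilities": {"drop": ["ALL"]},
        },
    }


if __name__ == "__main__":
    # Hypothetical namespace, container name and image.
    print(json.dumps(namespace_quota("payments-test"), indent=2))
    print(json.dumps(locked_down_container("api", "registry.example.com/api:1.0.0"), indent=2))
```

The container dictionary is a fragment meant to sit under spec.containers in a Pod or Deployment template rather than a manifest in its own right.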

Cluster component protection

Finally, restrict access to etcd, enable full audit logging, start a key rotation schedule, encrypt secrets and set up standards and a review process for third-party applications.

3. Data storage and Kubernetes

Data storage for Kubernetes is a complex subject. The key issue to consider is whether your application is stateful or stateless. This is because when a container is restarted, the data inside it is lost. If your application is stateless, this is not a problem, but for applications that need a persistent data store (e.g. MongoDB or MySQL), you will need to use another Kubernetes feature: volumes.
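As a simple illustration of how a volume keeps data outside the container's lifecycle, the sketch below declares a PersistentVolumeClaim and a pod that mounts it. The claim name, size, image and mount path are assumptions chosen for the example, and a production database would more typically be run through a StatefulSet.

```python
import json


def volume_claim(name: str, size: str) -> dict:
    """PersistentVolumeClaim requesting storage that outlives any single container."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


def pod_with_claim(pod_name: str, image: str, claim_name: str, mount_path: str) -> dict:
    """Pod that mounts the claim, so data written under mount_path survives restarts."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "containers": [{
                "name": pod_name,
                "image": image,
                "volumeMounts": [{"name": "data", "mountPath": mount_path}],
            }],
            "volumes": [{
                "name": "data",
                "persistentVolumeClaim": {"claimName": claim_name},
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical names, size, image tag and mount path.
    items = [
        volume_claim("db-data", "10Gi"),
        pod_with_claim("db", "mysql:8.0", "db-data", "/var/lib/mysql"),
    ]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```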

This is a topic that requires a significant amount of planning, depending on your application and use case. While there is not the space here to give this subject the attention it deserves, key subjects to research include:

· Kubernetes volumes

· Persistent volumes

· StatefulSets in Kubernetes

· Database schemas in Kubernetes

· Rollout and update strategy

· Readiness and liveness probes

· Database migration libraries

If you'd like to learn more about cloud native storage solutions, download our latest performance guide that walks you through a comprehensive analysis of today’s most prominent solutions.

4. Defining CI/CD pipelines for Kubernetes

Cloud native software like Kubernetes has made continuous delivery a reality for many organizations. But to achieve it, you need solid engineering practices and well organized CI/CD pipelines. They can help you increase code quality as well as the speed with which you can deliver new features. These pipelines, when organized correctly using GitOps, also solve the problem of having to give your entire development team kubectl access on your cluster – something you should generally try to avoid.

Challenges when increasing velocity

The most common problem faced when speeding up software development is the increasing risk of failure. To mitigate this risk, you could add some manual steps (gates) prior to deployment. But doing so creates a risk of divergence between your development and production environments.

Even Kubernetes has its limitations

Kubernetes delivers services more reliably than other systems, thanks to its ability to self-heal and auto-scale. Nevertheless, it still lacks some tooling. For example, when you are running a large distributed system, external dependencies such as RDS databases or other services from public cloud providers can be problematic to monitor.

5. Increasing velocity and reliability with GitOps

Because almost everything in Kubernetes can be configured declaratively, you can specify exactly what a workload is and how it’s going to run in the cluster. But you can go further than that. By storing all those declarations in a version control system such as Git and introducing software agents to monitor the running system for any divergence, you can centralize operations, enabling you to boost velocity and reliability at the same time.
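The divergence check at the heart of this model can be sketched in a few lines. The function below is a simplified illustration, not how any particular GitOps agent is implemented: it compares a desired state (as it might be read from Git) with a live state (as it might be read from the cluster) and reports the fields that differ. The example states are invented.

```python
from typing import Any, List


def find_drift(desired: Any, live: Any, path: str = "") -> List[str]:
    """Return the dotted paths at which the live state differs from the desired state.

    Only fields present in the desired state are checked, because a running
    cluster adds fields of its own (status, defaults) that are not drift.
    """
    if isinstance(desired, dict) and isinstance(live, dict):
        drift: List[str] = []
        for key, want in desired.items():
            child = f"{path}.{key}" if path else key
            if key not in live:
                drift.append(child)  # declared in Git but missing from the cluster
            else:
                drift.extend(find_drift(want, live[key], child))
        return drift
    return [] if desired == live else [path]


if __name__ == "__main__":
    # Invented example: someone scaled the workload by hand.
    desired = {"spec": {"replicas": 3, "image": "shop:1.4.2"}}
    live = {"spec": {"replicas": 5, "image": "shop:1.4.2"}, "status": {"ready": 5}}
    print(find_drift(desired, live))
```

An agent built on this idea would run the comparison continuously and either raise an alert or re-apply the declared state whenever the list is non-empty.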

These are the principles behind GitOps, an operating model designed for continuous delivery of Kubernetes applications. Because it works with Git, your developers won’t need to learn new tools. They can create, test and deploy new features themselves, without fear of breaking anything. Best of all, every action – whether a code update or a change to the cluster config – is recorded in Git. In effect, it gives you a self-generating audit trail – useful for any business and vital in many regulated industries.

The easiest way to get production ready

Kubernetes is complex and it becomes more complex still when you prepare your application for production. Yet with the right tools and processes, you can harness its power and build on it, delivering a secure, reliable production application for which you can deliver new features quickly and without disruption.

GitOps was designed to provide a complete toolset and operational model to do just that. And because it records everything, it can help enormously when it comes to compliance – another issue that rarely rears its head until you are ready for production.

At Weaveworks, we designed and used much of what is now known as GitOps, before donating the principles to the Cloud Native Computing Foundation (CNCF). Since then, the model has been adopted by many vendors and cloud native organizations, proving its effectiveness in production at enterprise scale.

To learn more about how GitOps could help you prepare for production, download our production-ready checklists or our latest whitepaper for a deeper dive. If you are looking for a complete platform with enterprise support, ask for a demo of Weave GitOps Enterprise.