How would you describe and set up a “production ready” Kubernetes cluster? How do you define “production ready” and “highly available” anyway? Can a cluster be created so that it is secured end-to-end, has no single points of failure, and is upgradable with zero control plane downtime?
At the most recent KubeCon event in Copenhagen, CNCF Cloud Ambassador and Kubernetes Maintainer Lucas Käldström presented a talk titled “What does Production Ready Really Mean for a Kubernetes Cluster?”. He began with a definition of what it means to be “production ready” and “highly available”, and gave an overview of what to think about when securing your cluster. He then covered how to make your cluster highly available, and finished by explaining how it will become possible to set up Kubernetes declaratively with the Cluster API.
What is Production Ready?
“Production ready” is a buzzword that gets thrown around a lot and means different things to different people. According to Lucas, “A cluster is production ready when it’s in good enough shape to serve real-world traffic.”
In other words, it depends on your use case, and it is about making tradeoffs. If you are hosting a blog running in a droplet on DigitalOcean, one master may well be available enough. But if you are a bank with thousands of nodes and workloads, you will probably need three or more masters (an odd number, so that etcd can keep a quorum), and you will also need to think much more carefully about security in order to be highly available and production ready.
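How many masters count as “more” is driven largely by etcd, which only accepts writes while a majority (a quorum) of its members is reachable. As a rough sketch, assuming a “stacked” topology in which each master runs exactly one etcd member, the arithmetic below shows how many master failures each cluster size can survive:

```python
def quorum(members: int) -> int:
    """Smallest majority of an etcd cluster with `members` voting members."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can fail while a quorum is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    # Assumes one etcd member per master (stacked topology).
    for n in range(1, 6):
        print(f"{n} master(s): quorum {quorum(n)}, survives {fault_tolerance(n)} failure(s)")
```

Two masters survive no more failures than one, and four survive no more than three, which is why odd sizes such as three or five are the usual choice.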
This quote from Carter Morgan at Google is another way to sum up production ready: “Your offering is production ready when it exceeds customer expectations in a way that allows for business growth.”
Although a cluster can be called production ready once it is good enough to serve traffic, there are some technical criteria to check before declaring it so:
- The security of the cluster is taken care of.
- The cluster is highly available enough to meet the needs of your users.
- All elements of the cluster are controlled and managed declaratively, rather than imperatively over SSH.
- Changes to the cluster state can be applied safely.
- The cluster passes as many conformance tests as possible.
Securing a Kubernetes Cluster
Lucas offered five main suggestions for securing your Kubernetes clusters:
- Use TLS-secured communication everywhere.
  - Identities/certificates should be rotatable.
  - Have a separate CA for etcd to protect your data.
  - Use the certificate signing request (CSR) API with an external signer if possible.
- Implement API authentication and authorization.
  - Disable the insecure localhost:8080 port and anonymous logins.
  - Enforce the RBAC and Node authorizers.
- Lock down the kubelets in the cluster.
  - Provide a unique identity for each kubelet.
  - Disable the read-only port 10255 and the public cAdvisor port 4194.
- Be careful with the Kubernetes Dashboard and with Helm.
  - Don’t give them cluster-admin privileges.
  - Specify in RBAC exactly which operations Tiller can perform.
- Deny all by default, which is the best practice for security.
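Several of the API server and kubelet suggestions above map directly onto command-line flags. The sketch below is a hypothetical helper, not an official tool, that checks a list of flags against them. It assumes the flag names of the Kubernetes releases current at the time of the talk (around 1.10); `--cadvisor-port` was later removed from the kubelet, and other flags have changed since, so adjust the checks for your version. In the spirit of denying by default, a flag that is absent is reported rather than assumed to have a safe default.

```python
def parse_flags(args):
    """Turn ["--read-only-port=0", ...] into {"read-only-port": "0", ...}."""
    flags = {}
    for arg in args:
        if arg.startswith("--") and "=" in arg:
            name, value = arg[2:].split("=", 1)
            flags[name] = value
    return flags


def audit_apiserver(args):
    """Check kube-apiserver flags against the authn/authz suggestions."""
    flags = parse_flags(args)
    findings = []
    if flags.get("insecure-port") != "0":
        findings.append("kube-apiserver: set --insecure-port=0 to disable localhost:8080")
    if flags.get("anonymous-auth") != "false":
        findings.append("kube-apiserver: set --anonymous-auth=false")
    modes = flags.get("authorization-mode", "").split(",")
    for required in ("Node", "RBAC"):
        if required not in modes:
            findings.append(f"kube-apiserver: add {required} to --authorization-mode")
    return findings


def audit_kubelet(args):
    """Check kubelet flags against the lock-down suggestions (1.10-era names)."""
    flags = parse_flags(args)
    findings = []
    if flags.get("read-only-port") != "0":
        findings.append("kubelet: set --read-only-port=0 to close port 10255")
    # --cadvisor-port only exists on older kubelets; drop this check on newer ones.
    if flags.get("cadvisor-port") != "0":
        findings.append("kubelet: set --cadvisor-port=0 to close port 4194")
    return findings


if __name__ == "__main__":
    apiserver = ["--insecure-port=0", "--anonymous-auth=false", "--authorization-mode=Node,RBAC"]
    kubelet = ["--read-only-port=10255"]
    for finding in audit_apiserver(apiserver) + audit_kubelet(kubelet):
        print(finding)
```

Here the API server flags pass, while the kubelet is flagged for its open read-only port and for not explicitly disabling the cAdvisor port.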
Controlling Clusters through a Declarative API with GitOps Workflows
Much like apps can be controlled and updated entirely through YAML manifests, one of the goals of Kubernetes is to have a declarative cluster API definition, with Git as the single source of truth for your cluster definitions. This is convenient if you need to recreate your cluster after a disaster and you are using an installer tool such as kops: because your cluster definitions are versioned, kops or another installer can pull the latest manifests from Git and reinstall your cluster in its most up-to-date state.
Kubernetes controllers can already perform rolling updates. The same mechanism would allow entire clusters to be updated without downtime, just as applications running on Kubernetes are updated today. By applying the Operator pattern and GitOps best practices, Kubernetes controllers can reconcile the actual state of the cluster with the desired state stored in Git. If the two differ, the cluster can be updated automatically to match what is stored in Git.
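The reconcile loop described above can be sketched in a few lines. This is a toy model, not the Cluster API or any real controller: the desired state is a plain dictionary mapping hypothetical machine names to Kubernetes versions (in a GitOps setup it would be read from manifests in Git), and the hypothetical `max_unavailable` parameter mimics a rolling update by changing at most one machine per pass.

```python
def plan(desired, actual):
    """Diff the desired state (from Git) against the observed state."""
    actions = []
    for name, version in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name, version))
        elif actual[name] != version:
            actions.append(("upgrade", name, version))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name, None))
    return actions


def apply_action(action, actual):
    """Pretend to carry out one action by mutating the observed state."""
    verb, name, version = action
    if verb == "delete":
        del actual[name]
    else:  # "create" and "upgrade" both leave the machine at the desired version
        actual[name] = version


def reconcile(desired, actual, max_unavailable=1):
    """Repeat observe -> diff -> act until nothing differs; return the pass count."""
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1")
    passes = 0
    while True:
        actions = plan(desired, actual)
        if not actions:
            return passes
        # Rolling update: touch at most `max_unavailable` machines per pass.
        for action in actions[:max_unavailable]:
            apply_action(action, actual)
        passes += 1


if __name__ == "__main__":
    desired = {f"master-{i}": "v1.11.0" for i in range(3)}
    actual = {f"master-{i}": "v1.10.5" for i in range(3)}
    actual["master-old"] = "v1.10.5"  # drift: a machine no longer declared in Git
    print(reconcile(desired, actual), actual == desired)
```

Running it upgrades the three masters one at a time and then removes the machine that is no longer declared, converging after four passes. Because each pass recomputes the plan from scratch, the same loop also undoes manual drift.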
Read more about GitOps in our blog series.
Lucas discussed what a production ready cluster is and how this depends largely on your goals and budget. He then dove into some of the technical aspects of getting your cluster production ready, looking at security (in particular TLS certificates) and at what high availability actually means. He wrapped up with an overview of a GitOps approach to cluster management that is currently being discussed in the Cluster Lifecycle SIG. You can watch his full talk online.
Weaveworks now offers Production Grade Kubernetes Support for enterprises. For the past 3 years, Kubernetes has been powering Weave Cloud, our operations-as-a-service offering, so we couldn’t be more excited to share our knowledge and help teams embrace the benefits of cloud native tooling. Kubernetes enables working as a high-velocity team, which means you can accelerate feature development and reduce operational complexity. But the gap between theory and practice can be very wide, which is why we've focused on creating GitOps workflows, building on our own experience of running Kubernetes in production. We use developer-centric tooling (e.g. git) and a tested approach to help you install, set up, operate and upgrade Kubernetes. Contact us for more details.