Last week we did a rundown of what you need for a production-ready cluster. In this second part, we’ll outline a checklist of best practices for applications running in Kubernetes.
If any of these topics interest you and you are in London, we’re also running a workshop called “Production Ready Kubernetes”.
Application Checklist for Kubernetes
These are the areas that need attention before running your applications in production.
| Area | What is it | Why you need it | Options |
|---|---|---|---|
| Readiness check | Endpoint for Kubernetes to monitor your application lifecycle. | Allows Kubernetes to stop sending traffic to a pod. Readiness failure is transient and tells Kubernetes to route traffic elsewhere, which makes it useful for startup and load management. | Read: Resilient apps with Liveness and Readiness probes. |
| Liveness check | Endpoint for Kubernetes to monitor your application lifecycle. | Allows Kubernetes to restart a pod. Liveness failure tells Kubernetes to restart the pod. | Read: Resilient apps with Liveness and Readiness probes. |
| Metric instrumentation | Code and libraries used in your code to expose metrics. | Allows measuring the operation of the application and enables many more advanced use cases. | Prometheus, New Relic, Datadog and others. Read: Monitoring Kubernetes with Prometheus. |
| Dashboards | View of metrics. | You need to make sense of the data. | Grafana |
| Playbooks/runbooks | Rich guides for your engineers on how to operate the system and find faults when things go wrong. | Nobody is at their sharpest at 3:00 AM, and knowledge deteriorates over time. | Weave Cloud Notebooks |
| Limits and requests | Explicit resource allocation for pods. | Allows Kubernetes to make good scheduling decisions. | Read: Kubernetes Pod Resource Limitations and Quality of Service. |
| Labels and annotations | Metadata held by Kubernetes. | Makes workload management easier and allows other tools to work with standard Kubernetes definitions. | Read: Labels and Selectors in Kubernetes. |
| Alerts | Automated notifications on defined triggers. | You need to know when your service degrades. | Prometheus & Alertmanager. Read: Labels in Prometheus alerts: think twice before using them. |
| Structured logging output | Logs output in a machine-readable format to facilitate searching and indexing. | Lets you trace what went wrong when something does. | ELK stack (Elasticsearch, Logstash and Kibana); many commercial offerings. |
| Tracing instrumentation | Instrumentation that sends request-processing details to a collection service. | Sometimes the only way of figuring out where latency is coming from. | Zipkin, Lightstep, Appdash, Tracer, Jaeger |
| Graceful shutdowns | Applications respond to SIGTERM correctly. | This is how Kubernetes tells your application to end. | Read: 10 tips for Building and Managing Containers |
| Graceful dependency (with readiness check) | Applications don’t assume dependencies are available; they wait for other services before reporting ready. | Avoids the headaches that come with a service start-order requirement. | Read: 10 tips for Building and Managing Containers |
| ConfigMaps | Define a configuration file for your application in Kubernetes using ConfigMaps. | Makes it easy to reconfigure an app without rebuilding and allows config to be versioned. | Read: Best Practices for Designing and Building Containers for Kubernetes |
| Image labels | Label the Docker images with the code commit SHA. | Makes tracing an image back to its code trivial. | |
| Locked down runtime context | Use a deliberately secure configuration for the application runtime context. | Reduces attack surface and makes privileges explicit. | Read: Introduction to Kubernetes Security; Read: Continuous Security for GitOps |
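To make the readiness and liveness checks in the table concrete, here is a minimal sketch of an application serving both endpoints with only the Python standard library. The paths `/healthz` and `/ready`, port 8080, and the `state` dictionary are hypothetical choices. The pod's `livenessProbe` and `readinessProbe` would use `httpGet` against whatever paths you actually pick.

```python
import http.server
import threading

# Hypothetical in-process state; real application logic would flip these flags.
state = {"ready": False, "alive": True}
state_lock = threading.Lock()


class ProbeHandler(http.server.BaseHTTPRequestHandler):
    """Serves a liveness endpoint (/healthz) and a readiness endpoint (/ready)."""

    def do_GET(self):
        with state_lock:
            alive, ready = state["alive"], state["ready"]
        if self.path == "/healthz":
            # Liveness: failing here asks Kubernetes to restart the container.
            self._reply(200 if alive else 500)
        elif self.path == "/ready":
            # Readiness: failing here only stops traffic being routed to this pod.
            self._reply(200 if ready else 503)
        else:
            self._reply(404)

    def _reply(self, status):
        body = b"ok" if status == 200 else b"not ok"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        # Silence the default per-request logging to stderr.
        pass


def start_probe_server(port=8080):
    """Start the probe server on a background thread and return the server object."""
    server = http.server.ThreadingHTTPServer(("0.0.0.0", port), ProbeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The readiness flag starts as `False`, so a freshly started pod receives no traffic until the application decides it is ready.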
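The graceful dependency row pairs with a readiness check: rather than crashing when a dependency is missing, the application waits for it and only then reports ready. The following sketch assumes that a successful TCP connect is a good-enough signal that a dependency is up. The host name `db` and port 5432 in the usage note are hypothetical.

```python
import socket
import time


def wait_for_dependency(host, port, timeout_s=30.0, interval_s=0.5):
    """Poll a TCP dependency until it accepts connections or the timeout expires.

    Returns True as soon as a connection succeeds, False once the deadline passes.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful TCP connect is used as a cheap "dependency is up" signal.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

In an application, you might call `wait_for_dependency("db", 5432)` on a background thread and set the readiness flag only when it returns `True`. This way, services can start in any order.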
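For graceful shutdowns: when a pod is terminated, Kubernetes sends SIGTERM to the container's main process. If the process has not exited after the grace period (`terminationGracePeriodSeconds`, 30 seconds by default), Kubernetes then sends SIGKILL. The minimal sketch below records the signal and lets the main loop finish its work and clean up. The `do_work` and `cleanup` callables are hypothetical placeholders.

```python
import signal
import threading

# Set once SIGTERM has been delivered to this process.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    # Keep the handler tiny: record the request and let the main loop clean up.
    shutdown_requested.set()


def install_sigterm_handler():
    # Must be called from the main thread.
    signal.signal(signal.SIGTERM, handle_sigterm)


def run_worker(do_work, cleanup, poll_s=0.1):
    """Call do_work() repeatedly until SIGTERM arrives, then run cleanup() once."""
    install_sigterm_handler()
    while not shutdown_requested.is_set():
        do_work()
        shutdown_requested.wait(poll_s)
    # Finish in-flight work, flush buffers, close connections, then exit.
    cleanup()
```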
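Metric instrumentation usually comes from a client library such as Prometheus's. To show what such a library exposes, here is a hand-rolled sketch of a labeled counter rendered in the Prometheus text exposition format. It illustrates the concept and is not the API of any real client library. The metric name `http_requests_total` in the usage note is an example.

```python
import threading
from collections import defaultdict


def _escape(value):
    # The text format requires escaping backslashes, double quotes and newlines.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


class LabeledCounter:
    """A minimal thread-safe counter rendered in the Prometheus text format."""

    def __init__(self, name, help_text, label_names):
        self.name = name
        self.help_text = help_text
        self.label_names = tuple(label_names)
        self._values = defaultdict(float)
        self._lock = threading.Lock()

    def inc(self, amount=1.0, **labels):
        key = tuple(str(labels[n]) for n in self.label_names)
        with self._lock:
            self._values[key] += amount

    def render(self):
        lines = [f"# HELP {self.name} {self.help_text}", f"# TYPE {self.name} counter"]
        with self._lock:
            items = sorted(self._values.items())
        for key, value in items:
            pairs = ",".join(f'{n}="{_escape(v)}"' for n, v in zip(self.label_names, key))
            lines.append(f"{self.name}{{{pairs}}} {value}")
        return "\n".join(lines) + "\n"
```

A `/metrics` endpoint would simply return `render()` for each metric, and Prometheus would scrape it.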
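For structured logging output, one common approach is to emit one JSON object per line on stdout, where Kubernetes collects container output for shipping to a stack such as ELK. This sketch uses the standard `logging` module. Passing extra fields via `extra={"fields": {...}}` is a convention chosen here, not a standard.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record):
        entry = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Structured fields passed as extra={"fields": {...}} (convention chosen here).
        entry.update(getattr(record, "fields", {}))
        if record.exc_info:
            entry["exc"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def make_logger(name, stream=sys.stdout):
    """Return a logger that writes JSON lines to `stream`."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Each field then becomes individually searchable, for example `order_id` in `log.info("order placed", extra={"fields": {"order_id": "A123"}})`.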
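Tracing instrumentation records each unit of work as a span that carries a trace ID, its own span ID, and its parent's span ID, which lets a collector rebuild where the time went. The toy recorder below shows the idea with `contextvars`. It is an illustrative sketch, not the API of Zipkin, Jaeger or any other tracer. The `finished_spans` list stands in for the export step.

```python
import contextvars
import time
import uuid
from contextlib import contextmanager

# The span currently in progress in this context (None at the top level).
_current_span = contextvars.ContextVar("current_span", default=None)

# Finished spans; a real tracer would batch and send these to a collection service.
finished_spans = []


@contextmanager
def span(name):
    parent = _current_span.get()
    record = {
        # Child spans inherit the trace ID; a root span starts a new trace.
        "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": parent["span_id"] if parent else None,
        "name": name,
    }
    token = _current_span.set(record)
    start = time.perf_counter()
    try:
        yield record
    finally:
        record["duration_s"] = time.perf_counter() - start
        _current_span.reset(token)
        finished_spans.append(record)
```

Nesting `with span("db_query"):` inside `with span("handle_request"):` yields two spans in the same trace, and the child's duration shows how much of the request's latency the query accounts for.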
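Limits and requests also determine a pod's Quality of Service class. A pod is `Guaranteed` when every container has CPU and memory limits equal to its requests. It is `BestEffort` when no container sets any CPU or memory request or limit. Otherwise, it is `Burstable`. The sketch below applies those rules to container dicts shaped like the pod spec. As a simplification, it compares quantity strings literally, so `"500m"` and `"0.5"` are treated as different.

```python
RESOURCES = ("cpu", "memory")


def qos_class(containers):
    """Approximate a pod's QoS class from its containers' resource settings.

    Each container is a dict such as
    {"requests": {"cpu": "250m"}, "limits": {"cpu": "500m", "memory": "256Mi"}}.
    """
    any_set = False
    guaranteed = True
    for c in containers:
        limits = c.get("limits", {})
        # Kubernetes defaults a missing request to the limit when only a limit is set.
        requests = {**limits, **c.get("requests", {})}
        if any(r in limits or r in requests for r in RESOURCES):
            any_set = True
        for r in RESOURCES:
            if r not in limits or requests.get(r) != limits[r]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"
```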
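Labels matter because other objects select workloads by them: a Service or Deployment picks its pods with a label selector. This sketch mirrors the `matchLabels` and `matchExpressions` fields of a `LabelSelector`, ANDing every requirement. The example labels in the usage note are hypothetical.

```python
def matches(selector, labels):
    """Return True if `labels` satisfies a Kubernetes-style label selector.

    `selector` mirrors a LabelSelector: {"matchLabels": {...}, "matchExpressions": [...]}.
    """
    for key, value in selector.get("matchLabels", {}).items():
        if labels.get(key) != value:
            return False
    for expr in selector.get("matchExpressions", []):
        key, op = expr["key"], expr["operator"]
        values = set(expr.get("values", []))
        present = key in labels
        if op == "In":
            ok = present and labels[key] in values
        elif op == "NotIn":
            # NotIn also matches objects that lack the key entirely.
            ok = not (present and labels[key] in values)
        elif op == "Exists":
            ok = present
        elif op == "DoesNotExist":
            ok = not present
        else:
            raise ValueError(f"unknown operator: {op}")
        if not ok:
            return False
    return True
```

For example, a pod labeled `{"app": "checkout", "tier": "backend"}` is matched by `{"matchLabels": {"app": "checkout"}}` and by a `tier In (backend, cache)` expression.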
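For ConfigMaps: when a ConfigMap is mounted as a volume, each key appears as a file under the mount path, so the application only needs to read a file. The sketch assumes a hypothetical `config.json` key mounted at `/etc/app/`, with built-in defaults for anything the file omits.

```python
import json
import os

# Hypothetical mount path, set by a volumeMount of the ConfigMap in the pod spec.
DEFAULT_CONFIG_PATH = "/etc/app/config.json"

DEFAULTS = {"log_level": "INFO", "max_connections": 10}


def load_config(path=DEFAULT_CONFIG_PATH, defaults=DEFAULTS):
    """Merge settings from a ConfigMap-mounted JSON file over built-in defaults."""
    config = dict(defaults)
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            config.update(json.load(f))
    return config
```

Changing the ConfigMap reconfigures the app without rebuilding the image. Note that mounted files are refreshed eventually, whereas values injected as environment variables only change when the pod restarts.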
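Labeling Docker images with the commit SHA can be scripted in CI. This sketch reads the SHA with `git rev-parse HEAD` and builds a `docker build` command that records it under the OCI `org.opencontainers.image.revision` label key and also uses it as the image tag. The registry name in the usage note is hypothetical.

```python
import subprocess


def current_commit_sha(repo_dir="."):
    """Return the full commit SHA of HEAD in a Git checkout."""
    return subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout.strip()


def docker_build_args(image, sha, context="."):
    """Build a `docker build` command that labels and tags the image with the commit SHA."""
    return [
        "docker", "build",
        # OCI pre-defined annotation key for the source revision.
        "--label", f"org.opencontainers.image.revision={sha}",
        "--tag", f"{image}:{sha}",
        context,
    ]
```

In a pipeline, this might run as `subprocess.run(docker_build_args("registry.example.com/checkout", current_commit_sha()), check=True)`. `docker inspect` then shows which commit any running image came from.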
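A locked down runtime context is expressed through a container's `securityContext`. The sketch below audits a container spec dict against one possible baseline: non-root user, read-only root filesystem, no privilege escalation, not privileged, and all Linux capabilities dropped. The baseline itself is an assumption for illustration, not an official policy, so adjust it to your own requirements.

```python
# Baseline assumed by this sketch; not an official Kubernetes policy.
REQUIRED = {
    "runAsNonRoot": True,
    "readOnlyRootFilesystem": True,
    "allowPrivilegeEscalation": False,
}


def audit_security_context(container):
    """Return a list of deviations from the baseline in a container's securityContext."""
    ctx = container.get("securityContext", {})
    problems = [
        f"{field} should be {str(expected).lower()}"
        for field, expected in REQUIRED.items()
        if ctx.get(field) != expected
    ]
    if ctx.get("privileged"):
        problems.append("privileged should be false")
    dropped = ctx.get("capabilities", {}).get("drop", [])
    if "ALL" not in dropped:
        problems.append("capabilities.drop should include ALL")
    return problems
```

Running a check like this in CI makes the privileges each workload needs explicit before it reaches the cluster.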