Announcing Kured, a Kubernetes Reboot Daemon

By Adam Harrison
November 03, 2017

Kured (KUbernetes REboot Daemon) is a Kubernetes daemonset that performs safe automatic node reboots when the need to do so is indicated by the package management system of the underlying OS.

The facts  

Kured (KUbernetes REboot Daemon) is a Kubernetes daemonset that performs safe automatic node reboots. Reboots are triggered when the package management system of the underlying OS indicates that one is needed.

In essence Kured:

  • Watches for the presence of a reboot sentinel e.g. /var/run/reboot-required
  • Utilises a lock in the API server to ensure only one node reboots at a time
  • Optionally defers reboots in the presence of active Prometheus alerts
  • Cordons & drains worker nodes before reboot, uncordoning them after
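The four behaviours above can be read as a single decision taken on each node. What follows is a minimal Python sketch of that decision, not kured's actual implementation: the names RebootLock and maybe_reboot are hypothetical, the lock is an in-memory stand-in for the one held in the API server, the ordering of the checks is an assumption, and cordon, drain, reboot and uncordon are recorded as strings rather than executed.

```python
import os
from typing import Callable, List, Optional


class RebootLock:
    """In-memory stand-in for the cluster-wide lock (hypothetical helper)."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None

    def acquire(self, node: str) -> bool:
        # Succeeds only if the lock is free or already held by this node.
        if self.holder in (None, node):
            self.holder = node
            return True
        return False

    def release(self, node: str) -> None:
        # Only the current holder may release the lock.
        if self.holder == node:
            self.holder = None


def maybe_reboot(
    node: str,
    sentinel_path: str,
    lock: RebootLock,
    alerts_active: Callable[[], bool],
    actions: List[str],
) -> str:
    """Decide whether `node` reboots now; append the steps taken to `actions`."""
    if not os.path.exists(sentinel_path):
        return "not-required"      # no sentinel file, nothing to do
    if alerts_active():
        return "deferred-alerts"   # optional check: wait for alerts to clear
    if not lock.acquire(node):
        return "deferred-lock"     # another node is rebooting
    actions.append(f"cordon {node}")
    actions.append(f"drain {node}")
    actions.append(f"reboot {node}")
    # On a real node the remaining steps would run after it comes back up.
    actions.append(f"uncordon {node}")
    lock.release(node)
    return "rebooted"
```

Calling maybe_reboot("node-a", "/var/run/reboot-required", RebootLock(), lambda: False, []) on a machine without the sentinel file returns "not-required". A real daemon would repeat this check periodically on every node, and because only one node can hold the lock, reboots proceed one at a time.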

The Reboot Problem 

At Weaveworks, the development and production clusters underpinning Weave Cloud are orchestrated with Kubernetes running on EC2, and maintained with Terraform and Ansible.

The EC2 instances run Ubuntu 16.04 with unattended-upgrades enabled, so the machines need to be rebooted periodically (mainly in response to kernel upgrades). If they aren’t, the clusters are at risk from security vulnerabilities, and eventually run out of disk space as the OS is unable to remove older kernels and modules.

The first attempt  

Our initial approach to this problem was to trigger a Prometheus alert whenever the /var/run/reboot-required file appeared on any of the nodes. We coupled this with a manual process that entailed waiting for a safe moment - defined as no active alerts - before draining the application pods and then rebooting each node in turn.
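To make the "safe moment" test concrete, here is a rough Python sketch that evaluates the JSON body returned by the Prometheus /api/v1/alerts endpoint. It is an illustration under assumptions, not the procedure we followed by hand: the function names and the RebootRequired alert name are hypothetical, and only alerts in the firing state are counted as active. The ignore parameter is there because the reboot alert itself would otherwise mean the moment is never safe.

```python
import json
from typing import Iterable, List


def firing_alerts(alerts_response: str) -> List[str]:
    """Return the sorted names of alerts in the 'firing' state."""
    payload = json.loads(alerts_response)
    if payload.get("status") != "success":
        raise ValueError("Prometheus did not return a successful response")
    names = {
        alert["labels"].get("alertname", "")
        for alert in payload["data"]["alerts"]
        if alert.get("state") == "firing"
    }
    return sorted(names)


def is_safe_moment(alerts_response: str, ignore: Iterable[str] = ()) -> bool:
    """True when no alert is firing, apart from those explicitly ignored."""
    ignored = set(ignore)
    return not [name for name in firing_alerts(alerts_response) if name not in ignored]
```

Given a response in which only the hypothetical RebootRequired alert is firing, is_safe_moment(body) is False, while is_safe_moment(body, ignore=["RebootRequired"]) is True.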

Automation makes everything better

Whilst this worked in practice, the frequency of OS updates coupled with the quantity of nodes eventually drove us to an automated solution. And so for the past six months all reboots have been conducted safely and automatically by kured, our Kubernetes reboot daemon.

During this time kured has effected hundreds of node reboots in our dev and prod clusters without human intervention - in fact, until the relatively recent addition of Slack notifications, we were mostly unaware that it was happening at all.

Kured is here!

Now that we have gained confidence in the implementation through an extended period of operational use, we are pleased to share the 1.0.0 release of kured with the community under an Apache 2 license. Kured should work with most Kubernetes installations and distros - read more about how it works on GitHub.

Join our community Slack if you have any questions or suggestions!

