Prometheus, ConfigMaps and Continuous Deployment
This is the story of how we manage our Prometheus config to avoid restarting Prometheus too often, losing all our history. It is a short, up to date write up of a talk I gave at the first London Prometheus meetup.
In the beginning there was the monitoring über Pod. This Pod contained a series of monitoring-related containers: Prometheus, Alertmanager, Grafana etc. The container images for this Pod embedded configuration files, and were built by our CI system, CircleCI. Prometheus storage was also in the Pod, and as such was ephemeral: every time you restarted the Pod, you lost your Prometheus history.
As it turns out, Prometheus history is kinda useful – it is hard to know whether XXXms is good or bad latency, unless you can compare it to historic latency. Around this time we also introduced our first Continuous Deployment system, and Pods in the dev environment started getting restarted more and more frequently.
So we set about disaggregating this über Pod into a series of single-container Pods, such that whenever a single container image was updated, only a single Pod was restarted. As the Prometheus scraping rules didn’t get changed very often, this solution minimised the number of times we lost history.
With this change, all was well for many months. I even wrote this up as a “best practice” blog post for the Weaveworks blog.
Then we decided to get more serious about dogfooding Cortex, our Prometheus as a Service. Cortex requires us to run a local Prometheus instance to scrape targets and send samples to the cloud – and this local Prometheus needs to have almost exactly the same configuration as our “vanilla” Prometheus, plus or minus some credentials. Building a separate container image embedding these credentials seemed distasteful, as did some custom hacky credential injection scripts. Plus, we had long desired to use the upstream, official images – so it would be easier to stay up to date.
We eventually ended up with a solution where we use Kubernetes ConfigMaps to store the Prometheus configuration and a custom Python EDSL to generate the (slightly different per environment) ConfigMaps, the details of which are for another post…
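To give a flavour of the idea (and only the idea; this is not our EDSL), here is a minimal sketch of generating two ConfigMaps from one shared definition. Everything in it is an assumption made for illustration: the function names, the scrape job, the `cortex.example.invalid` URL and the token are all hypothetical. It also leans on JSON being accepted as YAML flow syntax for the embedded `prometheus.yml`, which is worth checking against your Prometheus version before relying on it.

```python
import json


def prometheus_config(remote_write_url=None, password=None):
    """Build a Prometheus config as a plain dict (illustrative values only)."""
    config = {
        "global": {"scrape_interval": "15s"},
        "scrape_configs": [
            {
                "job_name": "kubernetes-pods",  # hypothetical job name
                "kubernetes_sd_configs": [{"role": "pod"}],
            }
        ],
    }
    # The forwarding instance differs from the "vanilla" one only by this block.
    if remote_write_url is not None:
        config["remote_write"] = [
            {"url": remote_write_url, "basic_auth": {"password": password}}
        ]
    return config


def config_map(name, config):
    """Wrap a config dict in a Kubernetes ConfigMap manifest."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        # Serialised as JSON on the assumption that the YAML parser accepts it.
        "data": {"prometheus.yml": json.dumps(config, indent=2)},
    }


vanilla = config_map("prometheus-config", prometheus_config())
forwarding = config_map(
    "prometheus-config",
    prometheus_config(
        remote_write_url="https://cortex.example.invalid/api/prom/push",  # hypothetical
        password="hypothetical-token",  # placeholder, not a real credential
    ),
)
```

Printing `json.dumps(vanilla)` gives a manifest that could be piped to `kubectl apply -f -`, since kubectl accepts JSON as well as YAML. The point of the shape is that both variants share one scrape definition, so they cannot drift apart.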
One of the challenges of using ConfigMaps in Kubernetes is that whilst the files they represent are updated automatically in any running containers (good), there is no hook to tell the processes running in the container that this has happened (bad). Prometheus supports runtime reloading of its configuration, so we don’t need to restart the container. All we needed was a small container to watch the file and prod the Prometheus config reload endpoint. After trying a few small programs that promised to do this, we settled on eaburns/Watch, and forked it to build a container image. This runs as a sidecar to the Prometheus container, like so:
    spec:
      containers:
      - name: prometheus
        ...
        volumeMounts:
        - name: config-volume
          mountPath: /etc/prometheus
      - name: watch
        image: weaveworks/watch:master-5b2a6e5
        imagePullPolicy: IfNotPresent
        args: ["-v", "-t", "-p=/etc/prometheus", "curl", "-X", "POST", "--fail", "-o", "-", "-sS", "http://localhost:80/-/reload"]
        volumeMounts:
        - name: config-volume
          mountPath: /etc/prometheus
      volumes:
      - name: config-volume
        configMap:
          name: prometheus-config
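To make the sidecar's job concrete, here is a minimal Python sketch of the same watch-and-prod loop. It is not how eaburns/Watch works internally; it simply polls the mounted directory, fingerprints the files, and calls a callback when the fingerprint changes. The function names, the polling interval and the `max_polls`/`sleep` parameters (there only to make the loop testable) are assumptions; the directory and reload URL are taken from the manifest above.

```python
import hashlib
import os
import time
import urllib.request


def fingerprint(directory):
    """Return a digest covering the names and contents of files in directory."""
    digest = hashlib.sha256()
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # isfile() follows symlinks, which is how ConfigMap files are mounted,
        # and skips sub-directories.
        if not os.path.isfile(path):
            continue
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")
        with open(path, "rb") as handle:
            digest.update(handle.read())
        digest.update(b"\0")
    return digest.hexdigest()


def post_reload(url="http://localhost:80/-/reload"):
    """Send an empty POST to the Prometheus reload endpoint."""
    request = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def watch(directory, on_change, interval=5.0, max_polls=None, sleep=time.sleep):
    """Poll directory and call on_change() whenever its fingerprint changes."""
    last = fingerprint(directory)
    polls = 0
    while max_polls is None or polls < max_polls:
        sleep(interval)
        polls += 1
        current = fingerprint(directory)
        if current != last:
            last = current
            on_change()
```

Calling `watch("/etc/prometheus", post_reload)` would run the loop indefinitely inside a sidecar. A production version would also want to tolerate files briefly disappearing while the ConfigMap is being swapped, and to retry a failed reload; both are left out here to keep the sketch short.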
And with this, any changes to the Prometheus ConfigMap will (eventually) automatically be updated in the Prometheus Pod and Prometheus will be automatically told to reload the config, without losing history. In practice the latency from running kubectl apply to this happening can be anywhere up to 5 minutes.
With the old Continuous Delivery system the kubectl apply step could even be automated, such that as soon as your change hit the master branch of service-conf.git this process was kicked off and Prometheus would eventually be updated. A similar system existed for Flux (called kubefix), but unfortunately this has been disabled as it was racy, so you’ll need to manually run kubectl apply for now.
The above are just some of the hoops you have to jump through to make persistence work when running Prometheus in your own cluster. With Weave Cloud’s built-in hosted, scalable Prometheus we handle persistence for you, so you don’t have to jump through any such hoops.