In this post I discuss the various options and tradeoffs we encountered at Weaveworks when deploying Prometheus with Kubernetes to monitor Weave Cloud. This is the second post in our series on Prometheus and Kubernetes, the first being “Prometheus and Kubernetes: A Perfect Match”.

TL;DR at Weaveworks:

  • Our CI pipeline builds custom Prometheus, Grafana and Alertmanager images, with our config embedded inside, and pushes these images to a private repo.
  • We run these images as pods on our Kubernetes clusters, with a separate pod for each of Prometheus, Alertmanager and Grafana.
  • We use our Continuous Delivery system to ensure we are always running the latest images.
  • We use a custom authenticating reverse proxy that is integrated with our user management service, to allow easy access to Prometheus and Grafana dashboards from anywhere on the internet.

Where to run Prometheus?

The first question: do you run Prometheus as a Pod/Service on Kubernetes, or alongside your Kubernetes cluster on a dedicated machine/VM?

Dedicated machine vs Pod?

If you choose to run Prometheus on your Kubernetes cluster, you need to be aware that Kubernetes prefers to treat services as if they are stateless. If you want to be able to do rolling upgrades for Prometheus, then in a naive setup each upgrade replaces the Pod, along with its local storage, and so loses your stored Prometheus history. Some progress has been made in Kubernetes 1.3 with PetSets, which allow you to associate storage with Pods in a more coherent way, but PetSets currently do not support rolling upgrades.

If you choose to run Prometheus on a dedicated VM next to your Kubernetes cluster, you need to ensure that Prometheus can connect to each of your Pods. This is complicated by the Kubernetes networking model. If you are using an SDN like Weave Net or Flannel, this can be as simple as extending the SDN to include the dedicated Prometheus machine. But if you are running on AWS and using the programmable routing tables (the default setup for Kubernetes on AWS, and something Kubernetes manages for you) this becomes slightly harder, and is beyond the scope of this blog post.

It is also worth considering resource usage – Prometheus is a monolithic process, albeit a very scalable one. As such, you need a very large machine to run a Prometheus that monitors thousands of servers. Kubernetes prefers many smaller processes that it can spread over different machines, as this helps with its scheduling and bin packing decisions. If you have a small cluster (tens to hundreds of machines), running Prometheus in a Pod is sufficient.

For these reasons, Weaveworks chose to deploy Prometheus as a Pod/Service in our Kubernetes cluster – we treat the monitoring data in Prometheus as ephemeral, and we’re optimizing for MTTR anyway. The ability to use our continuous deployment system to manage Prometheus upgrades was a big win for us.

How to Manage Prometheus Config?

The next choice to make is how to manage the Prometheus config. We’ve been running Prometheus on Kubernetes since before Kubernetes concepts like ConfigMaps existed, and as such we chose to compile our config for Prometheus into a custom Prometheus container image, hosted on a private image hub (Quay.io from CoreOS – it’s excellent!). This was a straightforward process for us since our CI pipeline (running on CircleCI) was already set up to build images, upload them to our hub, and then automatically deploy them to our cluster.

Deployment Pipeline for Weave Cloud

This approach allows us to quickly revert the version of the image we use when we make a mistake in the config, reducing cognitive load as the operators of the cluster don’t need to remember the config is managed elsewhere. The downside is that any config changes require us to deploy a new version of the image, which results in us losing our Prometheus history.

If we were approaching this problem now, I would use ConfigMaps in Kubernetes to manage the Prometheus configuration. These can be mounted into the Pod and allow you to modify the Prometheus config without needing to rebuild and redeploy a custom image – you can even use the official Prometheus image built by the Prometheus project! When the config is updated you only need to send a SIGHUP to Prometheus to reload it, something that can’t be automated with the current Kubernetes version, but there is ongoing discussion around making this work. When it does, this is the technique we will migrate to.
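
The reload step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not something we run: it assumes a small watcher that is able to signal the Prometheus process (for example because it runs in the same container), and the function names config_digest and reload_if_changed are made up for this example.

```python
import hashlib
import os
import signal
from pathlib import Path


def config_digest(path):
    """Return a SHA-256 hex digest of the config file's current contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def reload_if_changed(path, last_digest, pid, send_signal=os.kill):
    """Send SIGHUP to `pid` if the file at `path` no longer matches `last_digest`.

    Returns the current digest so the caller can feed it into the next check.
    `send_signal` is injectable (hypothetical convenience) so the logic can be
    exercised without a real Prometheus process.
    """
    current = config_digest(path)
    if current != last_digest:
        # Prometheus re-reads its configuration file when it receives SIGHUP.
        send_signal(pid, signal.SIGHUP)
    return current
```

A watcher would call reload_if_changed in a loop with a short sleep, passing the digest it got back from the previous call. Comparing content hashes rather than modification times is a design choice made here to avoid depending on how the mounted file is updated.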

How to Deploy Prometheus?

We started by deploying all three components of a modern Prometheus monitoring system (Prometheus, Alertmanager and Grafana) into a single uber-monitoring Pod. This simplified the process since the components addressed each other using well known ports on localhost, and it gave us a single ‘thing’ to manage.

We quickly discovered that this was a bad idea – whenever we changed anything in our config, we had to redeploy the entire pod and lose all the history (as previously mentioned). We use Continuous Delivery triggered by our Continuous Integration pipeline, and as such some days we redeployed the monitoring pod tens of times, like when we were iterating on the dashboards. What’s more, we had to teach each of the engineers the port numbers of each of the services, and they had to remember that Grafana was on port 5000, Prometheus on 8000 and Alertmanager on… I forget.

Eventually we disaggregated this uber-monitoring pod into a set of individual pods for each of Prometheus, Alertmanager and Grafana. Not only did this allow us to iterate on each component independently without losing all of our Prometheus history, but we could also take advantage of well known ports for their web interfaces – we put each service on port 80, so developers didn’t need to remember any ports any more.
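
One way to get every component onto port 80 is a Kubernetes Service per component that maps port 80 to whatever port the container listens on. The sketch below builds such manifests as Python dicts. It is an assumption-laden illustration rather than our actual manifests: the "monitoring" namespace, the "name" label and the monitoring_service helper are hypothetical, and the container ports are the upstream defaults (9090, 9093, 3000), not necessarily what we run.

```python
import json


def monitoring_service(name, container_port, namespace="monitoring"):
    """Build a v1 Service manifest that exposes `name` on port 80.

    The namespace and the `name` label selector are hypothetical choices
    made for this example.
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Route to pods carrying the matching label.
            "selector": {"name": name},
            # Clients use port 80; traffic is forwarded to the container port.
            "ports": [{"port": 80, "targetPort": container_port}],
        },
    }


if __name__ == "__main__":
    # Upstream default ports, used here only as placeholders.
    for name, port in [("prometheus", 9090), ("alertmanager", 9093), ("grafana", 3000)]:
        print(json.dumps(monitoring_service(name, port), indent=2))
```

Each printed document is plain JSON, which kubectl accepts in place of YAML, so the output of a single call could be piped to kubectl create -f -.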

Accessing Prometheus/Alertmanager/Grafana

The Kubernetes network model assumes a flat, private network between all of your pods. This means that building your distributed application is easier, since you don’t need to manage a big list of port numbers for each service – or better yet, you don’t need a dynamic port allocation scheme coupled to a service discovery system. But this simplicity can present some challenges for getting access to internal services on the Kubernetes network.

Since we started using Prometheus and Kubernetes before the introduction of things like Ingress controllers, we had to roll our own way of getting authenticated access to internal services. The initial ‘hack’ we constructed was to use kubectl port-forward to forward a local port to a SOCKS proxy running in a pod on the Kubernetes network.

An open source Go implementation of the SOCKS protocol (written by Armon from Hashicorp) allowed us to get up and running quickly. We packaged the SOCKS proxy into a container with a few tweaks, provided an easy to use script for our developers and coupled it with a nice Chrome extension. This system allows easy access to the internal interfaces offered by our services in an authenticated fashion.
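
As a rough idea of what such a developer script has to assemble, here is a minimal Python sketch. It is not our script: the helper names, the pod name in the usage note and the choice of port 1080 (the conventional SOCKS port) are all assumptions made for illustration.

```python
def port_forward_command(pod, local_port=1080, remote_port=1080, namespace=None):
    """Build the kubectl argv that forwards a local port to the proxy pod.

    `pod` is the (hypothetical) name of the pod running the SOCKS proxy.
    """
    argv = ["kubectl", "port-forward", pod, f"{local_port}:{remote_port}"]
    if namespace:
        argv += ["--namespace", namespace]
    return argv


def proxy_url(local_port=1080):
    """Return the SOCKS5 URL a browser or extension would be pointed at."""
    return f"socks5://localhost:{local_port}"
```

A wrapper could pass port_forward_command("socksproxy") to subprocess.Popen and then configure the browser with proxy_url(); only requests sent through that proxy reach addresses on the cluster network.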

The SOCKS proxy is still in use today as a backup for when things go wrong, but for day-to-day access to Prometheus and Grafana we built a system of ‘admin’ routes into our frontend load balancer. These routes rely on the same authentication mechanism we use for our normal user access; however, users must be members of a special group to access them. The advantage this system has over the SOCKS proxy is that we can now access Prometheus and Grafana from our iPads, while we’re in meetings or in bed!
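
The access rule for these routes boils down to "authenticated, and in the right group". The following Python sketch shows that decision in isolation. The route prefix, group name, User type and authorize function are hypothetical stand-ins for this example, not the code in our load balancer.

```python
from dataclasses import dataclass, field

ADMIN_PREFIX = "/admin/"  # hypothetical route prefix
ADMIN_GROUP = "admins"    # hypothetical name for the special group


@dataclass(frozen=True)
class User:
    name: str
    groups: frozenset = field(default_factory=frozenset)


def authorize(path, user):
    """Return an HTTP status: 200 allow, 401 not logged in, 403 forbidden.

    `user` is None when the normal authentication step found no valid session.
    """
    if user is None:
        return 401
    if path.startswith(ADMIN_PREFIX) and ADMIN_GROUP not in user.groups:
        return 403
    return 200
```

For example, authorize("/admin/prometheus/", User("alice", frozenset({"admins"}))) allows the request, while the same path for a user without the group is rejected with 403. Ordinary routes are unaffected by the group check.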

As of version 1.2, Kubernetes implemented Ingress controllers to satisfy this use case, and if we had to do it again we would probably use them.

Deploying Prometheus and Kubernetes

In summary, at Weaveworks:

  • Our CI pipeline builds custom Prometheus, Grafana and Alertmanager images, with our config embedded inside, and pushes these images to a private repo.
  • We run these images as pods on our Kubernetes clusters, with a separate pod for each of Prometheus, Alertmanager and Grafana.
  • We use our Continuous Delivery system to ensure we are always running the latest images.
  • We use a custom authenticating reverse proxy that is integrated with our user management service, to allow easy access to Prometheus and Grafana dashboards from anywhere on the internet.

Having said all of that, we are not finished! We’re migrating to using ConfigMaps for managing configuration and are investigating PetSets for persisting Prometheus history across upgrades.

Please get in touch; we’d be happy to hear your thoughts or criticisms.

Download our white paper “Application Monitoring with Weave Cortex: Getting the Most out of Prometheus as a Service” to learn more about Prometheus.