Throwback Thursday: Configure notifications in Prometheus’ Alertmanager
Monitoring is crucial for every developer and on-call engineer. But sorting through a flood of notifications, or missing the critical information needed for a quick resolution, is no fun. This blog post recaps a KubeCon lightning talk that demonstrates how to use labels in Prometheus' Alertmanager so that you receive only concise, easy-to-understand notifications.
It almost feels like decades have passed, but KubeCon and CloudNativeCon North America took place just four short weeks ago. So it's time for a proper Throwback Thursday and a look at one of the talk highlights from the show.
Elena’s lightning talk on Monday evening drew a huge crowd and a lot of social media love. In only four short minutes, Elena demonstrated how to get concise, easy-to-understand alerts and notifications from Prometheus’ Alertmanager.
That was a super cool lightning talk about @PrometheusIO alerting rules, labels, and associated gotchas. I can't summary it in any way BUT there is a detailed blog post associated to it, so: https://t.co/cagt0JjXj9 https://t.co/DpucaNjVTf— Jérôme Petazzoni (@jpetazzo) December 11, 2018
Let’s take a look behind the scenes at why we started spending time on Prometheus’ Alertmanager.
In our SaaS product, Weave Cloud, we use a horizontally scalable, hosted monitoring service based on Prometheus to monitor Kubernetes clusters and applications. Weave Cloud aggregates metrics across a cluster and from all layers of the stack in a dynamic environment, and lets the user query them through an enhanced, easy-to-use interface. It allows developers and operators to understand the health and behavior of their applications at all times, but especially before and after deployments.
Within Weave Cloud’s interface you can also set up and configure rules against metric thresholds that, when met, route alerts to your preferred notification system, such as PagerDuty, OpsGenie, Stackdriver or Slack.
Since we here at Weaveworks need to monitor our own service, we spend time setting up and using alerting rules to define alert conditions. For our on-call support engineers, it is very important to be able to understand a received notification easily and act on it immediately. An alert that only says “Instance down” does not help in finding a quick resolution to the problem...
That is why we suggest using labels in alerting rules. Labels attach additional information to each alert, which helps to identify and pinpoint the problem.
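To make that concrete, here is a minimal Python sketch of how labels can turn a bare “Instance down” into a message an engineer can act on. It is only an illustration: Alertmanager itself renders notifications with Go templates, and the `render_notification` helper, the payload shape, and all label names and values below are assumptions invented for this example, not taken from Elena’s talk.

```python
# Hypothetical alert payload: the labels and annotation are invented for
# illustration and only loosely mirror what an alerting rule might attach.
alert = {
    "labels": {
        "alertname": "InstanceDown",
        "severity": "critical",
        "instance": "10.0.0.12:9100",
        "job": "node",
    },
    "annotations": {
        "summary": "Instance has been unreachable for 5 minutes",
    },
}


def render_notification(alert):
    """Build a short, readable notification from an alert's labels."""
    labels = alert.get("labels", {})
    annotations = alert.get("annotations", {})

    # Headline: severity and alert name, with fallbacks if a label is missing.
    severity = labels.get("severity", "unknown").upper()
    name = labels.get("alertname", "unnamed alert")
    lines = [f"[{severity}] {name}"]

    # Optional one-line summary taken from the annotations.
    summary = annotations.get("summary")
    if summary:
        lines.append(summary)

    # Remaining labels, sorted so the output is stable and easy to scan.
    for key in sorted(labels):
        if key not in ("alertname", "severity"):
            lines.append(f"{key}: {labels[key]}")

    return "\n".join(lines)


if __name__ == "__main__":
    print(render_notification(alert))
```

With the labels present, the message names the affected instance and job. With no labels at all, the same function can only fall back to a generic headline, which is exactly the kind of notification that slows an on-call engineer down.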
Watch Elena’s talk (slides are here) to see how she created and implemented an improved notification template. If you would like to follow her hands-on tutorial, have a look at the blog post “Labels in Prometheus alerts: think twice before using them”.