Labels in Prometheus alerts: think twice before using them

By Elena Morozova
August 28, 2018

As developers, we hear a lot about the importance of monitoring and alerts. But without proper notifications, we might spend too much time trying to figure out what is really going on. This blog post gives an overview of common caveats when using labels in Prometheus alerts and demonstrates some techniques for getting concise, easy-to-understand notifications.

In this post we will look at how to write alerting rules and how to configure the Prometheus Alertmanager to send concise, easy-to-understand notifications.

Some alerts use labels in their annotations and others do not. For example, here is an instance-down alert whose annotations reference {{ $labels.instance }} and {{ $labels.job }}, and another one whose annotations use no labels at all:

groups:
- name: example
  rules:
  - alert: InstanceDownLabels
    expr: up == 0
    for: 5m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
  - alert: InstanceDownNoLabels
    expr: up == 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Instance down"
      description: "Something has been down for more than 5 minutes."

Let’s create a Slack receiver. We can do this by adapting an example from the Prometheus documentation:

- name: 'team-x'
  slack_configs:
  - channel: '#alerts'
    text: "<!channel> \nsummary: {{ .CommonAnnotations.summary }}\ndescription: {{ .CommonAnnotations.description }}"

This receiver config says we want notifications that contain the common summary and the common description of the alerts in the group.

But with these settings our Slack notifications look like this:

[Screenshot: Slack notifications with common annotations]

The first alert, InstanceDownNoLabels, looks good. But why are the summary and description empty for InstanceDownLabels?

This happens because every time series is uniquely identified by its metric name and its set of labels, and every unique combination of key-value label pairs produces a separate alert.

We used variables like {{ $labels.instance }} and {{ $labels.job }} in our summary and description, so there is no single value that is common to all the alerts in the group.
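
To make this concrete, here is a rough sketch (not taken from a real payload, with made-up instance names) of the template data Alertmanager passes to the receiver when two InstanceDownLabels alerts end up in the same notification group:

# Two alerts in one notification group (illustrative values only)
Alerts:
- Labels: {alertname: InstanceDownLabels, instance: "host-a:9100", job: node, severity: page}
  Annotations: {summary: "Instance host-a:9100 down", description: "host-a:9100 of job node has been down for more than 5 minutes."}
- Labels: {alertname: InstanceDownLabels, instance: "host-b:9100", job: node, severity: page}
  Annotations: {summary: "Instance host-b:9100 down", description: "host-b:9100 of job node has been down for more than 5 minutes."}
# Only annotations shared by every alert in the group appear here;
# the summaries and descriptions differ, so both are missing:
CommonAnnotations: {}
CommonLabels: {alertname: InstanceDownLabels, job: node, severity: page}

That is why .CommonAnnotations.summary and .CommonAnnotations.description render as empty strings for this group.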

We can try ranging over all received alerts (see example):

- name: 'default-receiver'
  slack_configs:
  - channel: '#alerts'
    title: "{{ range .Alerts }}{{ .Annotations.summary }}\n{{ end }}"
    text: "{{ range .Alerts }}{{ .Annotations.description }}\n{{ end }}"

But with this config our Slack notifications look like this:

[Screenshot: Slack notifications with a range over all alerts]

Now the alerts are no longer blank, but the first notification contains duplicates in the Slack title and text: every alert fired by InstanceDownNoLabels has exactly the same summary and description, so ranging over the alerts prints the same text over and over.

One solution is to handle both cases explicitly. For example, we can create our own template and save it in a file called my.tmpl:

{{ define "slack.my.title" -}}
    {{- if .CommonAnnotations.summary -}}
        {{- .CommonAnnotations.summary -}}
    {{- else -}}
        {{- with index .Alerts 0 -}}
            {{- .Annotations.summary -}}
        {{- end -}}
    {{- end -}}
{{- end }}
{{ define "slack.my.text" -}}
    {{- if .CommonAnnotations.description -}}
        {{- .CommonAnnotations.description -}}
    {{- else -}}
        {{- range $i, $alert := .Alerts }}
            {{- "\n" -}} {{- .Annotations.description -}}
        {{- end -}}
    {{- end -}}
{{- end }}

And then we reference this custom template in the Slack receiver, telling Alertmanager where to find the file via the top-level templates section:

- name: 'default-receiver'
  slack_configs:
  - channel: '#alerts'
    title: '{{ template "slack.my.title" . }}'
    text: '{{ template "slack.my.text" . }}'
templates:
- 'my.tmpl'

Our Slack messages look like this now:

[Screenshot: Slack notifications with the custom template]

Now we have all the needed information without duplicates. To make the template output look clean, we use minus signs (trim markers) just inside the left and right delimiters, i.e. {{- and -}}, which strip the surrounding whitespace; see the Go text/template documentation. For explicit new lines we emit {{ "\n" }}.
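
As a small, hedged illustration (the template names demo.raw and demo.trimmed are made up), compare a definition without and with trim markers:

{{ define "demo.raw" }}
  hello
{{ end }}
{{/* renders as a newline, two spaces, "hello" and another newline */}}

{{ define "demo.trimmed" -}}
  hello
{{- end }}
{{/* renders as just "hello": each "-" trims the whitespace next to its delimiter */}}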

For the title we check whether there is a common summary and use it; otherwise we use the summary from the first alert to keep the title short.

For the text we use the common description if one exists (is not empty); otherwise we range over all alerts and print the description of each one. But there might be many different label values and therefore many different descriptions, so it is a good idea to limit how many we print, for example to the first 10:

{{- range $i, $alert := .Alerts -}}
    {{- if lt $i 10 -}}
        {{- "\n" -}} {{- index $alert.Annotations "description" -}}
    {{- end -}}
{{- end -}}

The same applies to alerting rules that use {{ $value }} inside their annotations: the value generally differs from alert to alert, so those annotations will not have a common value either.
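
For instance, here is a hedged sketch of such a rule (the metric and threshold are made up for illustration); since its annotations are never common, the fallback branches of our template are what get used:

- alert: HighErrorRate
  expr: rate(http_requests_total{status="500"}[5m]) > 0.1
  for: 10m
  labels:
    severity: page
  annotations:
    summary: "High error rate on {{ $labels.instance }}"
    # $value is the current value of the alert expression,
    # so it varies from alert to alert and from evaluation to evaluation:
    description: "Error rate is {{ $value }} on {{ $labels.instance }}."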

Conclusion

To get proper notifications we need to make sure that our metrics, alerts, and receivers match each other. In particular, if we use labels or the value in a templated field, we should expect that field to differ between alerts, and our templates need to deal with that. By contrast, if a field is static (does not reference any labels or the value), it has a common value across all alerts for the rule.

And of course we can use the same approach for other receivers such as email, PagerDuty, OpsGenie, and so on. For example, an email message for the same alerts as above, rendered with the custom template, looks like this:

[Screenshot: Email message with the custom template]

See the receiver configuration reference for the fields that accept the <tmpl_string> format.
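
For example, here is a minimal sketch of an email receiver that reuses the same custom templates (it assumes SMTP settings are already defined in the global section of alertmanager.yml, and the address is made up):

- name: 'default-receiver'
  email_configs:
  - to: 'team@example.com'
    headers:
      Subject: '{{ template "slack.my.title" . }}'
    html: '{{ template "slack.my.text" . }}'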

Happy Monitoring!


Related posts

A Comprehensive Guide to Prometheus Monitoring

Living on the Edge - How Screenly Monitors Edge IoT Devices with Prometheus

How I Halved the Storage of Cortex