The alerting feature in Weave Cloud implements the Prometheus Alertmanager and enhances it with a convenient in-context editor for specifying alerting rules and recording rules. An additional screen is available for setting up routing: alerts can be sent to an external service like Slack, while the results of recording rules are saved to their own time series.

Prometheus supports two types of rules:

  • Recording rules are for pre-calculating frequently used or computationally expensive queries. The results of those rules are saved into their own time series.

  • Alerting rules, on the other hand, let you specify the conditions under which an alert is fired to an external service like Slack. These are based on PromQL queries.

For information on alerts in Prometheus, see the Alertmanager section of the Prometheus documentation.

Setting up Alerting & Recording Rules

To specify alerts and recording rules:

  1. Set up the alerting and recording rules for your queries in Weave Cloud. Select Monitor and click Define Alerting and Recording Rules:

Monitoring: Recording Rules

Example alert rule:

ALERT ApiServerDown
  IF          absent(up{job="default/kubernetes"}) or sum(up{job="default/kubernetes"}) < 1
  FOR         5m
  LABELS      { severity="warning" }
  ANNOTATIONS {
    summary = "Kubernetes API server down for 5m",
    impact = "Our Kubernetes cluster is inoperable. User impact uncertain.",
  }

See Alerting Rules for more details.

Example recording rule:

# Error rate
job:cortex_request_errors:rate1m =
   (sum(rate(cortex_request_duration_seconds_count{status_code=~"5..|error"}[1m])) by (job)) /
   (sum(rate(cortex_request_duration_seconds_count[1m])) by (job))

job_method:cortex_request_duration_seconds:99quantile =
   histogram_quantile(0.99, sum(rate(cortex_request_duration_seconds_bucket{ws="false"}[5m])) by (job,method,le))
job_method:cortex_request_duration_seconds:50quantile =
   histogram_quantile(0.5, sum(rate(cortex_request_duration_seconds_bucket{ws="false"}[5m])) by (job,method,le))

job_route:cortex_request_duration_seconds:99quantile =
   histogram_quantile(0.99, sum(rate(cortex_request_duration_seconds_bucket{ws="false"}[5m])) by (job,route,le))
job_route:cortex_request_duration_seconds:50quantile =
   histogram_quantile(0.5, sum(rate(cortex_request_duration_seconds_bucket{ws="false"}[5m])) by (job,route,le))

See Recording Rules for more information.

Setting up Routing for Your Alert Rules

Specify the routing for any of the alerting rules that you defined above. The dialog saves a standard YAML file for you.

Alertmanager Configuration
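
As an illustration of what such a routing file can look like, here is a minimal sketch of an Alertmanager configuration that sends alerts labeled severity="warning" (as in the ApiServerDown example above) to Slack. The receiver name, channel, and webhook URL are placeholders assumed for this example, and the file that Weave Cloud generates for you may differ:

```yaml
# Minimal Alertmanager routing sketch (assumed values, not Weave Cloud output).
route:
  # Default receiver for alerts that match no child route.
  receiver: slack-notifications
  # Group alerts that share an alert name into one notification.
  group_by: ['alertname']
  routes:
    # Send alerts carrying the label severity="warning" to Slack.
    - match:
        severity: warning
      receiver: slack-notifications

receivers:
  - name: slack-notifications        # hypothetical receiver name
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/REPLACE/WITH/YOUR-WEBHOOK'  # placeholder URL
        channel: '#alerts'           # hypothetical channel
```

Alerts that match no child route fall back to the top-level receiver, so a single receiver is enough to begin with. Further child routes can be added later to separate alerts by severity or team.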

See Sending Alerts for more information.

Further Reading