A Practical Guide: From Instrumenting Code to Specifying Alerts with the RED Method

October 30, 2017

This practical guide will help you get started with monitoring your microservices with Prometheus. We walk through selecting key metrics, instrumenting your code, setting up alerts, and building Grafana dashboards.


If you’ve been following our blog, and perhaps others as well, you should be familiar with the RED methodology that was developed here at Weave. As a follow-on to that methodology, Carlos Leon (GitHub: @mongrelion) and Jason Smith (GitHub: @jasonrichardsmith) of Container Solutions have created a series of step-by-step tutorials that put the RED methodology into practice.

If you’re not familiar with the RED method, here’s a short recap. 

What to instrument?

When setting up Prometheus monitoring, an important decision is which metrics to collect about your app. When a problem occurs, the metrics you’ve instrumented simplify troubleshooting.

To help us think about what’s important to instrument, we defined a system that we call the RED method. The main idea behind this method is that it focuses on measuring the things that end-users care about while using your web services. 


In summary, three key metrics are instrumented for every microservice in your architecture:

  • (Request) Rate - the number of requests per second your services are serving.
  • (Request) Errors - the number of failed requests per second.
  • (Request) Duration - the distribution of the amount of time each request takes.



Rate, Errors and Duration attempt to cover the most obvious web service issues. They also capture an error rate expressed as a proportion of request rate. 

With these three basic metrics, we believe the most common problems that can result in poor customer satisfaction are covered. 
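To make this concrete, here is a minimal PromQL sketch of the three RED queries for the Sock Shop. It assumes the request_duration_seconds histogram exposed by the shop’s services (the same metric used in the alerting rule later in this post); your own metric and label names may differ:

```
# Rate: requests per second, per service
sum(rate(request_duration_seconds_count{job=~"^sock-shop.*"}[1m])) by (name)

# Errors: failed (HTTP 5xx) requests per second, per service
sum(rate(request_duration_seconds_count{job=~"^sock-shop.*",status_code=~"5.."}[1m])) by (name)

# Duration: 99th-percentile request latency, per service
histogram_quantile(0.99,
  sum(rate(request_duration_seconds_bucket{job=~"^sock-shop.*"}[1m])) by (name, le))
```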

See the post The Red Method: Key Metrics For Microservices Architecture for a full explanation. 

Monitoring Microservices

In the first tutorial we show you how to stand up a GKE cluster, deploy the Sock Shop onto it, and then connect Weave Cloud to it. The Sock Shop is built from microservices, and in this example it illustrates the different kinds of metrics that you can track. 

The RED Method is a starting point

An important point to keep in mind is that while the RED method encourages you to have a standard overview of the metrics your users care about, it should be considered a good starting point -- it doesn’t tell you much about the infrastructure on which your app is running. To keep track of your infrastructure you should also include the following types of metrics (at a minimum). 

Cluster Metrics

As a complement to the RED method collect:

  • Total amount of CPU available & being used
  • Total amount of RAM available & being used
  • Network traffic/throughput
  • Storage (disk I/O, disk usage, etc.)

Tracking these metrics allows you to correlate the amount of RAM that your application is consuming with the amount of website traffic that you have. This can help you detect memory leaks, determine whether your disks are filling up, and see whether your infrastructure is being underutilized.
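As an illustrative sketch, the PromQL below covers these cluster metrics, assuming the standard node_exporter is running on each node (metric names vary between exporter versions):

```
# CPU: fraction of each node's CPU currently in use (1 = fully busy)
1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance)

# RAM: bytes of memory still available on each node
node_memory_MemAvailable_bytes

# Network: inbound traffic in bytes per second, per node
sum(rate(node_network_receive_bytes_total[5m])) by (instance)

# Storage: free bytes per filesystem (excluding in-memory and overlay mounts)
node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
```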

Database Metrics

For SQL and NoSQL databases, you can measure:

  • Current number of connections
  • Query duration
  • Number of queries per second

These metrics help you detect slow-running queries, which can be correlated with any slow-rendering pages on your site.
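As an example, if you run the community mysqld_exporter (an assumption here; other databases expose similar metrics through their own exporters), queries along these lines cover the list above:

```
# Current number of open connections
mysql_global_status_threads_connected

# Queries per second
rate(mysql_global_status_queries[5m])

# Slow queries per second -- a rough proxy for query duration, since the default
# exporter does not ship a per-query latency histogram
rate(mysql_global_status_slow_queries[5m])
```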

Other Metrics to Consider

  • Third-party APIs’ rate, errors and duration. These metrics help you determine whether the bottleneck in an endpoint at a given point in time is on your side or on the external API’s side.
  • Job queues. If you have a pool of background workers (for processing video, audio or images, sending bulk email, and tasks of that sort), measure the number of jobs that you currently have in the queue. Bonus points if you can also measure the number of jobs processed since the last reading (see the sketch after this list).
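If you do run background workers, a sketch of the corresponding queries might look like this; background_jobs_queued and background_jobs_processed_total are hypothetical metric names that your workers would need to export:

```
# Jobs currently waiting, per queue (hypothetical gauge exported by your workers)
sum(background_jobs_queued) by (queue)

# Jobs processed per second, per queue (hypothetical counter; rate() covers the
# "since the last reading" bonus point)
sum(rate(background_jobs_processed_total[5m])) by (queue)
```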

[Image: Sock Shop monitoring query showing the request rate for each service, for both HTTP 200 status codes and HTTP 500 errors]


For instructions on using Weave Cloud to monitor these types of metrics, see ‘Monitoring Microservices in Weave Cloud’. 

Configuring Alerts 

For your convenience, the Sock Shop has already been instrumented with the RED metrics, plus others. You can clone this app and use it as a reference when you instrument your own app. Once the Sock Shop is deployed to a cluster and connected to Weave Cloud, our hosted Prometheus monitoring solution takes over: it scrapes the metrics and stores them, along with their key-value label pairs, in a time-series database where they can be queried with PromQL. 

Routes, Receivers and Alerting Rules

Some of the terminology used in setting up alerts includes: 

  • Alerts -- fired when a threshold is met within a PromQL query 
  • Routes -- rules that define how to triage the alerts as they are fired
  • Receivers -- tool configurations for where to send the alert notifications

Before you can specify any alerts, you must first set up Routes and their Receivers. 

Supported Receivers 

The list of supported receivers includes: 

  • Email
  • HipChat
  • PagerDuty
  • Pushover
  • Slack
  • OpsGenie
  • VictorOps
  • Webhook

Routes and Receivers are in standard YAML format and can be input right from the Weave Cloud GUI: 

[Image: Entering Routes and Receivers in the Weave Cloud GUI]
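For reference, a minimal route and receiver definition in standard Alertmanager YAML might look like the sketch below; the Slack channel, webhook URL and PagerDuty service key are placeholders you would replace with your own:

```
route:
  receiver: team-slack          # default receiver for all alerts
  group_by: [alertname, job]
  routes:
    - match:
        severity: critical      # triage: escalate critical alerts
      receiver: team-pager

receivers:
  - name: team-slack
    slack_configs:
      - channel: '#sock-shop-alerts'
        api_url: 'https://hooks.slack.com/services/XXX'   # placeholder webhook URL
  - name: team-pager
    pagerduty_configs:
      - service_key: '<pagerduty-integration-key>'        # placeholder key
```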

After your Routes and Receivers are defined, you are ready to input Alerting Rules. This can also be done from within Weave Cloud.  

Rules should look as follows:

```
# Request alert
ALERT HighRequestRate
  IF (sum(rate(request_duration_seconds_count{job=~"^sock-shop.*",status_code=~"2..",route!="metrics"}[1m])) by (name,instance,job)) > 10
  FOR 10s
  LABELS { severity = "warning" }
  ANNOTATIONS {
    summary = "Job {{ $labels.job }} has high requests",
    description = "{{ $labels.instance }} of job {{ $labels.job }} has a high rate of requests.",
  }
```
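To complement the request-rate rule above, here is a sketch of a rule for the Errors signal in the same (pre-Prometheus 2.0) syntax, reusing the same metric; the 10% threshold and five-minute window are arbitrary illustrations, not values from the tutorial:

```
# Error alert (sketch): fire when more than 10% of a service's requests return 5xx
ALERT HighErrorRate
  IF (sum(rate(request_duration_seconds_count{job=~"^sock-shop.*",status_code=~"5.."}[1m])) by (name,instance,job) / sum(rate(request_duration_seconds_count{job=~"^sock-shop.*"}[1m])) by (name,instance,job)) > 0.1
  FOR 5m
  LABELS { severity = "critical" }
  ANNOTATIONS {
    summary = "Job {{ $labels.job }} has a high error rate",
    description = "{{ $labels.instance }} of job {{ $labels.job }} is returning more than 10% errors.",
  }
```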

The tutorial then tests those rules using two very useful tools: Siege, a multithreaded HTTP load-testing and benchmarking tool, and the Scope Traffic Control Plugin, which simulates latency so that you can verify that your alerts fire as expected. 

Integrating Grafana with Weave Cloud

The final tutorial in the series describes how to integrate Grafana dashboards with our hosted Prometheus solution in Weave Cloud, where you will end up with several dashboards like the one below: 

[Image: Grafana dashboard backed by Weave Cloud’s hosted Prometheus]

Final Thoughts

This blog highlighted several new tutorials on monitoring, metrics and configuring alerts with Weave Cloud. 

Try out the new tutorials for yourselves: 

  1. Monitoring Microservices with Weave Cloud
  2. Configuring Alerts, Routes and Receivers with the RED Method
  3. Integrating Grafana with Weave Cloud

As always, send us your thoughts, suggestions and criticisms -- we love to hear from our users. Ask us anything on Slack or join the Weave Online User Group for talks on topics like these. 



Related posts

A Comprehensive Guide to Prometheus Monitoring

Multi-cluster Application Deployment Made Easy with GitOpsSets

Close the Cloud-Native Skills Gap and Use Automation to Boost Developer Productivity with Weave GitOps Release 2023.04