What is Prometheus?

Cloud native environments are highly automated and dynamic, and when it comes to monitoring applications, traditional server-based monitoring tools designed for relatively static systems are not adequate.

Prometheus is native to containerized environments and is built to monitor applications and microservices running in containers at scale. The data Prometheus scrapes from running services is time-series data, queried via the PromQL language. One of the advantages of querying time-series data with PromQL is the ability to step back through time and diagnose a problem in situ without having to independently recreate the issue.

Weave Cloud Monitor

Pulling and Discovery

Prometheus is a pull-based monitoring system, which means that central Prometheus servers discover your services and then pull metrics from them. The discovery and pull system fits well with dynamic, cloud native environments such as Kubernetes, where Prometheus integrates with Kubernetes’ built-in service discovery, but it also works with other Service Discovery mechanisms such as DNS, Consul, ECS, and Marathon to automatically find your services.

Prometheus pulls metrics based on what it finds in Service Discovery: it scrapes the metrics endpoints of those jobs (collections of instances), as well as any exporters you may have configured.
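Prometheus also records whether each scrape succeeded in a built-in metric called up, so once discovery is working you can check which targets are being scraped with a simple PromQL query (the job name here is only an illustration):

up{job="my-app"}

A value of 1 means the last scrape of that target succeeded; 0 means it failed.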

If you installed the Weave Cloud Agents using our Kubernetes installation, you will automatically have Prometheus and several exporters (e.g. node exporters, for per-host CPU/memory/disk usage) installed for you and configured to send metrics to Weave Cloud.
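For example, once the node exporter is reporting, per-host CPU usage can be charted with a query along these lines (the exact metric name depends on the exporter version; recent node exporter releases expose node_cpu_seconds_total):

rate(node_cpu_seconds_total{mode!="idle"}[5m])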

If your app is running in Kubernetes and you scale it up, Prometheus automatically starts pulling metrics from the extra replicas. Similarly, as nodes fail and Kubernetes restarts pods on other nodes, Prometheus notices the new pods and scrapes their metrics as well.

What Weave Cloud Brings to Prometheus

Weave Cloud extends Prometheus by providing a distributed, multi-tenant, horizontally scalable version of it. We host the scraped Prometheus metrics for you, so that you don’t have to worry about storage or backups. A GUI is also provided for running PromQL queries and for configuring alerts for when things go wrong.

If you need more than what the Weave Cloud GUI offers, a Grafana Dashboard browser plugin is available that can be downloaded from the Google Chrome store. The plugin is fully integrated with Weave Cloud so that you can query and graph the metrics collected by Cortex from within the Grafana dashboard.

Other 3rd party Prometheus dashboard options can also be used with Weave Cloud and console templates may also be configured.

See Console Templates for more information.

See Integrating Grafana and Running PromQL Queries.

Prometheus Architecture in Weave Cloud

At Weaveworks, we use Prometheus to monitor all of our Weave Cloud production servers running in Kubernetes. But the single-server architecture that Prometheus provides didn’t fit our needs.

We needed a version of Prometheus that could scale, and one that could store more metrics than fit onto a single hard disk. These requirements led us to build Weave Cortex: a distributed, multi-tenant version of Prometheus.

Weave Cloud and Prometheus

With Weave Cortex, a Prometheus retriever is deployed to your cluster, where it scrapes the Prometheus metrics from your jobs (your application and its microservices, as well as metrics from your cluster and hosts). The data the retriever collects is then pushed to Weave Cloud. Once the metrics start coming in from your application, they can be accessed from your browser in Weave Cloud using the built-in GUI or through other third-party Prometheus dashboards such as Grafana.

Historical metrics collected by Prometheus are processed through a series of distributors, rings and ingesters, and then flushed to a set of storage services provided by AWS. All Prometheus metrics collected from your data centre are backed up to DynamoDB and S3 for long-term storage, and a combination of Memcache and Consul allows them to be retrieved quickly when analysing past incidents.

Cortex is also open source, so that you can run it for yourself if you don’t want us to do that for you.

What is the Scale of the System?

A large Prometheus server can process millions of time-series with weeks or months of data retention, depending on your scrape interval.

As an example, let’s say you want to monitor about 100 containers or processes per host, and assume each container or process exports 1,000 time series at a scrape interval of 5 seconds. That is 100,000 series per host, sampled every 5 seconds, or 20k data points per second per host. Across 100k hosts, that amounts to 2 billion data points per second, or 40 GB per second at 20 bytes per data point.

Long Term Storage of Metrics

One of the benefits of Weave Cloud and Cortex is that it provides long term storage for your Prometheus metrics. Stored metrics are invaluable for stepping back in time to pinpoint and then correct problems with your application or its infrastructure when and where they occurred.

Prometheus Data Model

Prometheus stores data in a time-series database, where each series is identified by its metric name and by a set of key-value pairs known as labels.

A time-series is simply a series of values, each with a timestamp associated with it. For example, imagine recording the temperature in a room: every minute that your thermometer is plugged into a Raspberry Pi, the temperature of that room is recorded at that point in time. After recording for a while, you have a list of timestamps, each paired with the room temperature in Celsius at that moment.
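As a rough sketch of what this looks like to Prometheus, the thermometer’s scrape endpoint could expose a sample in the Prometheus text format (the metric and label names here are invented for the example):

# TYPE room_temperature_celsius gauge
room_temperature_celsius{room="livingroom"} 21.5

Each scrape records the current value together with the time of the scrape, and the accumulated samples form the time-series.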

The values are stored internally by Prometheus as floating point numbers. By storing the data as floats, a number of different metric types can be represented, such as counters, gauges, histograms and summaries, all of which can later be queried and aggregated to make sense of the data.

See the Prometheus Documentation for a more in-depth discussion of the data model it uses.

Prometheus Metric Types

Gauges

A gauge is a value that can increase or decrease over time, like a thermometer reading. But because you only record the value at the moments you sample it, you lose some resolution: if the temperature went up to 20 degrees and dropped back down to 10 between two samples, your recorded data would never show that it happened.

Gauges can be somewhat lossy like that.
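For example, assuming a hypothetical gauge named room_temperature_celsius, you can smooth out individual samples by averaging them over a window with PromQL’s avg_over_time function:

avg_over_time(room_temperature_celsius[10m])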

Counters

Counters, on the other hand, are monotonically increasing: apart from when they get reset back to 0, they only ever go up. This is useful when you want to compare two samples. You may lose some detail about what happened in between, but you’ll still be able to get a sense of the overall rate of change.
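For example, with a hypothetical counter named http_requests_total, PromQL’s increase() function compares the samples at the start and end of a window to report roughly how much the counter grew over that window, handling counter resets for you:

increase(http_requests_total[1h])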

Histograms

Histograms are another metric type that all of the Prometheus client libraries support. They allow you to specify a set of buckets: the histogram counts how many observed values fell into each bucket, which makes it efficient to see how samples are distributed across a range. The bucket counts are cumulative, so each bucket includes every observation less than or equal to its upper bound.
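For example, given a hypothetical histogram named request_duration_seconds, the cumulative bucket counts are exposed as request_duration_seconds_bucket, and PromQL’s histogram_quantile() function can estimate a percentile, such as the 95th, from them:

histogram_quantile(0.95, rate(request_duration_seconds_bucket[5m]))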

Summaries

Summaries are similar to histograms in that they count and sum observations, but data in summaries is aggregated over a sliding time window, and they provide configurable quantiles that are calculated by streaming the observations on the client side.

Another difference from counters is that summaries (and histograms for that matter) can observe negative values. Unlike a counter, which only goes up, the sum tracked by a summary or histogram can also go down.
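As a sketch, a summary named request_duration_seconds (again a hypothetical name) exposes its client-side quantiles with a quantile label, so the 95th percentile can be read directly:

request_duration_seconds{quantile="0.95"}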

For example usage, see Histograms and Summaries in the Prometheus docs.

Metric Names and Labels in Prometheus

Prometheus uses ‘labels’ (key-value pairs) to select objects in the system. Labels identify segments of time-series data and are used by PromQL to select the time series to aggregate over.

Note: When an instance or job is scraped, Prometheus by default adds the instance’s address and job name as a label to a time series.
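For example, a series scraped from a frontend job might end up carrying labels like the following (the values are purely illustrative):

requests{job="frontend", instance="10.0.0.1:9090"}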

See Instrumentation Best Practices for more information.

For example, you may have a time-series represented as follows:

{key1="A", key2="B"} -> [(t0, v0), (t1, v1), ...]

These might for example represent the data from multiple thermometers measuring the temperature in a number of different rooms.

They may be labeled as follows:

__name__="thermometer", with room="livingroom", room="kitchen", room="diningroom", room="bathroom" and so on.

Each of those thermometers maps back to a separate time-series which stores the sampled temperatures for that room, for example:

{__name__="thermometer", room="diningroom"} -> [(timestamp, value), ...]
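With those labels in place, PromQL can select and aggregate across them. Still using the example thermometer metric, selecting a single room’s series looks like this:

thermometer{room="livingroom"}

while averaging the latest reading across every room is:

avg(thermometer)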

Special Label Types

With labels in mind, Prometheus also makes use of special internal labels identified by double underscores, the most important being __name__.

This label lets you shorten a selector from:

{__name__="requests"}

to just requests. So when you see a PromQL expression like requests, PromQL is translating that into a search for the series with the label {__name__="requests"}.

See Running PromQL Queries for more information.

Monitoring a Web Server: An Example

When monitoring a web server, the most common metric you might see is an increasing counter, i.e. one that is always going up and to the right.

Assume that samples from the web server are taken once per second, and that for the first three seconds Prometheus records one web request per second. Then the website suddenly gets more popular and receives 10 requests per second over the next 3 seconds, before the traffic tails off and returns to one request per second for the remaining seconds.

This is a time-series of events.

What Prometheus stores internally is a mapping between the request count and the time at which it was observed. The time-series is named requests, and its values are interpreted as: at time t1 the counter was 1, at time t2 it was 2, at time t3 it was 3, and so on. The series

{__name__="requests"}

stores the data in time-series form as follows:

[(t1, 1),  (t2, 2),  (t3, 3),  (t4, 13),
 (t5, 23), (t6, 33), (t7, 34), (t8, 35),
 (t9, 36), (t10, 37)]

These labeled requests can then be easily retrieved with the Prometheus Query Language (PromQL). A straight retrieval of the metrics would simply display a graph that shows all of the data as a counter increasing over time.

But a counter alone is not particularly useful, and you’d have to look carefully to spot any gradient in that data. What you really want to know is when you received that spike in traffic. To do this you need to query the data and turn it into a ‘rate of change’. Before doing so, there are a few concepts to understand first.

Before looking at the rate of change, you need to add a time interval to the query. Adding an interval transforms the instant vector into a range vector:

requests[P]

What the period syntax [P] does is say: for each point in time, instead of a single value, return a vector, or list of values, covering the entire period P leading up to that point. The period [P] could be 3 seconds, 1 minute or 2 hours.

requests[3s]

The range selector [3s] takes the requests, which start out as a simple list of samples:

t1  t2  t3  t4  t5  t6  t7  t8  t9
 1   2   3  13  23  33  34  35  36

and turns them into an array of vectors, one per interval:

t1-3  t2-4  t3-5  t4-6  t5-7  t6-8  t7-9
   1     2     3    13    23    33    34
   2     3    13    23    33    34    35
   3    13    23    33    34    35    36

Essentially you are chunking the time periods and their requests into overlapping windows and representing the data as a list of vectors. The reason PromQL needs the data in this form is so that it can calculate important things like the rate of change.

Finding the Rate of Change with rate()

There’s a function built into PromQL called rate() that allows you to find the per-second rate of change. Instead of showing a counter that increments every time a request is recorded, it shows the rate at which requests arrive within each time interval, in this case over 3-second intervals:

rate(requests[3s])

rate() is calculated by subtracting the first value from the last value in each interval and then dividing by last_time - first_time, which for these 3-sample windows is 2 seconds. The result is the calculated rate of change:

[1, 5.5, 10, 10, 5.5, 1, 1]

Now a more useful set of graphs can be made, one where you’ve gone from a scalar counter to a more meaningful rate of change diagram:

Prometheus Counter

As you would expect, the rate of change goes from 1 per second up to 10 per second and then back down to 1 per second. Looking at this data, you can see that the number of requests was low, then increased, before returning to a low point again. By applying a 3-second interval the data is ‘smoothed out’, which makes it easier to understand what you’re looking at.

Rate of Change

Labels and Queries

In the example used here, requests is shorthand for the key-value pair __name__="requests".

You can easily add more labels to that metric such as:

{__name__="requests", job="frontend"}

In PromQL this is written as follows:

requests{job="frontend"}

And so we could query:

rate(requests{job="frontend"}[1m])

This returns the per-second rate of frontend requests, averaged over a 1-minute window. PromQL is a powerful language for expressing aggregations and any other calculations you need over the stored time-series data. In addition, you can be alerted instantly when those changes occur.
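As a further sketch, still using the hypothetical requests metric, you could sum that per-second rate across every instance and break it out by job:

sum by (job) (rate(requests[1m]))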

For more information see Instrumenting and Running Queries.

Further Reading