tl;dr: Weave Mesh, an open-source gossip and CRDT communications library, gave the Prometheus Alertmanager exactly what it needed to be reliable and highly-available.
As a former SoundCloud engineer, I’ve had the privilege of using Prometheus—an “open source systems monitoring and alerting toolkit”—from the early days. Since 2012, Prometheus has had a multidimensional data model and a powerful query language as its core features, but the alerting component has been a relatively recent addition.
In this post we’ll take a closer look at the Alertmanager, the Prometheus component responsible for converting alert conditions to emails and pages, and how Weave Mesh is instrumental in turning it from a single point of failure into a highly-available and reliable distributed system.
Prometheus Architecture Overview
First, a quick architecture overview. To use Prometheus, you instrument your services with a Prometheus client library, which exposes white-box application metrics on a well-known HTTP endpoint. Prometheus servers typically subscribe to your service discovery system to learn about those targets, and then scrape them at a regular interval. That time-series data is stored and made queryable using PromQL, and graphs and dashboards can be built on top of it.
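To make the scrape model concrete, here is a minimal sketch of what a service exposes to a Prometheus server. It uses only the Python standard library rather than a real Prometheus client library, so the names `inc`, `render`, `serve`, and the `COUNTERS` store are hypothetical, and label-value escaping is left out for brevity. What it does show is the plain-text format a scrape returns.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process store: {(metric_name, sorted_label_pairs): value}
COUNTERS = {}


def inc(name, **labels):
    """Increment a counter identified by its name and label set."""
    key = (name, tuple(sorted(labels.items())))
    COUNTERS[key] = COUNTERS.get(key, 0) + 1


def render(counters):
    """Render counters in the Prometheus text exposition format."""
    lines = []
    seen = set()
    for (name, labels), value in sorted(counters.items()):
        if name not in seen:
            # One TYPE line per metric family, before its samples.
            lines.append(f"# TYPE {name} counter")
            seen.add(name)
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        if label_str:
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render(COUNTERS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Blocking helper: serve /metrics until interrupted."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

Calling `inc("http_requests_total", method="GET", code="200")` in a request handler and then running `serve()` would give a Prometheus server something to scrape at `/metrics`. In practice you would use an official client library, which handles escaping, metric types, and concurrency for you.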
More importantly, each Prometheus server is completely independent of the others. High availability is achieved by running multiple servers that scrape the same targets, a setup that is also common in large organizations where teams own and operate their own Prometheus infrastructure.
Alerts are defined in the server config as PromQL expressions. Once they evaluate to true for a given period of time, the Prometheus server continuously emits alert events to a separate Alertmanager process, which takes care of aggregating, de-duplicating, and forwarding the alerts to developers via email, PagerDuty, Slack, and so on. In the beginning, Alertmanager was a simple process that could live as a singleton alongside your Prometheus servers. But, obviously, being a singleton makes it a single point of failure.
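The two halves of that pipeline can be sketched in a few lines. The first class models the server side: a condition must hold for a given period before the alert fires. The function after it models one Alertmanager duty, collapsing the repeated events a server keeps emitting. This is a simplified illustration, not the Prometheus or Alertmanager implementation; `AlertRule`, `hold_seconds`, and `deduplicate` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class AlertRule:
    name: str
    hold_seconds: float  # how long the condition must stay true before firing
    # label set -> time the condition was first seen true
    active_since: dict = field(default_factory=dict)

    def evaluate(self, now, matching_label_sets):
        """Return the label sets that are firing at time `now`.

        `matching_label_sets` stands in for the series for which the
        alert expression currently evaluates to true.
        """
        current = {frozenset(ls.items()) for ls in matching_label_sets}
        # Forget series whose condition is no longer true.
        for key in list(self.active_since):
            if key not in current:
                del self.active_since[key]
        firing = []
        for key in current:
            start = self.active_since.setdefault(key, now)
            if now - start >= self.hold_seconds:
                firing.append(dict(key))
        return firing


def deduplicate(alert_events):
    """Collapse repeated (labels, timestamp) events per label set.

    Keeps the most recent event for each distinct label set.
    """
    latest = {}
    for labels, timestamp in alert_events:
        key = frozenset(labels.items())
        if key not in latest or timestamp > latest[key][1]:
            latest[key] = (labels, timestamp)
    return [labels for labels, _ in latest.values()]
```

Because the server re-sends firing alerts on every evaluation, and several independent servers may send the same alert, identifying an alert by its label set is what lets the receiving side treat repeats as one alert.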
When Prometheus core developer Fabian Reinartz took up the task of making Alertmanager highly-available, he knew the challenge was unique.
“For notifications, we care about at-least-once delivery above all else. When an Alertmanager cluster is in an unhealthy state, we’re OK with receiving the same notification twice, or having a temporarily inconsistent view of alerts in a notification. In return, we want stronger availability guarantees that ensure we always receive notifications as long as there’s at least one Alertmanager still running.”
A consensus protocol is a poor fit for those requirements, as Fabian explains: “Raft is a so-called CP system, and it works based on consensus. That means you can only do something (accept commands and make progress) if you have consensus in the cluster.”
Network partitions are real, and the network is unreliable; perhaps especially when you’re in a situation that calls for alerts!
What you want instead, Fabian observes, is a so-called AP system. You want “several Alertmanagers to exist and work independently. If they can see each other, good; but if they can’t, then they should still be able to perform useful work.” This style of eventual consistency is exactly what gossip protocols provide, and, handily, is precisely what Weave Mesh implements!
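A small worked example shows the difference during a partition. The two functions below are hypothetical helpers, not part of Raft or Weave Mesh. They only encode the rule each model follows: a consensus group needs a strict majority of the cluster, while an AP node needs nothing but itself.

```python
def can_make_progress_cp(reachable_nodes, cluster_size):
    """Consensus (CP): a strict majority of the cluster must be reachable.

    `reachable_nodes` counts the node itself plus the peers it can reach.
    """
    return reachable_nodes > cluster_size // 2


def can_make_progress_ap(reachable_nodes):
    """AP: a running node keeps working, whatever it can or cannot reach."""
    return reachable_nodes >= 1


# A 3-node cluster split into a minority side (1 node) and a majority side (2).
CLUSTER_SIZE = 3
for side in (1, 2):
    print(
        f"side with {side} node(s): "
        f"CP={can_make_progress_cp(side, CLUSTER_SIZE)} "
        f"AP={can_make_progress_ap(side)}"
    )
```

On the minority side the consensus-based cluster stops accepting work, which for alerting means notifications that never go out. The AP node on that side keeps sending, at the cost of possible duplicates once the partition heals.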
“Weave Mesh makes it extremely easy to set up communication channels for a set of Alertmanagers, and handle their cluster membership. It abstracts away the difficulty of discovery and transparently handles cases where two participating Alertmanagers have no direct connection to each other. Secondly, it provides an API to maintain a distributed, eventually consistent state across them, which was a differentiating feature from other systems we evaluated. This allowed us to focus on defining the right application semantics to accommodate all edge cases of a distributed state.”
When each Alertmanager starts up, Weave Mesh takes care of joining it with all of the other Alertmanagers in the cluster. And when an Alertmanager emits an alert notification, or a user applies a silence to a set of firing alerts, Weave Mesh takes care of gossiping that information to all of the other Alertmanagers with best-effort semantics.
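The reason best-effort gossip is enough here is that the shared state can be merged in any order, any number of times, and still converge. The sketch below illustrates that idea with a tiny state-based CRDT for a notification log. It is an assumption-laden illustration: the `NotificationLog` class, its key format, and its merge rule (keep the latest timestamp per key) are hypothetical and are not Alertmanager's actual data structures or Weave Mesh's Go API.

```python
class NotificationLog:
    """Map of notification key -> latest send timestamp.

    Merging takes the maximum timestamp per key, which makes the merge
    commutative, associative, and idempotent: the properties that let
    peers exchange state in any order and still agree.
    """

    def __init__(self):
        self.entries = {}

    def record(self, key, timestamp):
        """Record a locally sent notification."""
        self.merge({key: timestamp})

    def merge(self, other_entries):
        """Merge state received from a peer; return only what was new.

        Returning the delta lets a gossip layer forward just the entries
        that changed local state, instead of re-sending everything.
        """
        delta = {}
        for key, ts in other_entries.items():
            if ts > self.entries.get(key, float("-inf")):
                self.entries[key] = ts
                delta[key] = ts
        return delta

    def already_sent(self, key, since):
        """True if some peer is known to have sent `key` at or after `since`."""
        return self.entries.get(key, float("-inf")) >= since


# Two Alertmanagers that were partitioned, then exchange state.
a, b = NotificationLog(), NotificationLog()
a.record("HighErrorRate/pagerduty", 100)
b.record("HighErrorRate/pagerduty", 120)  # duplicate send during the partition
b.record("DiskFull/email", 90)

a.merge(b.entries)
b.merge(a.entries)
assert a.entries == b.entries  # both peers converge on the same view
```

Both peers paged for `HighErrorRate` while they could not see each other, which is the duplicate the at-least-once trade-off accepts. After the exchange, either peer can consult `already_sent` before notifying again.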
Of course, to adapt your software system to an eventually-consistent data model, you’ll have to make some changes. But luckily, Weave Mesh tries to make it as simple as it can be, and in the case of Alertmanager it was quite straightforward.
“The Alertmanager has three isolated data layers, two of which are shared via the Mesh. We made several adjustments to the silence and notification log layers. However, both changes would have been a logical improvement regardless. So actually, we didn’t have to change all that much, which is partially owed to the fact that we designed the single-node system with the future upgrade to an AP system in mind.”
With Weave Mesh, Fabian found a production-hardened way to accommodate the unique design constraints of highly-available alerting. In addition to Alertmanager, Weave Mesh also powers Weave Net, and enables it to be the only truly available and partition-tolerant peer-to-peer overlay network on the market. Maybe it can help you power your next distributed application?
To learn more about the architecture and theory of Weave Mesh, check out Effortless Eventual Consistency with Weave Mesh, from QCon London in March. And to see how Weave Mesh can be used as an arbitrary network transport, check out etcd over gossip from CoreOS Fest in May, where I made etcd properly elastic by porting it from HTTP to Weave Mesh.