Latest from the blog

October 06, 2020

Living on the Edge - How Screenly Monitors Edge IoT Devices with Prometheus

In this guest post, Viktor Petersson discusses how Screenly uses Prometheus to monitor its infrastructure. Over the years, the team found Prometheus to be extremely versatile and expanded their use of it to include business intelligence metrics.

September 12, 2019

How I Halved the Storage of Cortex

Find out how Bryan Boreham cut the storage used by Cortex's time-series data in half by re-architecting how the data is split into chunks.

March 21, 2019

Going Cloud Native: 6 essential things you need to know

Are you just starting on your digital transformation journey and still wondering what cloud native is and why you need it? This article covers the key things to know about the term cloud native.

February 21, 2019

How Does Prometheus Scale?

Cortex is an open source, multi-tenant, horizontally scalable version of Prometheus. Bryan Boreham discusses how and why we built Cortex.

February 06, 2019

How Aspen Mesh Runs Cortex in Production

Neeraj Poddar, Lead Platform Architect at Aspen Mesh, shares insights and tips on why and how they implemented Cortex in production.

January 10, 2019

Throwback Thursday: Configure notifications in Prometheus’ Alertmanager

Monitoring is crucial for any developer and on-call engineer. But sorting through multiple notifications, or missing the critical information needed for a quick resolution, is no fun. This blog post recaps a KubeCon lightning talk that demonstrates how to use labels in Prometheus Alertmanager to receive only concise, easy-to-understand notifications.

October 04, 2018

Five Key Cloud Technologies for Kubernetes

This post discusses five key open source projects that help complete the Kubernetes feature set: Prometheus, Istio, Helm, Weave Flux, and OpenFaaS.

September 19, 2018

Weaveworks Cortex - the newest member of the CNCF Sandbox

Cortex, our scalable Prometheus monitoring system, has been accepted as a Sandbox project by the Cloud Native Computing Foundation. Cortex is an open source project that we created to provide storage of Prometheus metrics for Weave Cloud. It is used by teams running large Prometheus deployments that provide metrics for complex Kubernetes environments.

August 28, 2018

Labels in Prometheus alerts: think twice before using them

As developers, we hear a lot about the importance of monitoring and alerts. But without proper notifications, we might spend too much time trying to understand what is really going on. This blog post gives an overview of common caveats of using labels in Prometheus alerts and demonstrates some techniques for getting concise, easy-to-understand notifications.

June 27, 2018

GitOps - What You Need To Know

Learn the principles and patterns of GitOps workflows and how to implement them to run Kubernetes in production and at scale. We added new content to our Kubernetes library, and summarized the key concepts of GitOps all in one place.

June 20, 2018

GitOps, Weave Cloud and EKS demonstrated at EKoSystem Day

Craig Wright demonstrated GitOps workflows and Weave Cloud on EKS at the EKoSystem Day event held at the AWS Loft in downtown San Francisco. Weaveworks was one of 10 technical partners invited to speak at this special event, which was broadcast live on Twitch.

March 01, 2018

Ensure High Availability and Uptime With Kubernetes HPA (Horizontal Pod Autoscaler) and Prometheus

Not all systems can meet their SLAs by relying on CPU and memory usage metrics alone; most web and mobile backends need to autoscale on requests per second to handle traffic bursts. This step-by-step guide shows you how to set up the Kubernetes Horizontal Pod Autoscaler with custom metrics defined in Prometheus, so you can fine-tune your application monitoring and ensure high availability.

February 08, 2018

Monitoring Cloud-Native Applications

Understand the importance of monitoring your microservices and infrastructure, and how to turn those metrics into meaningful data when you want to improve performance or mitigate emerging problems. Discover the methodologies, metrics, and approaches for monitoring microservices effectively, along with the tools we recommend.

February 02, 2018

Architecture Overview: Cluster Monitoring at Scale on AWS

Watch this short architecture overview video to learn how Weaveworks monitors clusters at scale using a highly available, multi-tenant system built on AWS services.

October 30, 2017

A Practical Guide: From Instrumenting Code to Specifying Alerts with the RED Method

This practical guide will help you get started with monitoring your microservices with Prometheus. We walk through selecting key metrics, instrumenting code, and setting up alerts and Grafana dashboards.

October 17, 2017

GitOps Part 3 - Observability

Observability can be seen as part of the Continuous Delivery cycle for Kubernetes: the observed state must be compared with the desired state in Git. The role of a GitOps dashboard is to enable observation, speed up understanding and validation of the system, and suggest mitigating actions. Monitoring alone does not answer every question: metrics are the symptoms, not the disease.

October 15, 2017

Swarmprom - Prometheus Monitoring for Docker Swarm

In this post, we discuss how to set up application and infrastructure monitoring for Docker Swarm with the help of Prometheus. Swarmprom is a starter kit for Docker Swarm monitoring with Prometheus, Grafana, cAdvisor, Node Exporter, Alertmanager, and Unsee.

August 31, 2017

Monitoring, alert rules and loads of beers errr dashboards - PromCon 2017 in Munich

The Weaveworks team visited PromCon 2017 in Munich. One of our personal highlights was a deep dive into the past 12 months of Cortex, the basis of our monitoring and analytics capabilities in Weave Cloud.

August 14, 2017

Observability beyond logging for Java Microservices

Monitoring distributed applications is best approached with a combination of tools. Luke Marsden describes how Prometheus, OpenTracing, and Weave Cloud visualization together cover the bases for establishing the root cause of problems in distributed applications.

August 10, 2017

User-centric Alerting

If something goes wrong in production, you want to know the user impact immediately. With that in mind, we created an automated alerting scheme based on user-visible symptoms.
