The Weave Scope UI provides a neat way of visualizing the real-time state of your application as a graph of interconnected nodes. While this view is good for answering structural questions about your system, it falls short when you try to identify the parts that consume the most resources, especially in larger topologies.
With this in mind, we recently added a separate view to Scope that focuses on resource consumption. It makes the answers to questions like "Which container, running on a particular host, is currently consuming the most CPU?" obvious from a single glance at the graph.
Because the new resource view is meant to complement the existing Scope views with a different representation of the same data, we let two simple decisions guide our design:
- We are interested in showing only the current state of the system, letting Weave Cloud's Prometheus monitoring service deal with any historical resource queries.
- Interactions between nodes are not relevant to the resource view, so the connection data can be dropped.
That opened a whole world of possible layouts to choose from, so we checked out some classical choices. Since there is a natural hierarchy between most topologies that appear in Scope, it made sense to try combining multiple topologies into one hierarchical view.
Between a flat flame graph and a sunburst view, the flame graph was a quick winner: it is easier to zoom and navigate, makes better use of space, has more readable text tags, and is cross-layer isometric. Here is what we settled on:
All the nodes are represented by their resource containers which appear in two different forms:
- Capped - the width of the resource container stands for the total resource capacity of the node, while its vertical bar represents the relative consumption of the resource. This form is used in the Hosts layer above.
- Non-capped - the resource container is always fully filled and its width stands for the absolute consumption of the selected resource. This is used in both Containers and Processes layers, as we currently don’t have any notion of their resource capacity.
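The two forms can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the code Scope uses; the `ResourceBox` type and the `resource_box` helper are hypothetical names, and clamping the fill at 100% is our own assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourceBox:
    label: str
    width: float  # horizontal extent, in the metric's absolute units
    fill: float   # fraction of the box height that is filled, 0.0 to 1.0


def resource_box(label: str, consumption: float,
                 capacity: Optional[float] = None) -> ResourceBox:
    """Build a capped box when capacity is known, a non-capped box otherwise."""
    if capacity is not None and capacity > 0:
        # Capped: width shows total capacity, the bar shows relative consumption.
        # Clamping at 1.0 is an assumption made for this sketch.
        return ResourceBox(label, width=capacity,
                           fill=min(consumption / capacity, 1.0))
    # Non-capped: always fully filled, width shows absolute consumption.
    return ResourceBox(label, width=consumption, fill=1.0)
```

For example, `resource_box("host-1", consumption=4.0, capacity=16.0)` gives a box 16 units wide that is a quarter full, while `resource_box("nginx", 1.5)` gives a fully filled box 1.5 units wide.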
Typical use cases
As is the case with the Scope graph topology views, the new resource view works well in presentation mode, that is, answering the question "What does my system look like right now?" The layout is designed to help visually identify resource bottlenecks, answering questions like:
- Which container is currently the busiest in terms of CPU?
- Which processes are currently consuming the most memory?
- Which host has the most memory?
Having a multi-layered view lets us go one step further and answer questions that involve more than one topology:
- How much more memory is available to a particular container on its current host?
- How many processes are currently running in a particular container?
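To make these questions concrete, here is a minimal sketch of how they map onto a simple parent-linked data model. Everything in it is an assumption made for illustration: the `Node` type, its fields, the helper names, and the sample numbers are hypothetical and do not reflect Scope's internal data structures.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    id: str
    layer: str                # "host", "container" or "process"
    parent: Optional[str]     # id of the node one layer up, None for hosts
    memory_used: float        # any consistent unit, e.g. GiB
    memory_capacity: Optional[float] = None  # only known for hosts


def busiest(nodes: List[Node], layer: str,
            key: Callable[[Node], float]) -> Node:
    """Single-layer question: which node in a layer scores highest?"""
    return max((n for n in nodes if n.layer == layer), key=key)


def memory_headroom(nodes: List[Node], container_id: str) -> float:
    """Cross-layer question: free memory left on the container's host."""
    by_id = {n.id: n for n in nodes}
    host = by_id[by_id[container_id].parent]
    return host.memory_capacity - host.memory_used


def process_count(nodes: List[Node], container_id: str) -> int:
    """Cross-layer question: number of processes inside a container."""
    return sum(1 for n in nodes
               if n.layer == "process" and n.parent == container_id)


# Made-up sample topology: one host, two containers, three processes.
NODES = [
    Node("host-1", "host", None, 6.0, 16.0),
    Node("web", "container", "host-1", 2.0),
    Node("db", "container", "host-1", 3.5),
    Node("nginx", "process", "web", 1.5),
    Node("worker", "process", "web", 0.5),
    Node("postgres", "process", "db", 3.5),
]
```

With this sample data, `busiest(NODES, "container", lambda n: n.memory_used)` picks the `db` container, `memory_headroom(NODES, "web")` is 10.0, and `process_count(NODES, "web")` is 2. In the resource view the same answers are read off visually from the box widths and the nesting of the layers.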
What to expect next?
Currently, only CPU and memory metrics are available in the resource view, and even those could be made more transparent and accurate. Since each topology has its own way of estimating resource utilization, it can be hard to guarantee data consistency between the layers. Ideally, we would converge on a single source of truth and gradually add more metrics to the view, such as disk space and file handle count.
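One way to picture the consistency problem: if each layer estimates usage independently, the children of a node can add up to more than the node itself reports. The sketch below flags such cases. It is a hypothetical check written for this post, not something Scope does today; the function name, the input dictionaries, and the 5% tolerance are all assumptions.

```python
from typing import Dict, List


def inconsistent_parents(usage: Dict[str, float],
                         parent_of: Dict[str, str],
                         tolerance: float = 0.05) -> List[str]:
    """Return ids of parents whose children report more than the parent does.

    usage maps node id to its reported consumption; parent_of maps a child id
    to its parent id. The tolerance absorbs small sampling differences.
    """
    child_totals: Dict[str, float] = {}
    for node_id, parent_id in parent_of.items():
        child_totals[parent_id] = child_totals.get(parent_id, 0.0) + usage[node_id]
    return sorted(
        parent_id
        for parent_id, total in child_totals.items()
        if total > usage[parent_id] * (1 + tolerance)
    )
```

For instance, a host reporting 4.0 units whose two containers report 3.0 and 2.0 would be flagged, since the layers disagree by more than the tolerance.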
Extending the resource view with the metrics from other topologies is also in our pipeline. The current fixed layout Hosts -> Containers -> Processes isn’t suitable for analyzing K8s clusters, so the plan is to add Pods and Services into the hierarchy and make the whole layout more customizable.
Besides live polling, which is still missing from the resource view, we plan to add several more features soon:
- Enabling searching and filtering like in the other two views.
- Making all the resource boxes selectable to see the nodes’ details.
- Displaying more info about the context, especially on deeper zoom levels.
- Toggling layers’ visibility and making their content sortable by different criteria.