As a cloud monitoring tool, Weave Scope has to collect data about processes and network connections on each cluster node, process that data, and compile everything into a report that is sent to the app server as data points for the user interface. All this must happen reliably, frequently and quickly. In this post, we’ll look at how we’re working to improve on all three of these in Weave Scope by leveraging eBPF.
Today, the default means by which Weave Scope gathers data is to regularly check system directories in /proc and use a tool called conntrack. In Scope, the program that manages this collection and reporting is called an agent and runs in a container on each node. The agent’s job is to gather data and send reports at fixed intervals. The agent incrementally gathers system data and attempts to have the total required system data gathered every 10s.
This works and is reasonably efficient. But there is room for improvement in a couple of areas.
Firstly, traversing and reading the /proc directory for this information requires a lot of parsing and correlating of data. For example, to retrieve the connection tuples for a process, we need to get the file descriptor information for each process from /proc/<pid>/fd, then get connection information from /proc/<pid>/net/tcp. Scope needs to do this often.
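To give a feel for the parsing and correlating involved, here is a minimal sketch in Go. It is not Scope's actual code: the type and function names are hypothetical, the sample row is made up, and the address decoding assumes a little-endian host such as x86. It decodes one row of a /proc/<pid>/net/tcp table and matches its socket inode against the target of a file descriptor symlink.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"net"
	"strconv"
	"strings"
)

// procTCPEntry (hypothetical name) holds the fields of one
// /proc/<pid>/net/tcp row that are needed to build a connection tuple.
type procTCPEntry struct {
	LocalIP    net.IP
	LocalPort  uint16
	RemoteIP   net.IP
	RemotePort uint16
	Inode      uint64
}

// parseHexAddr decodes "0100007F:1F90" into 127.0.0.1 and 8080.
// Assumption: little-endian host, so the IPv4 bytes appear reversed.
func parseHexAddr(s string) (net.IP, uint16, error) {
	parts := strings.Split(s, ":")
	if len(parts) != 2 {
		return nil, 0, fmt.Errorf("malformed address %q", s)
	}
	raw, err := hex.DecodeString(parts[0])
	if err != nil || len(raw) != 4 {
		return nil, 0, fmt.Errorf("malformed IPv4 address %q", parts[0])
	}
	port, err := strconv.ParseUint(parts[1], 16, 16)
	if err != nil {
		return nil, 0, err
	}
	return net.IPv4(raw[3], raw[2], raw[1], raw[0]), uint16(port), nil
}

// parseProcTCPLine parses one data row (not the header row).
func parseProcTCPLine(line string) (procTCPEntry, error) {
	fields := strings.Fields(line)
	if len(fields) < 10 {
		return procTCPEntry{}, fmt.Errorf("short line: %q", line)
	}
	lip, lport, err := parseHexAddr(fields[1])
	if err != nil {
		return procTCPEntry{}, err
	}
	rip, rport, err := parseHexAddr(fields[2])
	if err != nil {
		return procTCPEntry{}, err
	}
	inode, err := strconv.ParseUint(fields[9], 10, 64)
	if err != nil {
		return procTCPEntry{}, err
	}
	return procTCPEntry{lip, lport, rip, rport, inode}, nil
}

// socketInode extracts 12345 from an fd symlink target "socket:[12345]".
func socketInode(target string) (uint64, bool) {
	if !strings.HasPrefix(target, "socket:[") || !strings.HasSuffix(target, "]") {
		return 0, false
	}
	n, err := strconv.ParseUint(target[len("socket:["):len(target)-1], 10, 64)
	return n, err == nil
}

func main() {
	// Made-up sample row in the /proc/<pid>/net/tcp layout.
	line := "   0: 0100007F:1F90 0200007F:D431 01 00000000:00000000 00:00000000 00000000  1000        0 12345 1"
	entry, err := parseProcTCPLine(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	inode, ok := socketInode("socket:[12345]")
	fmt.Println(entry.LocalIP, entry.LocalPort, entry.RemoteIP, entry.RemotePort, ok && inode == entry.Inode)
}
```

Even in this reduced form, two separate reads have to be joined on the inode number, and both have to be repeated for every process on every pass.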
Secondly, it’s not rare for a process or connection to come and go within a polling window. To deal with short-lived connections, Scope uses conntrack. conntrack is a program that listens for network connection events: UPDATE (established), DESTROY (closed), etc. Scope runs conntrack as a subprocess and listens on its stdout. But conntrack has problems that we’ll go into later.
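As a rough illustration of what consuming conntrack events involves, the Go sketch below parses a single event line. Everything in it is an assumption made for illustration: the sample line follows the plain-text layout that the conntrack tool prints in event mode, the addresses are made up, the type and function names are hypothetical, and the output format Scope actually requests from conntrack may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// conntrackEvent (hypothetical name) keeps the event type and the
// original-direction tuple of one conntrack event line.
type conntrackEvent struct {
	Type  string // e.g. "NEW", "UPDATE", "DESTROY"
	Proto string
	Src   string
	Dst   string
	Sport string
	Dport string
}

// parseConntrackLine reads the "[TYPE] proto ... key=value ..." layout.
// Each key appears twice (original and reply direction); only the first
// occurrence is kept.
func parseConntrackLine(line string) (conntrackEvent, error) {
	fields := strings.Fields(line)
	if len(fields) < 2 || !strings.HasPrefix(fields[0], "[") || !strings.HasSuffix(fields[0], "]") {
		return conntrackEvent{}, fmt.Errorf("unrecognised line: %q", line)
	}
	ev := conntrackEvent{Type: strings.Trim(fields[0], "[]"), Proto: fields[1]}
	for _, f := range fields[2:] {
		key, value, ok := strings.Cut(f, "=")
		if !ok {
			continue // counters, states and flags such as [UNREPLIED]
		}
		switch {
		case key == "src" && ev.Src == "":
			ev.Src = value
		case key == "dst" && ev.Dst == "":
			ev.Dst = value
		case key == "sport" && ev.Sport == "":
			ev.Sport = value
		case key == "dport" && ev.Dport == "":
			ev.Dport = value
		}
	}
	return ev, nil
}

func main() {
	// Made-up sample line; real output depends on the flags passed to conntrack.
	line := "    [NEW] tcp      6 120 SYN_SENT src=10.0.0.1 dst=10.0.0.2 sport=51234 dport=80 [UNREPLIED] src=10.0.0.2 dst=10.0.0.1 sport=80 dport=51234"
	ev, err := parseConntrackLine(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", ev)
}
```

Note that nothing in such a line identifies the process that owns the connection.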
So we have 2 methods for getting virtually the same information. Not ideal.
Thus, we wanted to find an alternative means of gathering information that addresses the issues above, where possible. This search led us to eBPF and the work happening around the BPF Compiler Collection (BCC). After closer inspection we determined that while gathering data from the /proc directory is still required for process information such as CPU and memory consumption, eBPF could be helpful in gathering connection information from a single source.
Before diving into how Scope does this and how to enable it, let’s quickly explain what eBPF is.
BPF (without the ‘e’) is a small “bytecode virtual machine” in the Linux kernel. The kernel attaches BPF programs to kernel objects according to program type and configuration. This can be used, for instance, to attach a small BPF program to a socket. The kernel will execute that program for every network packet the socket receives. While the sole use case for BPF used to be filtering network traffic, it has much broader capabilities today thanks to enhancements (the ‘e’ in eBPF) that allow it to be attached to kernel objects other than sockets: system calls, kprobes, etc.
One of the most powerful features added by eBPF is the ability for eBPF programs loaded into the kernel to communicate with userspace. This communication happens via maps. Maps can store arbitrarily defined data structures. These maps can be read from, and written to, by both userspace and the eBPF programs in the kernel.
eBPF in Scope
Scope makes use of one eBPF enhancement in particular: the ability to attach an eBPF program to kprobes (added in Linux 4.1).
kprobes are breakpoints which can be set in the kernel to collect “performance information non-disruptively”. They make it possible to execute an eBPF program whenever a kernel function is called, in Scope’s case to trace TCP functions.
By attaching eBPF programs to a few network kernel functions (tcp_v4_connect, inet_csk_accept, tcp_close), Scope is able to track connect, accept and close events for TCP connections. Every time one of these functions is entered and/or returns, the kernel executes the eBPF program, which allows the Scope tcptracer module to gather all relevant data.
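On the userspace side, consuming these events amounts to keeping a table of open connections up to date. The Go sketch below shows the idea under stated assumptions: the event and tracker types are hypothetical and are not the tcptracer module's real API. Connect and accept events add an entry, and close events remove it.

```go
package main

import "fmt"

// eventType mirrors the three traced kernel functions.
type eventType int

const (
	eventConnect eventType = iota // tcp_v4_connect
	eventAccept                   // inet_csk_accept
	eventClose                    // tcp_close
)

// fourTuple, tcpEvent and tracker are hypothetical types for illustration.
type fourTuple struct {
	SrcIP   string
	SrcPort uint16
	DstIP   string
	DstPort uint16
}

type tcpEvent struct {
	Type  eventType
	PID   uint32
	NetNS uint32
	Tuple fourTuple
}

// connKey scopes a tuple to its network namespace, since the same
// addresses can legitimately appear in several namespaces.
type connKey struct {
	NetNS uint32
	Tuple fourTuple
}

type tracker struct {
	open map[connKey]uint32 // open connection -> owning PID
	seen int                // total events observed, including short-lived ones
}

func newTracker() *tracker {
	return &tracker{open: make(map[connKey]uint32)}
}

// handle applies one event to the table of open connections.
func (t *tracker) handle(ev tcpEvent) {
	t.seen++
	key := connKey{NetNS: ev.NetNS, Tuple: ev.Tuple}
	switch ev.Type {
	case eventConnect, eventAccept:
		t.open[key] = ev.PID
	case eventClose:
		delete(t.open, key)
	}
}

func main() {
	tr := newTracker()
	ev := tcpEvent{Type: eventConnect, PID: 4242, NetNS: 1, Tuple: fourTuple{"10.0.0.1", 51234, "10.0.0.2", 80}}
	tr.handle(ev)
	fmt.Println("open after connect:", len(tr.open))
	ev.Type = eventClose
	tr.handle(ev)
	fmt.Println("open after close:", len(tr.open), "events seen:", tr.seen)
}
```

Because every event is delivered, a connection that opens and closes between two polls is still observed, even though it never shows up in a later snapshot of the table.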
Let’s look at how doing this improves Scope.
To start, gathering the necessary information from different files in procfs cannot be done atomically. This is racy by definition. And, as we mentioned above, polling causes us to miss short-lived connections.
With conntrack we get connection events in an event-driven way, solving the issue of gathering short-lived connections. However, we ran into a number of limitations and issues with conntrack.
Firstly, the default size of conntrack’s event backlog buffer is often too small, which results in conntrack reporting errors, missing events, and even crashing. One can increase the size of conntrack’s buffer (Scope has an option to do this – -probe.conntrack.buffersize), which is suggested for systems with lots of connections. The eBPF implementation, on the other hand, uses a ring buffer, which has a couple of advantages: it’s far faster and has proven to be more reliable.
Secondly, and most problematic, conntrack doesn’t provide the information we need; namely, the connection’s network namespace and PID. This is why we still need to traverse the /proc directory to get the needed information to associate connections with PIDs when using conntrack.
By using eBPF to track connections, both issues with conntrack and the raciness of /proc traversal are avoided.
With eBPF, the connection information is collected upon each connection event. And because we can define the structures the eBPF program provides to userspace, we’ve included all the needed information. This avoids the need to go into /proc to associate connections with PIDs and, in turn, avoids the raciness involved.
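To make the idea of a self-describing event concrete, here is a hedged Go sketch of decoding one such record from raw bytes. The struct layout, field order and byte order are assumptions chosen for illustration and are not the real tcptracer-bpf wire format. The point is only that PID and network namespace arrive in the same record as the connection tuple.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"net"
)

// rawTCPEvent is a hypothetical fixed-size record as an eBPF program
// might hand it to userspace. All fields are fixed-width so that
// encoding/binary can decode the struct directly.
type rawTCPEvent struct {
	Timestamp uint64
	PID       uint32
	NetNS     uint32
	SAddr     [4]byte
	DAddr     [4]byte
	SPort     uint16
	DPort     uint16
}

// decodeEvent reads one little-endian record from buf.
func decodeEvent(buf []byte) (rawTCPEvent, error) {
	var ev rawTCPEvent
	err := binary.Read(bytes.NewReader(buf), binary.LittleEndian, &ev)
	return ev, err
}

func main() {
	// Made-up 28-byte record matching the assumed layout above.
	buf := []byte{
		0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // Timestamp = 1
		0x92, 0x10, 0x00, 0x00, // PID = 4242
		0x98, 0x00, 0x00, 0xF0, // NetNS = 4026531992
		10, 0, 0, 1, // SAddr
		10, 0, 0, 2, // DAddr
		0x31, 0xD4, // SPort = 54321
		0x50, 0x00, // DPort = 80
	}
	ev, err := decodeEvent(buf)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("pid=%d netns=%d %s:%d -> %s:%d\n",
		ev.PID, ev.NetNS, net.IP(ev.SAddr[:]), ev.SPort, net.IP(ev.DAddr[:]), ev.DPort)
}
```

With a record like this, no second lookup is needed to find out which process and namespace a connection belongs to.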
Improved update frequency
By removing the need to repeatedly run through /proc for connections, and instead having that information pushed to the agent, with the associated network namespace and PID, the agent no longer has to wait the 10s (on average) to collect and correlate the data it needs about connection info.
This has a direct effect on the UI’s responsiveness. The UI updates every 3 seconds. Without eBPF, the UI could go 3 update cycles without having the needed data to associate a container with the connection. With eBPF this delay is no longer present, making the UI more responsive to a changing network.
The following screenshots illustrate this. In the first image, we see 9 containers that have been detected, but without their associated connection information. This should no longer happen with eBPF enabled as all the needed data is delivered together. The second image is with eBPF enabled.
eBPF connection tracking has been shown to reliably improve agent performance by 5-10% when a large number of processes and connections are active. The following graph shows the results of a test run of 50 containers with 10 connections each.
The test was run for 10 minutes on an AWS m4.xlarge instance.
Enabling eBPF probes in Scope
Now that we’ve learned the ‘why’ of using eBPF, let’s get to the ‘how’.
Connection tracking with eBPF is not yet the default and has to be enabled. This can be done by starting Scope with the --probe.ebpf.connections=true parameter, e.g.
./scope launch --probe.ebpf.connections=true
Enabling eBPF should work on Linux version 4.4 and later when compiled with BPF features enabled (CONFIG_BPF_SYSCALL=y, etc.). We continuously test the tcptracer-bpf module on Linux 4.4 and 4.9, the latest longterm and stable releases at the time of writing.
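If you want to check the version requirement programmatically before enabling the flag, something along these lines could work. This is a sketch with hypothetical function names, not part of Scope. It only compares the major and minor version from the kernel release string and says nothing about whether the kernel was built with the required BPF options.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseKernelRelease extracts major and minor from a release string
// such as "4.9.0-2-amd64". Anything after the minor version is ignored.
func parseKernelRelease(release string) (major, minor int, err error) {
	_, err = fmt.Sscanf(release, "%d.%d", &major, &minor)
	return major, minor, err
}

// kernelAtLeast reports whether release is at least wantMajor.wantMinor.
// Unparseable release strings are treated as too old.
func kernelAtLeast(release string, wantMajor, wantMinor int) bool {
	major, minor, err := parseKernelRelease(release)
	if err != nil {
		return false
	}
	return major > wantMajor || (major == wantMajor && minor >= wantMinor)
}

func main() {
	// On Linux, procfs exposes the running kernel's release string here.
	data, err := os.ReadFile("/proc/sys/kernel/osrelease")
	if err != nil {
		fmt.Println("cannot read kernel release:", err)
		return
	}
	release := strings.TrimSpace(string(data))
	fmt.Printf("kernel %s, new enough for eBPF connection tracking: %v\n",
		release, kernelAtLeast(release, 4, 4))
}
```

A passing version check is necessary but not sufficient, so the log check described next is still the more reliable confirmation.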
To verify that eBPF is indeed being used to track connections, check the logs of the weavescope container. You should not see a failure like:
WARN: 2017/03/14 16:26:28.869220 Error setting up the eBPF tracker, falling back to proc scanning: cannot open kprobe_events: open /sys/kernel/debug/tracing/kprobe_events: no such file or directory
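The warning above points at the kind of precondition involved: the kprobe_events file has to be present. A minimal Go sketch of such a check with a fallback might look like the following. The function name and return values are hypothetical and this is not Scope's actual fallback logic; only the path is taken from the log message.

```go
package main

import (
	"fmt"
	"os"
)

// kprobeEventsPath is the file named in the warning message above.
const kprobeEventsPath = "/sys/kernel/debug/tracing/kprobe_events"

// chooseTracker returns "ebpf" when the given path exists and "proc"
// otherwise, mirroring the "falling back to proc scanning" behaviour
// in spirit. A real implementation would also have to handle failures
// while loading and attaching the eBPF programs.
func chooseTracker(path string) string {
	if _, err := os.Stat(path); err != nil {
		return "proc"
	}
	return "ebpf"
}

func main() {
	fmt.Println("connection tracker:", chooseTracker(kprobeEventsPath))
}
```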
The road less taken
While the BPF mechanism has been in the kernel for a long time, the enhancements that Scope uses are relatively new and, thus, convenient tools and lessons-learned are still very much in the making.
We went through a number of iterations to get to the point we’re at today. The slides from Alban’s talk last month provide an account of these iterations (starting at slide 22) and the challenges and limitations of each approach. But in the end we came to an approach that has almost no adverse effects on container image size, does not require kernel headers, and runs on an overwhelming portion of the kernel versions Scope targets.
A few external tools came out of this journey.
Scope is written in Go and we wanted to be able to use eBPF from Go, as well. After some searching and discussions with the IO Visor project, we started the gobpf project. gobpf allows for using the high-level BCC tools from Go as well as loading eBPF programs directly from specially crafted ELF files.
tcptracer-bpf is a BPF program that Scope uses to actually get the connection information. It made sense to break this out because the need to gather TCP connection information is not specific to Scope. It uses gobpf and the ELF-file-loading mechanism mentioned above.
As mentioned, this feature is not yet turned on by default. There are a few things that need to happen before we do that.
Firstly, we need to make sure that when Scope is running on a kernel that doesn’t support eBPF, we fall back to the current default. Scope has always been about just working. Luckily, work to make this happen is already underway.
Secondly, and most importantly, we want to encourage you, the user, to test it out and give us feedback about how it works for you. So, please head over to https://github.com/weaveworks/scope, clone, build and run
./scope launch --probe.ebpf.connections=true. Your input is greatly appreciated.