Testing Distributed Systems with Weave Scope’s Traffic Control Plugin
In this guest post, please welcome the Kinvolk team, who recently developed a Weave Scope plugin called Traffic Control. The Traffic Control plugin can help you troubleshoot the network slowdowns and bottlenecks that can occur in distributed systems.
Kinvolk describe how they’ve implemented the Weave Scope Traffic Control plugin and where and how to use this plugin to test for latency and other traffic control issues.
Do you know how your service responds to degraded network quality? And do you test for this? In this post we describe how to test various network quality scenarios using Weave Scope and the Traffic Control plugin.
Companies invest a lot of time and resources to ensure their services are scalable, fast, and highly available, and they provide their developers and engineers with a bevy of tools (monitoring, CI/CD, etc.) and processes to meet these goals. In this post, we’re going to make the case for adding the testing of various network quality scenarios to this list and show you how to easily do it using Weave Scope and its new Traffic Control plugin.
Do you know how your individual services respond to degraded network quality? Do they hang, crash, or just make the situation worse by flooding the network?
With the advent of microservices infrastructure, the quality of the network has become even more important. But rarely do we test what happens when the connections are of poor quality. Instead, we tend to assume the best and hope we can deal with issues when they arise. But there are ways to test how your system reacts to various networking issues, one of which we’ll look at below.
The Last Mile
Do you know what your users experience when they use your services over a poor network connection? Do you test for this?
The “last mile” between your users and your service is often the least tested part of the path, yet it is frequently the part with the poorest quality. It is also one that, unfortunately, cannot be mitigated. Even if the typical user has a high-speed connection “everywhere”, there are myriad scenarios that can make the connection suboptimal (high latency and packet loss): rural and mountainous locations, travelling by public transport or car, being indoors or around buildings while on a mobile connection, etc.
Weave Scope Plugins
If you are not yet familiar with Weave Scope, it’s a tool that allows you to visualize and interact with your distributed apps and containers in real time. Weave Scope also contains a plugin API that, until recently, only allowed plugins to send custom metrics to the Weave Scope interface, so the information only went one way.
But now Weave Scope plugins have the capability to generate interactive controls. The first plugin to take advantage of this is a traffic control plugin and it helps you test the scenarios that we presented above.
Traffic Control Plugin
The Weave Scope traffic control plugin manipulates different aspects of a container’s network parameters, such as latency and packet loss. The plugin consists of a simple daemon that runs on each cluster node and responds to API calls. The plugin manipulates the traffic control settings of the pods using the Linux traffic control tool, tc.
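To make the role of tc more concrete, here is a minimal Python sketch of the kind of command that adds latency or packet loss to an interface through the Linux netem queueing discipline. It only builds the argument list and does not run anything. The helper names (netem_command, clear_command) and the eth0 interface are assumptions for illustration; the post does not show the exact commands the plugin issues, so treat this as a sketch of the general technique rather than the plugin's implementation.

```python
import shlex


def netem_command(interface, delay_ms=0, loss_percent=0.0):
    """Build (but do not run) a `tc qdisc replace ... netem` argument list.

    Hypothetical helper for illustration; not taken from the plugin's code.
    """
    if delay_ms < 0:
        raise ValueError("delay_ms must be non-negative")
    if not 0 <= loss_percent <= 100:
        raise ValueError("loss_percent must be between 0 and 100")

    # `replace` installs the qdisc, or overwrites an existing root qdisc.
    argv = ["tc", "qdisc", "replace", "dev", interface, "root", "netem"]
    if delay_ms:
        argv += ["delay", f"{delay_ms}ms"]
    if loss_percent:
        argv += ["loss", f"{loss_percent:g}%"]
    return argv


def clear_command(interface):
    """Build the command that removes the root qdisc again."""
    return ["tc", "qdisc", "del", "dev", interface, "root"]


if __name__ == "__main__":
    # eth0 is an assumed interface name; running the printed commands
    # requires root privileges on a Linux host.
    print(shlex.join(netem_command("eth0", delay_ms=2000)))
    print(shlex.join(netem_command("eth0", delay_ms=100, loss_percent=10)))
    print(shlex.join(clear_command("eth0")))
```

Running the printed commands by hand inside a test container's network namespace is one way to reproduce the same kind of degraded connection without the Scope UI.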
The traffic control plugin is decoupled from the service being tested. This means you can stop and restart the daemon on a pod without affecting its connectivity.
The figure below represents a Kubernetes deployment and shows how the setup works. The controls on the Scope user interface are generated from the plugin. Previously, placing controls in the Scope app required making modifications to Scope itself.
Plugin UI controls in Scope
Testing with the Sock Shop demo
Let’s Buy Some Socks!!!
This video will walk us through the process of buying some “Holy” socks. It goes through the process twice; once with a typical developer’s connection, and then again with a 2000ms latency applied to the front-end service.
As expected, during the first run everything works smoothly and the user experience is great. But when we increase the front-end latency, we immediately notice some UX problems: the cart shows up empty. Without any feedback (e.g. a “Loading” message), the user may understandably be confused and try to go back, add more socks, refresh the cart (which results only in more waiting), or, out of frustration, leave the shop altogether.
With some simple tweaking with the traffic control plugin, we’ve been able to detect a UX issue that can now be addressed.
A Quick Fix
A simple way to address this is to show a “Loading” message or some sort of progress bar and to clear the prices until the response comes through. Here is a video of the simple change in action.
As you can see, the user now gets much better feedback: something is in fact happening, and a bit of patience is the answer.
Making this change, which we found by simulating a poor connection with the Weave Scope traffic control plugin, probably costs a lot less than dealing with frustrated users.
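The pattern behind this fix can be sketched in a few lines of Python: render a placeholder immediately, then replace it once the slow response arrives. This is an illustration only and not the Sock Shop front-end code; CartView, fetch_cart_total and the returned text are hypothetical stand-ins, and the network delay is simulated with a sleep.

```python
import asyncio


class CartView:
    """Hypothetical stand-in for the cart UI; records what the user sees."""

    def __init__(self):
        self.frames = []

    def render(self, text):
        self.frames.append(text)


async def fetch_cart_total(latency_s):
    # Simulated slow call to a cart service; the result is made-up data.
    await asyncio.sleep(latency_s)
    return "Cart: 2 items"


async def show_cart(view, latency_s):
    # Give feedback right away instead of showing an empty or stale cart.
    view.render("Loading cart...")
    total = await fetch_cart_total(latency_s)
    # Replace the placeholder once the response has come through.
    view.render(total)


def demo(latency_s=0.05):
    view = CartView()
    asyncio.run(show_cart(view, latency_s))
    return view.frames


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is the ordering: the placeholder is rendered before the request is awaited, so the user sees feedback no matter how long the response takes.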
An Exercise for the Reader
Applying the latency to the front-end simulated that “last mile” mentioned above. We encourage you to run the Sock Shop demo and apply the traffic control parameters on the connections between the individual services. Or better yet, try it out on your own system. Instructions to do just that can be found on the traffic control plugin project page.
Scope and container detailed view:
Container detail view in Scope
Menu bar item displaying the set latency
We hope to have demonstrated that, with the improvements in the Weave Scope plugin system, some pretty powerful interactions are possible for your Docker and Kubernetes deployments. In the case of the traffic control plugin, we’ve made shaping your network traffic as easy as navigating Weave Scope itself.
This post was written and the code implemented by the Kinvolk team for Weaveworks.
Reach out to the authors on Twitter at:
Alessandro Puccetti: @alepuccetti
Alban Crequy: @albcr
Chris Kühl: @blixtra