This post introduces performance improvements to Weave Net, our Docker network.
We sometimes get feedback expressing concerns about the performance impact of Weave Net. Specifically, the concern is that because it uses pcap to implement an overlay network between containers in userspace, it will introduce additional network latency, constrain network throughput, and take up CPU cycles that would otherwise be available to applications. These concerns are understandable: for many applications the performance costs of Weave Net are not significant, but some applications really do make demands of the network that run into the limits of Weave Net today.
We have been working to address these limits, and today we are releasing a developer preview to demonstrate the results and get feedback. We call this project Weave fast datapath.
We are aiming to make the use of fast datapath unobtrusive to Weave users. There are high-performance Software Defined Networking technologies out there, but unless you specialise in network administration, you might not find them very approachable. Our goal with fast datapath is to give you performance close to that of the underlying network, without you having to do anything special to get it. That is to say: fast datapath will be available to users of Weave Net, preserving Weave’s overall simple UX for developers, and ‘container native’ packaging.
Open vSwitch is a well-known open-source SDN project. Its developers contributed a kernel-based implementation of the Open vSwitch Datapath (ODP) to Linux, and it has been present in Linux distributions for a few years now. ODP by itself is not a full virtual network switch, but it provides the packet-processing engine needed to implement one. ODP is commonly used in conjunction with the rest of the Open vSwitch suite to provide a full SDN solution, but because it is part of the kernel, you don’t need to install the rest of Open vSwitch to get it.
For Weave fast datapath, we have developed a Go library to control ODP, and modified the Weave Net router to delegate packet processing to ODP. The needs of the weave router don’t require most of the features and standards implemented by the rest of the Open vSwitch suite. Using ODP on its own allows us to take advantage of a proven technology, while keeping Weave just as easy to get started with as it always has been. And the flexibility of ODP will allow us to enhance the feature set of Weave in the future without compromising performance.
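Since ODP ships with the kernel itself, you can check whether a host has it available before trying the preview. A minimal sketch (this check is just a convenience, not part of Weave; the kernel module is named openvswitch):

```shell
# Check whether the kernel's Open vSwitch datapath (ODP) is available.
# Try loading the "openvswitch" module, then look for it in /proc/modules.
modprobe openvswitch 2>/dev/null || true
if grep -q '^openvswitch' /proc/modules; then
    echo "openvswitch module is loaded"
else
    echo "openvswitch module is not loaded (it may still be built into the kernel)"
fi
```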
Until now, Weave has used a custom encapsulation format to wrap the packets from containers into UDP packets that are forwarded between hosts. Because fast datapath uses ODP, we are limited to the standard encapsulation formats it supports. We have opted for VXLAN, because it is supported by most kernel versions that have ODP, and because it is based on UDP, it should work in any network environment where Weave already works.
This is a developer preview based on our 0.11.2 release, and lacks some features that we expect to add before fast datapath appears in a general release of Weave:
- It does not currently traverse firewalls.
- It does not automatically detect the MTU to use for the overlay network. Instead, this preview release uses a fixed, conservative MTU that should work in most environments; it can be overridden manually, and the MTU setting can also affect performance (see below).
- We are aiming to have fast datapath work in a wide range of environments. But for those where it doesn’t, we plan to fall back to the existing Weave pcap-based datapath. The preview release doesn’t do this: if fast datapath doesn’t work, the Weave network won’t work. But, modulo the points above, if it doesn’t work for you, we’d like to hear about it.
Another feature of Weave Net not supported by fast datapath is encryption. Our current plan is that fast datapath and encryption will be mutually exclusive, at least in initial releases. We have some ideas about combining them, but we take the security promises of Weave Net very seriously and we want to make sure we get it right. If you do want to use encryption and fast datapath together, please help us by getting in touch and telling us about your use case!
Getting the preview release
To install the preview release, follow the normal installation instructions for Weave, but use a different URL for downloading the weave script:
sudo wget -O /usr/local/bin/weave https://github.com/weaveworks/weave/releases/download/fast-datapath-preview-20150612/weave
sudo chmod a+x /usr/local/bin/weave
This version of the weave script will download the docker images for the preview release.
The performance tests shown below were conducted on Amazon EC2 with enhanced networking. Two c3.8xlarge instances were used, in order to provide 10 Gigabit/sec network support (as we shall see, Weave fast datapath does not require such large instances, but EC2 only offers 10Gb/s network support for its largest instance types). Performance testing was done with the qperf network performance tester.
First, as a control, here are the results with qperf running directly on the hosts, without going through Weave. The two results show TCP bandwidth and UDP latency (qperf run with no arguments acts as the server on the remote host; the client invocation is shown below).
[root@ip-10-0-0-154 ~]# qperf-0.4.9/src/qperf -t 10 10.0.0.153 tcp_bw udp_lat
tcp_bw:
    bw  =  1.2 GB/sec
udp_lat:
    latency  =  48.1 us
This shows that the machines do indeed have a 10Gb/s network link between them, and are able to saturate it.
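As a sanity check on that claim (an aside, not part of the test procedure): qperf reports bandwidth in bytes per second, so the 1.2 GB/sec figure converts to a line rate of roughly 9.6 Gbit/sec, close to the nominal 10 Gbit/sec:

```shell
# qperf reports bandwidth in bytes/sec; multiply by 8 to get bits/sec:
awk 'BEGIN { printf "%.1f Gbit/sec\n", 1.2 * 8 }'
# prints: 9.6 Gbit/sec
```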
Weave was started on both hosts with weave launch -iprange 10.2.3.0/24, ubuntu containers were started with weave run -it ubuntu, and the same qperf tests were run between them:
root@7c22d20151f3:/# qperf-0.4.9/src/qperf -t 10 10.2.3.1 tcp_bw udp_lat
tcp_bw:
    bw  =  316 MB/sec
udp_lat:
    latency  =  61.4 us
The TCP bandwidth is rather disappointing! This is due to the MTU issue mentioned above. The network interfaces on the host are configured with an MTU of 9000 bytes, as is usual with 10Gb/s networks. But in this preview release, the weave network defaults to an MTU of 1410, so moving the same volume of data requires many more packets, and the resulting packet rate seems to be constrained by a bottleneck inside the kernel’s network stack.
We can override the MTU of the weave network by setting the environment variable WEAVE_MTU. VXLAN encapsulation has an overhead of 50 bytes, so we set WEAVE_MTU to 8950, and relaunch weave and the containers. Repeating the qperf tests then gives a more respectable result:
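The 50-byte figure is the sum of the outer headers that VXLAN adds to each packet; a quick sketch of the arithmetic (the relaunch command in the comment mirrors the one used earlier):

```shell
# VXLAN overhead per packet:
#   14 (outer Ethernet) + 20 (outer IPv4) + 8 (outer UDP) + 8 (VXLAN header)
HOST_MTU=9000
OVERHEAD=$((14 + 20 + 8 + 8))      # = 50 bytes
WEAVE_MTU=$((HOST_MTU - OVERHEAD))
echo "WEAVE_MTU=$WEAVE_MTU"        # prints: WEAVE_MTU=8950
# Weave is then relaunched with this value in the environment, e.g.:
#   WEAVE_MTU=8950 weave launch -iprange 10.2.3.0/24
```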
root@9f2e57320ca1:/# qperf-0.4.9/src/qperf -t 10 10.2.3.1 tcp_bw udp_lat
tcp_bw:
    bw  =  1.09 GB/sec
udp_lat:
    latency  =  61.9 us
Here we can see that there is some overhead due to Weave fast datapath, but the results are close to those of host networking.
Note that even with the default setting of WEAVE_MTU, the result suggests that there would be no problem saturating the more common 1 Gb/s network links.
As for CPU usage, here is mpstat on the sending host during a long-running TCP bandwidth test between containers:
[root@ip-10-0-0-154 ~]# mpstat 10
Linux 3.14.35-28.38.amzn1.x86_64 (ip-10-0-0-154)  12/06/15  _x86_64_  (32 CPU)

18:50:50  CPU  %usr  %nice  %sys  %iowait  %irq  %soft  %steal  %guest  %idle
18:51:00  all  0.01   0.00  0.18     0.00  0.01   0.20    0.01    0.00  99.59
18:51:10  all  0.00   0.00  0.19     0.00  0.00   0.18    0.01    0.00  99.61
18:51:20  all  0.00   0.00  0.16     0.00  0.00   0.13    0.01    0.00  99.69
And on the receiving host:
[root@ip-10-0-0-153 ~]# mpstat 10
Linux 3.14.35-28.38.amzn1.x86_64 (ip-10-0-0-153)  12/06/15  _x86_64_  (32 CPU)

18:51:45  CPU  %usr  %nice  %sys  %iowait  %irq  %soft  %steal  %guest  %idle
18:51:55  all  0.00   0.00  0.12     0.00  0.00   0.12    0.02    0.00  99.74
18:52:05  all  0.00   0.00  0.11     0.00  0.00   0.13    0.03    0.00  99.73
18:52:15  all  0.00   0.00  0.09     0.00  0.00   0.14    0.02    0.00  99.74