Spoiler: this post is about Docker storage drivers. Not volume storage drivers, but how Docker implements the files inside every container.
Background: as part of our drive to deliver high-quality software, we maintain a set of integration tests, or “smoke-tests” as we like to call them. Weave Net’s key job is to link containers on multiple machines, so we use Vagrant to set up three small VMs and then run a barrage of tests to exercise every feature. Originally, there were just a couple of tests, and the whole thing ran through in a minute or so.
Problem: after months of sustained development the test suite had grown until it was taking over 20 minutes to run everything.
My development machine is not too anaemic, with 12GB of RAM, an Intel Core i5 and an SSD, but while the tests were running it often ground to a halt. All in all, this state of affairs discouraged running the tests, which is really not where you want to be as a software engineer.
So, what was the bottleneck? Basic system monitoring showed that I wasn’t running out of CPU or RAM, but the disk was taking quite a pounding – regularly showing write rates of 70-80MB/sec. More detailed investigation with iotop showed it was Docker driving all this disk I/O, as it set up the filesystem when each container was started. And we were starting a lot of containers.
Now vague memories started to surface that Docker had different ways of approaching this. First port of call was the Docker documentation, but frankly it is clear as mud. Further web-searching turned up a Red Hat paper which looked promising, but it dives in quite deeply. After much reading, I learned that the basic idea is that Docker will use a copy-on-write technique to set up container filesystems, and there are different ways of doing this with different trade-offs.
Which storage driver are we even using? The docker info command will tell us:
vagrant@host1:~$ sudo docker info
Containers: 0
Images: 24
Storage Driver: vfs
...
We’re using vfs. Vfs? That doesn’t even appear as an option in the documentation! Luckily Jérôme Petazzoni’s excellent Not-so-deep dive into Docker storage drivers explains it as a last-ditch basic alternative that is used when nothing else works. Not so much copy-on-write as plain copy: using vfs meant that Docker did a full copy of every file in the image for every container we started. No wonder it was slow.
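With three VMs to check, it can be handy to pull just the driver name out of that output. Here is a minimal sketch, assuming the "Storage Driver: ..." line format shown above; storage_driver is a hypothetical helper name, not part of Docker.

```shell
# Hypothetical helper: reads `docker info` output on stdin and prints only
# the value of the "Storage Driver:" line.
storage_driver() {
    awk -F': ' '/Storage Driver:/ { print $2; exit }'
}

# Usage (needs a running Docker daemon, so it is left commented out here):
#   sudo docker info | storage_driver
```

If this prints vfs, every container start is paying for a full copy of the image.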
OK, let’s get something else working! Based on what I’d read, I picked devicemapper first. Put in the option, restart docker, and it stops immediately.
Udev sync is not supported. This will lead to unexpected behavior, data loss and errors. For more information, see https://docs.docker.com/reference/commandline/cli/#daemon-storage-driver-option
Hmm. That link is out of date, but I tracked down this issue which explains that you shouldn’t use devicemapper with the Docker binary supplied for Ubuntu by Docker. I can try building my own Docker binary, apparently, but there are lots of choices of storage driver so let’s just move on.
Next up was AUFS, which on Ubuntu needs the extra kernel modules package:
apt-get install -y linux-image-extra-`uname -r`
vagrant@host1:~$ sudo docker info
Containers: 0
Images: 0
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Note that any container images you had stored will disappear when you change the storage driver. Not a problem for our test setup, where I was rebuilding from scratch via Vagrant every time, but worth knowing if you’re keeping something important.
Moving to AUFS got my test runs down to around 6-7 minutes, which was great, but is there a way to get something both fast and supported?
Enter the overlay driver. Jérôme describes overlay as “just like AUFS”, but the key difference is that overlay has been merged into the mainline Linux kernel. Chris Swan has a neat blog post showing how you can install the necessary kernel version on Ubuntu 14, but I just decided to switch to Ubuntu 15 which has the necessary 3.18 kernel in the box.
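Before switching drivers it is worth confirming that a VM's kernel is new enough. The following is a rough sketch, assuming GNU sort with the -V (version sort) option is available; kernel_at_least is a hypothetical helper, and the 3.18 threshold is the one quoted above.

```shell
# Hypothetical helper: succeeds if the kernel version is at least the
# required version.
#   $1 = required version, e.g. 3.18
#   $2 = version to check (optional, defaults to `uname -r`)
kernel_at_least() {
    required=$1
    actual=${2:-$(uname -r)}
    # Drop distro suffixes such as "-15-generic" before comparing.
    actual=${actual%%-*}
    # Version-sort the two strings; the requirement is met when the
    # required version sorts first (or the two are equal).
    lowest=$(printf '%s\n%s\n' "$required" "$actual" | sort -V | head -n 1)
    [ "$lowest" = "$required" ]
}

# Usage:
#   kernel_at_least 3.18 && echo "kernel should be new enough for overlay"
```

This only compares version numbers; it does not check whether the overlay module is actually built for the running kernel.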
We need to add the option -s overlay to the Docker daemon command-line, which I did by editing /lib/systemd/system/docker.service, and we’re away:
vagrant@host2:~$ sudo docker info
Containers: 0
Images: 0
Storage Driver: overlay
Backing Filesystem: extfs
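Since the VMs are rebuilt from scratch each time, the unit-file edit described above could be scripted rather than done by hand. Treat this as an illustrative sketch only: add_storage_driver is a hypothetical helper, and it assumes the unit file has a single-line ExecStart= entry, which may not match your Docker package.

```shell
# Hypothetical helper: reads a systemd unit file on stdin and writes it to
# stdout with " -s <driver>" appended to the ExecStart line. Lines that
# already contain a -s option are left alone, so it is safe to run twice.
add_storage_driver() {
    awk -v driver="$1" '
        /^ExecStart=/ && $0 !~ / -s / { $0 = $0 " -s " driver }
        { print }
    '
}

# Usage (run as root on the VM; left commented out here):
#   add_storage_driver overlay < /lib/systemd/system/docker.service > /tmp/docker.service
#   cp /tmp/docker.service /lib/systemd/system/docker.service
#   systemctl daemon-reload
#   systemctl restart docker
```

Writing to a temporary file first avoids truncating the unit file while it is still being read.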
The test suite now runs pretty consistently around 6 minutes 30 seconds, and my machine no longer grinds to a halt under the load. Result!