Docker networking 1.9 and Weave technical deep-dive
This is a technical follow up to the previous post, Life and Docker Networking – One year On.
Please note that our goal here, shared with Docker, is to make the customer experience as straightforward as possible. Do get in touch if you have questions.
This section provides more technical depth to our previous blog post. The information is aimed at people who want to understand how things work in Docker and Weave. At the time of writing, we did not yet have access to the final release documentation for Docker 1.9, so if we have made mistakes in this post, please contact us and we shall correct them. Note that a full list of documented Weave features is here.
Partition tolerance, Performance and Simplicity
Docker Networking in 1.9 requires an external Key-Value (KV) store, which is used for global IP address allocation and node discovery. The store is accessed via an API, libkv, that is advertised as working with Consul, etcd, and ZooKeeper.
These KV stores are all fine pieces of technology, and we work with them well. That said, we did not use an external consensus-based KV store as Weave’s back end, and instead went with a CRDT (Conflict-free Replicated Data Type) approach. This is a good time to talk about why we made that decision.
We’ll focus on Consul, because the examples in the Docker 1.9 developer docs specify it. Per those instructions, a Consul server is installed on every Docker host; each respective Docker daemon is then configured to use its local server via the
Issues to be aware of with distributed key-value stores
- Installing an additional distributed system in order to set up networking, for each Docker cluster, may not be to everyone’s taste.
- Consul, like etcd and ZooKeeper, provides strong consistency via a majority quorum: in the event of a network partition or node failure, reads and writes will be blocked on inquorate nodes. Critical Docker Networking functions such as creating a new overlay network, allocating a global IP address, running a new container or joining an existing container to an overlay network rely on a quorate KV store. If quorum is not reached, these operations will simply fail. In practice this means that users may, for example, be unable to launch containers during a transient network partition.
- Installing a Consul server on every Docker host will work for up to 5 machines, but it is not recommended to go beyond that number of Consul servers in a cluster – this is also true of etcd and ZooKeeper. In a production deployment consisting of tens or hundreds of hosts, you will be responsible for managing the capacity relationship between Docker instances and your chosen KV store cluster.
- The failure modes of the network will be tied to the failure modes of Consul and how it is set up. Other stores will have other failure cases. You will need to understand all of this to run a production system.
- Root cause analysis is harder: Figuring out whether bugs are in the application, or in Docker networking, or the external KV store, is up to you.
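To make the quorum point concrete, here is a minimal sketch of the majority arithmetic behind consensus-based stores. The function names are ours, for illustration only; they are not part of Consul, etcd, ZooKeeper or libkv.

```python
def quorum_size(servers: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return servers // 2 + 1


def can_commit(reachable: int, servers: int) -> bool:
    """True if a group of `reachable` servers (counting itself) holds a majority."""
    return reachable >= quorum_size(servers)


if __name__ == "__main__":
    servers = 5
    # Illustrative scenario: a partition splits five servers into groups of 3 and 2.
    for side in (3, 2):
        print(f"{side} of {servers} reachable -> can commit: {can_commit(side, servers)}")
```

In this scenario only the side with three servers keeps accepting reads and writes. Docker hosts that can only reach the two-server side would see the networking operations listed above fail until the partition heals.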
In contrast, Weave does not require you to install any additional software or to think about stores, configurations, failures and so on. We think this leads to an easier initial user experience, for the use cases that Weave is targeting.
Weave uses necessary consistency and not more
Weave uses eventual consistency to manage network configuration, the same approach as other large scale systems like the Internet.
- In the event of a network partition, writes and reads are available on any node that is up.
- There are also performance benefits as the system scales beyond, say, 10 hosts.
The technique is “data centric” (CRDTs) instead of “algorithm centric” (Raft, Paxos). We highly recommend Bryan’s talk about this here (video, slides), which also provides a summary of Weave’s performance at scale.
In the CAP Theorem sense, this is “AP” (Available and Partition-tolerant). Weave is also smart enough to use “CP” (Consistent and Partition-tolerant) when it needs it, such as during bootstrap. In other words, Weave does not require consensus except once during startup, and in most cases operations such as IP address allocation are entirely local.
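For readers new to CRDTs, the following is a minimal sketch of the general idea, using a grow-only set, one of the simplest CRDTs. It is not Weave's actual data structure, and the record strings are made up for illustration. The point is that the merge operation is commutative, associative and idempotent, so replicas converge without running a consensus round.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GSet:
    """Grow-only set: a state-based CRDT whose merge is set union."""
    items: frozenset = field(default_factory=frozenset)

    def add(self, item) -> "GSet":
        # A local write never needs to contact another replica.
        return GSet(self.items | {item})

    def merge(self, other: "GSet") -> "GSet":
        # Union is commutative, associative and idempotent, so replicas
        # converge regardless of the order or number of merges.
        return GSet(self.items | other.items)


if __name__ == "__main__":
    # Two replicas accept writes independently during a partition.
    a = GSet().add("web=10.32.0.1")
    b = GSet().add("db=10.32.0.2")
    # Once the partition heals, they exchange state in either order.
    assert a.merge(b) == b.merge(a)
    print(sorted(a.merge(b).items))
```

Both replicas stayed writable throughout, which is the “AP” behaviour described above. The price is that readers may briefly see different states until the merge happens.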
Partially connected networks
Weave Net works in partially connected networks, and other adverse environments featuring NAT and reduced MTUs. Docker Networking on the other hand requires full reachability between all hosts and performs no dynamic PMTU discovery.
Weave Net provides optional traffic confidentiality and authenticity – this is used in production by, for example, Tutum (now part of Docker – congratulations Borja and Fernando!). Without this feature, and given the necessity for every Docker host to access the same KV store, Docker Networking is best suited to right-sized clusters in trusted zones (e.g. one VPC). More on this below (“cross cloud”).
Weave Net enables multicast anywhere you can run Weave, a feature not available with the Docker Networking overlay – this is really handy for cloud deployments where multicast is normally unavailable, but highly desirable as an enabler of automatic configuration.
Cross cloud connections
Both Docker and Weave can be used in multiple locations at once. Weave makes this pretty simple out of the box. In comparison, when using Docker’s built-in networking, you will need to think about more issues, listed here. If you use Docker Networking in 1.9, then cross site connections are expected to work as long as:
- Relevant Consul/etcd/ZooKeeper, Serf & vxlan ports are open
- A consistent KV store is visible from all Docker hosts. In practice however this either means a) having the KV cluster in one of the datacentres, introducing a SPOF or b) spreading the KV cluster across multiple datacentres, greatly increasing the likelihood of transient partitions and hence quorum failure (with results as above)
- Docker hosts have full mesh IP reachability amongst themselves. In a cross cloud scenario, this means that every Docker host must have a public IP address. NAT and forwarding via intermediary hosts are not supported.
- Your cross site communications are secure.
We’ll finish by detailing these. In this area, Weave Net and Docker share a very strong philosophy of bundling features so that developers have an easy time. As of Docker 1.9, there are still a few things to be aware of in terms of implementation choice.
Service Discovery and Load Balancing
Docker Networking currently implements service discovery by rewriting the /etc/hosts file in every container on every host each time a container joins or leaves the overlay network. This has led to frenetic discussion and pointed commentary from industry observers, and is probably best viewed as a stop gap measure until a DNS based solution can be implemented.
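To give a feel for what the hosts-file approach implies, here is a rough sketch under simplifying assumptions. The function names and the file layout are ours, not Docker's. We assume containers join one at a time and that each join rewrites the file of every current member, as described above.

```python
def render_hosts(members: dict) -> str:
    """Render a hosts file that lists every container on the overlay network."""
    lines = ["127.0.0.1\tlocalhost"]
    lines += [f"{ip}\t{name}" for name, ip in sorted(members.items())]
    return "\n".join(lines) + "\n"


def rewrites_for_joins(n: int) -> int:
    """Total file rewrites when n containers join one at a time and
    every join rewrites the hosts file of each current member."""
    return sum(range(1, n + 1))


if __name__ == "__main__":
    # Hypothetical container names and addresses.
    print(render_hosts({"web": "10.0.0.2", "db": "10.0.0.3"}), end="")
    print("rewrites for 100 joins:", rewrites_for_joins(100))
```

Under these assumptions the number of rewrites grows quadratically with the number of containers, which is one reason to regard the approach as a stop gap.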
On the other hand, Weave Net already implements a micro DNS server on each router – you simply name containers and everything ‘just works’, including load balancing across multiple containers with the same name.
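The idea of load balancing by name can be sketched as follows. This is an illustration only, not Weave Net's implementation: the class and method names are hypothetical, and we assume that clients tend to pick the first address in an answer, so shuffling the answer spreads connections across the containers that share a name.

```python
import random
from collections import defaultdict


class MicroDNS:
    """Toy name server: one name may map to several container addresses."""

    def __init__(self, rng=None):
        self._records = defaultdict(list)
        self._rng = rng or random.Random()

    def register(self, name, address):
        if address not in self._records[name]:
            self._records[name].append(address)

    def deregister(self, name, address):
        if address in self._records[name]:
            self._records[name].remove(address)

    def lookup(self, name):
        # Return every address for the name, in random order.
        answers = list(self._records.get(name, []))
        self._rng.shuffle(answers)
        return answers


if __name__ == "__main__":
    dns = MicroDNS()
    for ip in ("10.32.0.1", "10.32.0.2", "10.32.0.3"):
        dns.register("web", ip)  # three containers, one name
    print(dns.lookup("web"))
```

Stopping a container simply removes its address from the set, so the remaining containers keep answering under the same name.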
See Bryan’s talk (video, slides) to see how we have achieved this ease of use whilst providing a scalable, high performance and partition tolerant implementation.
IPAM is used to automate IP address allocation, ensuring that containers get unique addresses in a network and thus relieving developers from managing this themselves. Weave supports it out of the box, without third party software being required.
Docker Networking uses libkv for multi-host IPAM, so it is vulnerable to partitions as discussed above; in contrast, Weave’s IPAM is built on the same high performance partition tolerant technology that powers the Weave Net DNS server.
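One way to see why allocation can be a purely local operation is to give each peer ownership of a disjoint slice of the address space. The sketch below illustrates that general idea only. It is not a description of Weave's IPAM internals, and the peer names, the subnet and the even split are assumptions made for the example.

```python
import ipaddress


class PeerAllocator:
    """Hands out addresses from a range that this peer alone owns."""

    def __init__(self, owned):
        self.owned = owned
        # Simplification: a real allocator would skip reserved addresses
        # and would also support releasing addresses.
        self._free = iter(owned)

    def allocate(self):
        try:
            return next(self._free)
        except StopIteration:
            raise RuntimeError(f"range {self.owned} exhausted") from None


def split_among(network, peers):
    """Divide `network` into equal, disjoint ranges, one per peer."""
    net = ipaddress.ip_network(network)
    bits = max(len(peers) - 1, 0).bit_length()  # enough subnets for all peers
    ranges = net.subnets(prefixlen_diff=bits)
    return {peer: PeerAllocator(r) for peer, r in zip(peers, ranges)}


if __name__ == "__main__":
    allocators = split_among("10.32.0.0/24", ["host-a", "host-b", "host-c", "host-d"])
    for peer, alloc in allocators.items():
        # No coordination with other peers is needed to allocate.
        print(peer, alloc.owned, alloc.allocate())
```

Because the ranges never overlap, two peers cannot hand out the same address even while they are partitioned from each other. Coordination is only needed to agree on the initial split or to rebalance ranges later.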
A full list of documented Weave features is here.