Weave ‘Gossip’ DNS
WeaveDNS was introduced in Weave 0.9 as a simple solution to service discovery on the Weave network, allowing containers to find other containers’ IP addresses by their hostnames. With Weave 1.1, we’re introducing a completely redesigned weaveDNS. Dubbed ‘Gossip DNS’, it is faster, better and more reliable than existing solutions.
In the original version of weaveDNS, coordination between instances was via mDNS – a UDP-based broadcast protocol. When a container was started, it registered with the local weaveDNS instance, which stored an entry in local memory. When a container wanted to query for an address, the local weaveDNS instance broadcast the query to all other weaveDNS instances, which would then answer from local memory.
This approach presented two problems:
- Firstly, lookups (by far the most common operation) were significantly more expensive than registrations, involving multiple network operations. A cache was added to address this, but it also added complexity.
- Secondly, when doing a lookup in the presence of failures, there was no way to know when you had heard from all the running weaveDNS instances – hence we implemented a timeout and lived with some requests taking longer than desired. Performance was even worse for negative queries (i.e. queries which have no answer); weaveDNS had to wait for all the instances to respond (or to time out) before returning a negative result to the client.
The new implementation reverses this design – registrations are broadcast to all weaveDNS instances, which subsequently hold all entries in memory and handle lookups locally. With Weave 1.1 the most common operation is a cheap, local in-memory lookup. To demonstrate the performance difference, we ran dnsperf against Weave 1.0 and 1.1 running on a 5-machine cluster, with each host running 10 containers. The workload was 50% positive queries, and 50% negative queries.
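The core of the reversed design can be sketched in a few lines of Go. This is an illustrative model, not weaveDNS's actual code: the type and method names are hypothetical, and the real implementation also broadcasts each registration to every peer, which this local sketch omits.

```go
package main

import (
	"fmt"
	"sync"
)

// dnsDB is a hypothetical sketch of the new design: every peer holds
// the full hostname -> IP mapping in memory, so lookups never leave
// the host.
type dnsDB struct {
	mu      sync.RWMutex
	entries map[string][]string // hostname -> container IPs
}

func newDNSDB() *dnsDB {
	return &dnsDB{entries: make(map[string][]string)}
}

// Register records a container's address locally; in weaveDNS the
// same update would also be broadcast to all other peers.
func (db *dnsDB) Register(hostname, ip string) {
	db.mu.Lock()
	defer db.mu.Unlock()
	db.entries[hostname] = append(db.entries[hostname], ip)
}

// Lookup answers purely from local memory -- no network round trips,
// and a missing name yields an immediate negative answer instead of
// waiting on a timeout.
func (db *dnsDB) Lookup(hostname string) []string {
	db.mu.RLock()
	defer db.mu.RUnlock()
	return db.entries[hostname]
}

func main() {
	db := newDNSDB()
	db.Register("app.weave.local", "10.32.0.2")
	fmt.Println(db.Lookup("app.weave.local"))
	fmt.Println(len(db.Lookup("missing.weave.local"))) // negative answer, no timeout
}
```

This is why the benchmark below shows such a dramatic drop in maximum latency: the expensive path (broadcast) moved to the rare operation (registration), and the common path became a mutex-guarded map read.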
With Weave 1.0:
$ ./dnsperf -s $(weave docker-bridge-ip) -d test -c 1 -l 60
...
Queries per second:   4896.070494
Average Latency (s):  0.018576 (min 0.000064, max 4.663590)
Latency StdDev (s):   0.081017
With Weave 1.1:
$ ./dnsperf -s $(weave docker-bridge-ip) -d test -c 1 -l 60
...
Queries per second:   11537.851621
Average Latency (s):  0.008577 (min 0.000064, max 0.101836)
Latency StdDev (s):   0.003895
In addition to improved performance and latency, the new design removes the need for a traditional TTL-based cache, since updates to DNS entries are broadcast throughout the cluster. This has the property that in the absence of network failures, entries in the local in-memory database are updated near instantaneously and no longer need to be timed out. This also significantly reduces the chance of weaveDNS giving out stale answers. These improvements have also allowed us to lower the TTL on DNS responses, encouraging clients to query weaveDNS more often and work with fresher results themselves.
The original version of weaveDNS existed as a set of separate containers on the Weave network, one per host. In addition to the new design for broadcasting registrations, the new implementation of weaveDNS has been integrated into the same process as the Weave router, in order to piggyback on the Weave router’s coordination and communication mechanisms. This has some favourable consequences: weave launch now starts one fewer container, and weaveDNS doesn’t require an address on the Weave network, therefore it doesn’t depend on our IP address allocator to have achieved a quorum before getting started.
By embedding weaveDNS in the Weave router, we take advantage of the Weave router’s knowledge of network topology to make the registration broadcast as efficient as possible. We also use the Weave router’s gossip implementation to periodically synchronise DNS mappings between peers and recover from network partitions and other transient failures. This is achieved by modelling the DNS hostname to IP address mapping as a simple CRDT.
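The key property of a CRDT is that peers can merge each other's state in any order and still converge. A minimal sketch of that idea, assuming a last-writer-wins map with version numbers and tombstones (the real weaveDNS data structure differs in its details):

```go
package main

import "fmt"

// entry is one DNS record in a hypothetical last-writer-wins map
// CRDT, loosely modelling a mergeable hostname -> IP mapping.
type entry struct {
	Version int  // bumped on every local change
	Deleted bool // tombstone, so deletions survive merges
}

// key identifies a single (hostname, IP) mapping.
type key struct{ Hostname, IP string }

// merge combines two peers' views: for each record the higher version
// wins, so merging in any order, any number of times, converges on
// the same state -- exactly what gossip-based synchronisation needs.
func merge(a, b map[key]entry) map[key]entry {
	out := make(map[key]entry)
	for k, e := range a {
		out[k] = e
	}
	for k, e := range b {
		if cur, ok := out[k]; !ok || e.Version > cur.Version {
			out[k] = e
		}
	}
	return out
}

func main() {
	k := key{"app.weave.local", "10.32.0.2"}
	peerA := map[key]entry{k: {Version: 1}}                // registered
	peerB := map[key]entry{k: {Version: 2, Deleted: true}} // later removed
	fmt.Println(merge(peerA, peerB)[k].Deleted)            // newer tombstone wins
}
```

Because merge is commutative, associative, and idempotent, peers recovering from a partition simply exchange and merge their maps: no coordination protocol is needed to decide whose state is authoritative.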
The previous version of weaveDNS would return a single IP address for a given hostname, even when multiple containers with the given hostname existed on the Weave network. This was done to make queries as fast as possible by returning the first entry we received, and as mentioned above, in the event of failure we couldn’t know when we had received all the replies. Subsequent replies would be cached in memory, and subsequent queries would receive a single, random entry, as a simple form of load balancing.
With this new version of weaveDNS, each peer is aware of the hostnames and IP addresses of all containers in the Weave network. This lets us respond to queries with the IP addresses of all containers with the given hostname, enabling clients to implement simple service failover and load balancing. We shuffle the order of the IP addresses before returning them, again as a simple form of load balancing, but getaddrinfo() stops this being as effective as you’d hope.
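The answer-shuffling step amounts to a Fisher–Yates shuffle over the record set before it is written into the DNS response. A hedged sketch (illustrative only, not weaveDNS's actual code); as noted above, resolvers that re-sort answers via getaddrinfo() can undo much of the balancing effect:

```go
package main

import (
	"fmt"
	"math/rand"
)

// shuffled returns the addresses for a hostname in random order, a
// simple form of client-side load balancing across replicas. The
// input slice is copied so the stored entries are left untouched.
func shuffled(ips []string) []string {
	out := make([]string, len(ips))
	copy(out, ips)
	rand.Shuffle(len(out), func(i, j int) {
		out[i], out[j] = out[j], out[i]
	})
	return out
}

func main() {
	// Hypothetical replicas all registered under the same hostname.
	replicas := []string{"10.32.0.2", "10.32.0.3", "10.32.0.4"}
	fmt.Println(shuffled(replicas)) // all replicas, random order
}
```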
With all these improvements, you might expect the new weaveDNS to be bigger and more complicated – but that’s not so. Due to the reuse of existing communication mechanisms inside the Weave router and the removal of the DNS cache, the new weaveDNS implementation is less than 30% the size of the old code – 1.5k lines vs 5k. This should result in a codebase with fewer bugs, and one that is easier to maintain and improve upon.