visit
The first symptoms we saw were test timeouts. We have fairly short timeouts on many of our end-to-end tests so we can detect if new code suddenly makes things take longer for end users. But now, a test that should normally take a half second would sometimes take 5.5 seconds.
The additional 5 seconds was a great clue--5 seconds sounded like it could be a timeout of some kind. Armed with that hunch, we looked back through all the seemingly random test failures and found the common thread: they were all tests that initiated network requests. We also noticed a few tests that had taken even longer to fail...always in increments of 5 seconds.There weren't too many network protocols that could be involved here, so some quick Googling pointed us in the right direction. The default timeout for DNS server queries on Linux just .To see what was going on with DNS, we reached for probably the single most important tool for debugging network issues on Linux: . (Or, if you prefer a GUI version, is great too.) We ran tcpdump on the host system (an Amazon Workspaces Linux instance) and used a filter to see the DNS traffic:$ tcpdump -n -i eth1 port 53
11:35:59.474735 IP 172.16.0.131.54264 > 172.16.0.119.domain: 64859+ AAAA? registry-1.docker.io. (38)
11:35:59.474854 IP 172.16.0.131.49631 > 172.16.0.119.domain: 43524+ A? registry-1.docker.io. (38)
11:35:59.476871 IP 172.16.0.119.domain > 172.16.0.131.49631: 43524 8/0/1 A 34.197.189.129, A 34.199.40.84, A 34.199.77.19, A 34.201.196.144, A 34.228.211.243, A 34.232.31.24, A 52.2.186.244, A 52.55.198.220 (177)
11:35:59.476957 IP 172.16.0.119.domain > 172.16.0.131.54264: 64859 0/1/1 (133)
Here’s the
dns_cache
script that starts the DNS cache container:#!/usr/bin/env bash
: "${IMAGE:=andyshinn/dnsmasq:2.76}"
: "${NAME:=dnsmasq}"
: "${ADAPT_DNS_IP_FILE:=/tmp/adapt_dns_ip}"
# Get IP address for an interface, as visible from inside a container
# connected to the host network
interfaceIP() {
# Run a container and get ifconfig output from inside
# We need the ifconfig that will be visible from inside the dnsmaq
# container
docker run --rm --net=host busybox ifconfig "$1" 2>/dev/null | \
awk '/inet /{print(gensub(/^.*inet (addr:)?([0-9.]+)\s.*$/, "\\2", 1))}'
}
if docker inspect --type container "${NAME}" >& /dev/null ; then
if [ -f "${ADAPT_DNS_IP_FILE}" ]; then
# dnsmasq is already started
cat "${ADAPT_DNS_IP_FILE}"
exit 0
else
echo DNS cache container running but file ${ADAPT_DNS_IP_FILE} does not exist. >&2
exit 1
fi
fi
# We only support attaching to the default (host) bridge named "bridge".
DOCKER_HOST_NETWORK=bridge
# Confirm that "bridge" is the default bridge
IS_DEFAULT=$(docker network inspect "${DOCKER_HOST_NETWORK}" --format '{{(index .Options "com.docker.network.bridge.default_bridge")}}')
if [ "${IS_DEFAULT}" != "true" ]; then
echo Cannot start DNS cache. The Docker network named \"${DOCKER_HOST_NETWORK}\" does not exist or is not the default bridge. >&2
exit 1
fi
# Get the Linux interface name for the bridge, typically "docker0"
INTF_NAME=$(docker network inspect "${DOCKER_HOST_NETWORK}" --format '{{(index .Options "com.docker.network.bridge.name")}}')
if [ -z "${INTF_NAME}" ]; then
echo Cannot start DNS cache. Unable to determine default bridge interface name. >&2
exit 1
fi
# Get the IP address of the bridge interface. This is the address that
# dnsmasq will listen on and other containers will send DNS requests to.
IP_ADDR=$(interfaceIP "${INTF_NAME}")
if [ -z "${IP_ADDR}" ]; then
echo Cannot start DNS cache. Docker bridge interface ${INTF_NAME} does not exist. >&2
exit 1
fi
# Run the dnsmasq container. The hosts's /etc/resolv.conf configuration will
# be used by dnsmasq to resolve requests.
docker run --rm -d --cap-add=NET_ADMIN --name "${NAME}" --net=host -v/etc/resolv.conf:/etc/resolv.conf "${IMAGE}" --bind-interfaces --listen-address="${IP_ADDR}" --log-facility=- > /dev/null
if [ $? -ne 0 ]; then
echo Cannot start DNS cache. Docker run failed.
exit 1
fi
# Remember what IP address to use as DNS server, then output it.
echo ${IP_ADDR} > "${ADAPT_DNS_IP_FILE}"
echo ${IP_ADDR}
Here’s an example of how to start the DNS cache, remembering the IP address in variable
DNS_IP
and then running another container that will use the cache.$ DNS_IP=$(dns_cache)
$ docker run --dns ${DNS_IP} --rm busybox ping -c1 adaptjs.org
After we started using the cache in our testing, the number of DNS queries that the host system sent to the AWS DNS server dropped to a small trickle. We also confirmed that the cache was operating properly by checking the dnsmasq statistics. Sending a
SIGUSR1
to dnsmasq causes it to :$ docker kill -s USR1 dnsmasq
$ docker logs dnsmasq
dnsmasq[1]: cache size 150, 1085/4664 cache insertions re-used unexpired cache entries.
dnsmasq[1]: queries forwarded 1712, queries answered locally 3940
dnsmasq[1]: queries for authoritative zones 0
dnsmasq[1]: server 172.16.0.119#53: queries sent 1172, retried or failed 0
dnsmasq[1]: server 172.16.1.65#53: queries sent 252, retried or failed 0
dnsmasq[1]: server 172.16.0.2#53: queries sent 608, retried or failed 0
Originally published on the .