A few weeks ago I presented Chaos Testing for Docker Containers in London. You can find the original recording and slides at the end of this post. I’ve made some small edits to the text for readability and added some links for more context.
2. GitLab, January 31
3. AWS, February 28
4. Microsoft Azure, March 16
When we build distributed systems, we choose P (Partition Tolerance) from the CAP theorem, and second to it either A (Availability — the most popular choice) or C (Consistency). So, we need to find a good approach for testing AP or CP systems.
Traditional testing disciplines and tools do not provide a good answer to the question: how does your distributed system behave when unexpected stuff happens in production? Sure, you can learn from previous failures, after the fact, and you should definitely do it. But learning from past experience should not be the only way to prepare for future failures.
Waiting for things to break in production is not an option. But what’s the alternative?
The alternative is to break things on purpose, and Chaos Engineering is an approach for doing just that. The idea of Chaos Engineering is to embrace failure!
Chaos Engineering for distributed software systems was originally popularized by Netflix. Chaos Engineering defines an empirical approach to resilience testing of distributed software systems: you test a system by conducting chaos experiments.
A typical chaos experiment: define a measurable steady state, hypothesize that it holds during failure, inject real-world failure events (crashed processes, dropped packets, unavailable dependencies), and look for a difference between the control system and the disturbed one.
These are very good tools, and I encourage you to use them. But when I started my new container-based project (two years ago), it felt like these tools provided the wrong granularity for the chaos I wanted to create. I wanted to create chaos not only in a real cluster, but also on a single developer machine, to be able to debug and tune my application. I searched Google for a Chaos Monkey for Docker, but did not find anything besides some basic Bash scripts.
So, I decided to create my own tool. From day one, I’ve shared it with the community as an open source project. It’s a Chaos Monkey warthog for Docker — Pumba.
What is Pumba(a)?
Those of us who have kids or were kids in the ’90s should remember this character from Disney’s animated film The Lion King. In Swahili, pumbaa means “to be foolish, silly, weak-minded, careless, negligent”. I like the Swahili meaning. It matched perfectly with the tool I wanted to create.
Pumba disturbs a running Docker environment by injecting different failures. Pumba can kill, stop, remove, or pause Docker containers.
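As a quick sketch of what these failure injections look like (the container names below are made up for illustration), each failure type maps to its own Pumba command:

```shell
# Pause all processes in the "web" container for 30 seconds
pumba pause --duration 30s web

# Stop the "worker" container
pumba stop worker

# Kill the "cache" container with a chosen signal
pumba kill --signal SIGTERM cache

# Remove the "queue" container
pumba rm queue
```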
Pumba can also do network emulation, simulating different network failures, like: delay, packet loss (using different probability loss models), bandwidth rate limits and more. For network emulation, Pumba uses the Linux kernel traffic control (tc) with the netem queueing discipline. If tc is not available within the target container, Pumba uses a sidekick container with tc on board, attaching it to the target container’s network.
You can pass a list of containers to Pumba or just write a regular expression to select matching containers. If you do not specify containers, Pumba will try to disturb all running containers. Use the --random option to randomly select only one target container from the provided list. It’s also possible to define repeatable time interval and duration parameters to better control the amount of chaos you want to create.
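To illustrate the different selection modes (a sketch; the container names here are hypothetical), targets can be named explicitly, matched by regular expression, or picked at random on an interval:

```shell
# Disturb specific containers, listed by name
pumba kill mycontainer1 mycontainer2

# Match containers by RE2 regular expression, using the "re2:" prefix
pumba kill re2:^api

# Every 30s, kill one randomly chosen container matching the expression
pumba --random --interval 30s kill re2:^api
```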
# Download binary from https://github.com/gaia-adm/pumba/releases
curl https://github.com/gaia-adm/pumba/releases/download/0.4.6/pumba_linux_amd64 --output /usr/local/bin/pumba
chmod +x /usr/local/bin/pumba && pumba --help

# Install with Homebrew (MacOS only)
brew install pumba && pumba --help

# Use Docker image
docker run gaiaadm/pumba pumba --help
First of all, run pumba --help to get help about available commands and options, and pumba <command> --help to get help for a specific command and sub-command.
# pumba help
pumba --help
# pumba kill help
pumba kill --help
# pumba netem delay help
pumba netem delay --help
Killing a randomly chosen Docker container that matches the ^test regular expression.
# on main pane/screen, run 8 test containers that do nothing
for i in {0..7}; do docker run -d --rm --name test$i alpine tail -f /dev/null; done
# run an additional container with 'skipme' name
docker run -d --rm --name skipme alpine tail -f /dev/null
# run this command in another pane/screen to see running docker containers
watch docker ps -a
# go back to main pane/screen and kill (once in 10s) random 'test' container, ignoring 'skipme'
pumba --random --interval 10s kill re2:^test
# press Ctrl-C to stop Pumba at any time
Adding a 3000ms (±50ms) delay to the egress traffic of the ping container for 20 seconds, using a normal distribution model.
# run "ping" container on one screen/panedocker run -it --rm --name ping alpine ping 8.8.8.8# on second screen/pane, run pumba netem delay command, disturbing "ping" container; sidekick a "tc" helper containerpumba netem --duration 20s --tc-image gaiadocker/iproute2 delay --time 3000 jitter 50 --distribution normal ping# pumba will exit after 20s, or stop it with Ctrl-C
To demonstrate the packet loss capability, we will need three screens/panes. I will use the iperf network bandwidth measurement tool. On the first pane, run a server Docker container with iperf on board and start a UDP server there. On the second pane, start a client Docker container with iperf and send datagrams to the server container. Then, on the third pane, run the pumba netem loss command, adding packet loss to the client container. Enjoy the chaos.
# create docker network
docker network create -d bridge testnet

# > Server Pane
# run server container
docker run -it --name server --network testnet --rm alpine sh -c "apk add --no-cache iperf; sh"
# shell inside server container: run a UDP Server listening on UDP port 5001
sh$ iperf -s -u -i 1

# > Client Pane
# run client container
docker run -it --name client --network testnet --rm alpine sh -c "apk add --no-cache iperf; sh"
# shell inside client container: send datagrams to the server -> see no packet loss
sh$ iperf -c server -u

# > Server Pane
# see server receives datagrams without any packet loss

# > Pumba Pane
# inject 20% packet loss into client container, for 1m
pumba netem --duration 1m --tc-image gaiadocker/iproute2 loss --percent 20 client

# > Client Pane
# shell inside client container: send datagrams to the server -> see ~20% packet loss
sh$ iperf -c server -u
Hope you find this post useful. I look forward to your comments and any questions you have.
Originally published on October 4, 2017.