Like many tools in the software developer's toolbox, docker is relatively easy to jump into but takes some time to master. Using it for a variety of projects over the years, I've learned a few lessons along the way.
Each RUN, COPY, and ADD command within a dockerfile produces a new disk image layer. These layers are cached to optimize rebuilding. As explained in the Docker documentation, "the layers are stacked and each one is a delta of the changes from the previous layer."

There are interesting consequences to this. Running many small commands in your build process can produce more changes and therefore larger layers. Altering any command since the last build requires docker to recreate that layer and all subsequent layers: it re-executes every command further down the dockerfile, even if those commands haven't changed.

After some research and much trial and error I've learned a pattern that helps my builds run faster and produce smaller images:

1. A RUN command to update the OS package manager and install OS dependencies. The last part should be a cleanup command to remove temporary and cached files.
2. A RUN command to configure and enable the appropriate servers, e.g. a web server.
3. COPY any specialized configuration files and dependency manifests. During development I change these more often than standard OS dependencies.
4. A RUN command for application runtime environments and package dependencies, such as Python's pip. These are also combined into one command string.
5. COPY my application's code. That's what changes most often, so save it for the end.

This ordering solved a major performance bottleneck I found with many of my builds. It's common practice to commit application package dependency information with a codebase, such as Python's requirements.txt file or Node's package.json. My first instinct was therefore to run the runtime package manager after copying my codebase into the image. Even if the required packages didn't change, any update to my codebase would force docker to re-install all packages.

Adding individual COPY commands for dependency manager files, such as requirements.txt, during step 3 and executing the package manager in step 4, before the custom code is copied, means the install now only runs when the package requirements change.
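To make that ordering concrete, here is a minimal sketch of a dockerfile laid out this way. The base image, package names, and file paths are assumptions for illustration (a Python application served behind nginx), not taken from any particular project:

    # Base image: official Python image (an assumption for this sketch)
    FROM python:3.12-slim

    # 1. OS dependencies in a single RUN, ending with cleanup to keep the layer small
    RUN apt-get update && \
        apt-get install -y --no-install-recommends nginx && \
        rm -rf /var/lib/apt/lists/*

    # 2. Configure and enable servers (here, dropping nginx's default site)
    RUN rm -f /etc/nginx/sites-enabled/default

    # 3. Specialized configuration files and dependency manifests
    COPY nginx.conf /etc/nginx/conf.d/app.conf
    COPY requirements.txt /tmp/requirements.txt

    # 4. Runtime package dependencies, combined into one command string
    RUN pip install --no-cache-dir -r /tmp/requirements.txt

    # 5. Application code last, since it changes most often
    COPY . /app
    WORKDIR /app
    CMD ["python", "app.py"]

With this layout a code edit only invalidates the cache from step 5 onward, and a change to requirements.txt re-runs the pip install while still reusing the slower OS-level layers above it.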
Docker's disk images are (thankfully) cached on local disk to save time during subsequent builds. During development I find myself rebuilding quite often: I'm upgrading or changing dependencies, and I'm tweaking the dockerfile itself. In addition, I'll experiment with third-party containers for just a day or two and then forget about them. Unused images therefore pile up quickly.
While my project is up and running in my development environment, I'll run the following command about once a week:

> docker system prune
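When that isn't aggressive enough, the same command accepts a couple of useful flags. Both are standard docker CLI options; the one-week cutoff below is just an example value:

> docker system prune --all
> docker system prune --filter "until=168h"

The first also removes unused images rather than only dangling ones; the second limits the prune to objects older than the given age.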
But my focus is on "enterprise"-grade software. If you're hosting your own large SaaS application, you're most likely running a container orchestration platform through a vendor. If you distribute your docker images to big companies or clients, they are likely doing the same. Don't expect docker-compose to satisfy these requirements. Plan to write and test deployment scripts, and leverage the orchestration system's features to optimize scaling and uptime for each situation.
It's a best practice to keep your container as agnostic as possible to its environment. Use generic solutions when possible. Logs, for example, can be output to the console and redirected to a central logging system by the runtime environment outside the container. Even something as complex as authentication over the web can be handled by external systems and the necessary information passed into the container through HTTP headers. This makes local development and testing much simpler and independent of those requirements.
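As a concrete example of the logging case: the application inside the container simply writes to stdout/stderr, and whoever runs it chooses the destination with Docker's logging drivers. The syslog endpoint and image name below are placeholders:

> docker run --log-driver=syslog --log-opt syslog-address=udp://logs.example.com:514 my-app

The container itself never needs to know which logging system is in use.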
I learned many of these lessons while building SocialSentiment.io, an application which performs social sentiment analysis. I plan on sharing more soon.