Make your containerized CI environments truly useful by accelerating your Docker builds
The modern software development cycle often means packaging your applications as containers. This task can be time-consuming and may significantly slow down your testing or deployment. The problem is especially visible in continuous integration and deployment processes, where images are built at every code change.
In this article, we will discuss several strategies for speeding up Docker image builds in a continuous integration pipeline.
Our sample application is a minimal Flask app, app/hello.py:
from flask import Flask
app = Flask(__name__)
@app.route('/')
def hello_world():
    return 'Hello, World!'
Writing the Dockerfile
Let’s write the corresponding Dockerfile:
FROM python:3.7-alpine as builder
# install dependencies required to build python packages
RUN apk update && apk add --no-cache make gcc && pip install --upgrade pip
# setup venv and download or build dependencies
ENV VENV="/venv"
ENV PATH="${VENV}/bin:${PATH}"
COPY requirements.txt .
RUN python -m venv ${VENV} \
&& pip install --no-cache-dir -r requirements.txt
FROM python:3.7-alpine
# setup venv with dependencies from the builder stage
ENV VENV="/venv"
ENV PATH="${VENV}/bin:$PATH"
COPY --from=builder ${VENV} ${VENV}
# copy app files
WORKDIR /app
COPY app .
# run the app
EXPOSE 5000
ENV FLASK_APP="hello.py"
CMD [ "flask", "run", "--host=0.0.0.0" ]
Running and testing the image
Let’s make sure everything works as expected:
docker build -t hello .
docker run -d --rm -p 5000:5000 hello
curl localhost:5000
Hello, World!
Now, let’s run the same build a second time:
docker build -t hello .
...
Step 2/15 : RUN apk update && apk add --no-cache make gcc && pip install --upgrade pip
---> Using cache
---> 24d044c28dce
...
As you can see, this second build is much quicker. Layers are cached by your local Docker daemon and reused when they have not changed.
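Why does the second build reuse every layer? A simplified mental model helps here. It is an illustration, not Docker’s actual implementation:

- Each step’s cache key depends on the previous step’s key and on the instruction itself.
- For COPY and ADD, the checksum of the copied files also counts.
- A RUN step is matched on its command string only.

Because the keys are chained, changing one step invalidates every step after it. The sketch below models this with hypothetical step_key and build helpers and made-up file contents.

```python
import hashlib
from typing import List, Optional, Set, Tuple

# Simplified model of Docker's classic layer cache (illustration only).
# A step is (instruction, copied content); content is None for non-COPY steps.
Step = Tuple[str, Optional[bytes]]


def step_key(parent: str, instruction: str, content: Optional[bytes]) -> str:
    """Cache key of a step: chains the parent key, instruction and content."""
    h = hashlib.sha256()
    h.update(parent.encode())
    h.update(b"\0")
    h.update(instruction.encode())
    if content is not None:
        # COPY/ADD: the checksum of the copied files is part of the key.
        h.update(hashlib.sha256(content).digest())
    return h.hexdigest()


def build(base_image: str, steps: List[Step], cache: Set[str]) -> List[str]:
    """Return 'cached' or 'built' for each step, recording new keys in cache."""
    results = []
    parent = base_image
    for instruction, content in steps:
        key = step_key(parent, instruction, content)
        results.append("cached" if key in cache else "built")
        cache.add(key)
        # Chaining keys: once a step changes, every later step misses the cache.
        parent = key
    return results


# Hypothetical file contents, just to drive the model.
steps: List[Step] = [
    ("RUN apk update && apk add --no-cache make gcc", None),
    ("COPY requirements.txt .", b"flask\n"),
    ("RUN pip install -r requirements.txt", None),
    ("COPY app .", b"hello.py, version 1"),
]
local_cache: Set[str] = set()
print(build("python:3.7-alpine", steps, local_cache))  # ['built', 'built', 'built', 'built']
print(build("python:3.7-alpine", steps, local_cache))  # ['cached', 'cached', 'cached', 'cached']

# Editing only the application code rebuilds only the last step.
edited = steps[:3] + [("COPY app .", b"hello.py, version 2")]
print(build("python:3.7-alpine", edited, local_cache))  # ['cached', 'cached', 'cached', 'built']

# A brand-new daemon starts with an empty cache: everything is rebuilt.
print(build("python:3.7-alpine", steps, set()))  # ['built', 'built', 'built', 'built']
```

This ordering is also why the Dockerfile copies requirements.txt and installs dependencies before copying the application code. Editing hello.py then only invalidates the last layers.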
Pushing the image
Let’s publish our image to an external registry and see what happens:
docker tag hello my-registry/hello:1.0
docker push my-registry/hello:1.0
The push refers to repository [my-registry/hello]
8388d558f57d: Pushed
77a59788172c: Pushed
673c6888b7ef: Pushed
fdb8581dab88: Pushed
6360407af3e7: Pushed
68aa0de28940: Pushed
f04cc38c0ac2: Pushed
ace0eda3e3be: Pushed
1.0: digest: sha256:d815c1694083ffa8cc379f5a52ea69e435290c9d1ae629969e82d705b7f5ea95 size: 1994
It’s important to understand that the layers of our builder stage are not sent to the remote registry when we push our image: only the layers of the final stage are pushed. The intermediate layers are still cached by the local Docker daemon, though, so they can be reused by your next local build.
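To see which instructions end up in the pushed image, we can group the 15 instructions of our Dockerfile by stage. Every FROM starts a new stage, and only the final stage, on top of its python:3.7-alpine base image, forms the image we push. The builder stage contributes nothing except the /venv files copied with COPY --from=builder. split_stages is a hypothetical helper, and the parser is deliberately simplified: it only handles comments, blank lines and backslash line continuations, and it assumes the file starts with FROM.

```python
from typing import List


def split_stages(dockerfile: str) -> List[List[str]]:
    """Group Dockerfile instructions by build stage (simplified parser)."""
    instructions: List[str] = []
    pending = ""
    for raw in dockerfile.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith("\\"):
            # Continued instruction: accumulate until its last line.
            pending += line[:-1].strip() + " "
            continue
        instructions.append(pending + line)
        pending = ""

    stages: List[List[str]] = []
    for instruction in instructions:
        if instruction.split()[0].upper() == "FROM":
            stages.append([])  # every FROM starts a new stage
        stages[-1].append(instruction)
    return stages


# The article's Dockerfile (raw string so the backslash continuation is kept).
DOCKERFILE = r"""FROM python:3.7-alpine as builder
# install dependencies required to build python packages
RUN apk update && apk add --no-cache make gcc && pip install --upgrade pip
# setup venv and download or build dependencies
ENV VENV="/venv"
ENV PATH="${VENV}/bin:${PATH}"
COPY requirements.txt .
RUN python -m venv ${VENV} \
    && pip install --no-cache-dir -r requirements.txt
FROM python:3.7-alpine
# setup venv with dependencies from the builder stage
ENV VENV="/venv"
ENV PATH="${VENV}/bin:$PATH"
COPY --from=builder ${VENV} ${VENV}
# copy app files
WORKDIR /app
COPY app .
# run the app
EXPOSE 5000
ENV FLASK_APP="hello.py"
CMD [ "flask", "run", "--host=0.0.0.0" ]
"""

builder, final = split_stages(DOCKERFILE)
print(len(builder), len(final))  # 6 9
for instruction in final:
    print(instruction)  # only these instructions shape the pushed image
```

The two counts, 6 + 9, add up to the 15 steps reported by docker build.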
Local builds work fine; let’s now see how things go in a CI environment.
Test CI environment
We will use a CI environment based on GitLab pipelines whose jobs run on Kubernetes. This last point is important because our CI jobs will run in a containerized environment: each job is spawned as a Kubernetes Pod. Most modern CI solutions use containerized jobs, and they all face the same problem when building Docker images: you need to make Docker commands work inside a Docker container.
To make everything go smoothly, you have two options:
GitLab pipeline implementation
In a GitLab pipeline, you usually create utility containers such as DinD with the services keyword. In the pipeline excerpt below, the docker-build job and the dind service container both run in the same Kubernetes Pod. When docker is used in the job’s script, it sends its commands to the dind auxiliary container through the DOCKER_HOST environment variable.
stages:
  - build
  - test
  - deploy
variables:
  # disable Docker TLS validation
  DOCKER_TLS_CERTDIR: ""
  # the localhost address is shared by the job container and the dind container (they run in the same Pod),
  # so this configuration makes the dind service our Docker daemon when running Docker commands
  DOCKER_HOST: "tcp://localhost:2375"
services:
  - docker:stable-dind
docker-build:
  image: docker:stable
  stage: build
  script:
    - docker build -t hello .
    # tag the freshly built local image for the remote registry
    - docker tag hello my-registry/hello:${CI_COMMIT_SHORT_SHA}
    - docker push my-registry/hello:${CI_COMMIT_SHORT_SHA}
If we run this pipeline a second time, however, the build does not benefit from any cache:
docker build -t hello .
Step 1/15 : FROM python:3.7-alpine as builder
...
Step 2/15 : RUN apk update && apk add --no-cache make gcc && pip install --upgrade pip
---> Running in ca50f59a21f8
fetch //dl-cdn.alpinelinux.org/alpine/v3.12/main/x86_64/APKINDEX.tar.gz
...
Why is that? Here, dind is a temporary container: it is created with the job and dies when the job ends, so any cached data is lost. Sadly, you cannot easily persist that data between two pipeline runs.
How can we benefit from the cache while still running a dind container?
One solution: Pull/Push dancing
The first solution is rather straightforward: we use our remote registry (the one we push to) as a remote cache for our layers. Each job does three things:
- it pulls the latest image from the registry;
- it passes that image to docker build with --cache-from;
- it pushes the new image back, tagged with both the commit SHA and latest.
stages:
  - build
  - test
  - deploy
variables:
  # disable Docker TLS validation
  DOCKER_TLS_CERTDIR: ""
  DOCKER_HOST: "tcp://localhost:2375"
services:
  - docker:stable-dind
docker-build:
  image: docker:stable
  stage: build
  script:
    # "|| true" lets the very first pipeline succeed when no image exists yet
    - docker pull my-registry/hello:latest || true
    - docker build --cache-from my-registry/hello:latest -t hello:latest .
    - docker tag hello:latest my-registry/hello:${CI_COMMIT_SHORT_SHA}
    - docker tag hello:latest my-registry/hello:latest
    - docker push my-registry/hello:${CI_COMMIT_SHORT_SHA}
    - docker push my-registry/hello:latest
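This makes only modest use of the cache: just 2 of the 15 steps benefit from it. Only the final stage’s layers are pushed to the registry, so the builder stage’s layers are never available as a cache. To improve this, we need to push the intermediary builder image to the remote registry as well, so that its layers persist: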
stages:
  - build
  - test
  - deploy
variables:
  # disable Docker TLS validation
  DOCKER_TLS_CERTDIR: ""
  DOCKER_HOST: "tcp://localhost:2375"
services:
  - docker:stable-dind
docker-build:
  image: docker:stable
  stage: build
  script:
    - docker pull my-registry/hello-builder:latest || true
    - docker pull my-registry/hello:latest || true
    - docker build --cache-from my-registry/hello-builder:latest --target builder -t hello-builder:latest .
    - docker build --cache-from my-registry/hello:latest --cache-from my-registry/hello-builder:latest -t hello:latest .
    - docker tag hello-builder:latest my-registry/hello-builder:latest
    - docker tag hello:latest my-registry/hello:${CI_COMMIT_SHORT_SHA}
    - docker tag hello:latest my-registry/hello:latest
    - docker push my-registry/hello-builder:latest
    - docker push my-registry/hello:${CI_COMMIT_SHORT_SHA}
    - docker push my-registry/hello:latest
We build the intermediary builder stage as a proper Docker image using the --target option. We then push it to the remote registry, so the next pipeline can pull it back as a cache for both the builder stage and the final image. When running the pipeline, our build time is down to 15 seconds!
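Every intermediary stage needs its own pull, --target build, tag and push, so the script grows with the number of stages. Here is a sketch of a hypothetical Python helper that generates these script lines for any list of stage names. cache_dance_script is not a Docker or GitLab API, and the my-registry/hello-&lt;stage&gt;:latest naming simply follows the pattern used above. For a single builder stage, it reproduces the script above line by line.

```python
from typing import List


def cache_dance_script(image: str, stages: List[str], registry: str,
                       tag: str = "${CI_COMMIT_SHORT_SHA}") -> List[str]:
    """Generate pull/build/tag/push lines that use the registry as a layer cache."""
    stage_refs = [f"{registry}/{image}-{stage}:latest" for stage in stages]
    final_ref = f"{registry}/{image}:latest"

    # 1. Pull previous images; "|| true" tolerates the very first run.
    script = [f"docker pull {ref} || true" for ref in stage_refs + [final_ref]]

    # 2. Build each intermediary stage with --target, using its own previous
    #    image and those of the stages before it as cache sources.
    for i, stage in enumerate(stages):
        cache_from = " ".join(f"--cache-from {ref}" for ref in stage_refs[: i + 1])
        script.append(f"docker build {cache_from} --target {stage} -t {image}-{stage}:latest .")

    # 3. Build the final image with every remote image as a cache source.
    cache_from = " ".join(f"--cache-from {ref}" for ref in [final_ref] + stage_refs)
    script.append(f"docker build {cache_from} -t {image}:latest .")

    # 4. Tag and push everything so the next pipeline can pull it back.
    script += [f"docker tag {image}-{stage}:latest {ref}" for stage, ref in zip(stages, stage_refs)]
    script.append(f"docker tag {image}:latest {registry}/{image}:{tag}")
    script.append(f"docker tag {image}:latest {final_ref}")
    script += [f"docker push {ref}" for ref in stage_refs]
    script.append(f"docker push {registry}/{image}:{tag}")
    script.append(f"docker push {final_ref}")
    return script


# Hypothetical stage names, for an image with three intermediary stages.
for line in cache_dance_script("hello", ["deps", "builder", "tests"], "my-registry"):
    print(f"- {line}")
```

With an empty stage list, the helper yields the first, single-image pull/push script. With three stages, it already produces 18 lines.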
As you can see, the build is slowly becoming quite complicated. Now picture an image with 3 or 4 intermediary stages! It does work, though. Another drawback is that you have to upload and download all these layers every time, which may be expensive in storage and transfer costs.
Another solution: external dind service
We need a dind service running to execute our docker build. In our previous attempts, dind was embedded into each job and shared its lifecycle, which made it impossible to keep a proper cache.
Why not make dind a first-class citizen by creating a dind service in our Kubernetes cluster? It would run with a PersistentVolume attached to store the cached data, and every job could send its docker commands to this shared service.
Creating such a service in Kubernetes is easy:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    app: docker-dind
  name: dind
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: docker-dind
  name: dind
spec:
  replicas: 1
  selector:
    matchLabels:
      app: docker-dind
  template:
    metadata:
      labels:
        app: docker-dind
    spec:
      containers:
        - image: docker:19.03-dind
          name: docker-dind
          env:
            - name: DOCKER_HOST
              value: tcp://0.0.0.0:2375
            - name: DOCKER_TLS_CERTDIR
              value: ""
          volumeMounts:
            # the layer cache lives in /var/lib/docker, backed by the PersistentVolume
            - name: dind-data
              mountPath: /var/lib/docker/
          ports:
            - name: daemon-port
              containerPort: 2375
              protocol: TCP
          securityContext:
            privileged: true # Required for the dind container to work.
      volumes:
        - name: dind-data
          persistentVolumeClaim:
            claimName: dind
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: docker-dind
  name: dind
spec:
  ports:
    - port: 2375
      protocol: TCP
      targetPort: 2375
  selector:
    app: docker-dind
In the pipeline, we then point DOCKER_HOST at this shared service instead of declaring a dind service in each job:
stages:
  - build
  - test
  - deploy
variables:
  # disable Docker TLS validation
  DOCKER_TLS_CERTDIR: ""
  # here the dind hostname is resolved to the Kubernetes dind service by the cluster DNS
  DOCKER_HOST: "tcp://dind:2375"
docker-build:
  image: docker:stable
  stage: build
  script:
    - docker build -t hello .
    - docker tag hello:latest my-registry/hello:${CI_COMMIT_SHORT_SHA}
    - docker push my-registry/hello:${CI_COMMIT_SHORT_SHA}
If you run the pipeline twice, the second build should take about 10 seconds, even better than our previous solution. For a “big” image that takes around 10 minutes to build, this strategy also cuts the build time to a few seconds when no layers have changed.
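With DOCKER_HOST set to tcp://dind:2375, the docker CLI in every job speaks plain HTTP (TLS is disabled here) to the Docker Engine API of the shared daemon. The Engine API exposes a GET /_ping endpoint that a healthy daemon answers with OK. The sketch below uses that endpoint to check whether a job can reach the shared service. parse_docker_host and ping_daemon are hypothetical helpers. The sketch assumes a job image with Python available, which docker:stable may not provide, so treat it as illustrative.

```python
import http.client
from typing import Tuple
from urllib.parse import urlsplit


def parse_docker_host(docker_host: str) -> Tuple[str, int]:
    """Split a DOCKER_HOST value such as tcp://dind:2375 into (host, port)."""
    parts = urlsplit(docker_host)
    if parts.scheme != "tcp" or not parts.hostname or parts.port is None:
        raise ValueError(f"expected tcp://host:port, got {docker_host!r}")
    return parts.hostname, parts.port


def ping_daemon(docker_host: str, timeout: float = 3.0) -> bool:
    """Return True if the Docker daemon at docker_host answers GET /_ping with OK."""
    host, port = parse_docker_host(docker_host)
    # With DOCKER_TLS_CERTDIR="", the daemon listens on plain HTTP.
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/_ping")
        response = conn.getresponse()
        return response.status == 200 and response.read().strip() == b"OK"
    except (OSError, http.client.HTTPException):
        # Unknown host, refused connection, timeout or malformed reply.
        return False
    finally:
        conn.close()
```

In a job, calling ping_daemon with the value of the DOCKER_HOST variable would tell you whether the shared dind service is up before any docker command runs.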
One last option: using Kaniko
A final option is to use Kaniko. It lets you build Docker images without a Docker daemon, so none of the problems above arise. Note, however, that you then cannot use advanced options such as injecting secrets when building your image. For this reason, it’s not the solution I retained.