As your team invests significant time and resources developing models, it is imperative that processes are put into place to protect and maximize the return on that investment. To that end, in this installment of the ModelOps Blog Series we’ll discuss leveraging functionality provided by continuous integration/continuous deployment (CI/CD) frameworks such as Jenkins, CircleCI, and GitHub Actions to automate the push of model container images to production container registries.
As your team develops and containerizes models, it’s important that they don’t just live on your R&D servers or model developers’ laptops where events like hardware failures or accidental reformats could wipe away capabilities in the blink of an eye. In addition, using a CI/CD pipeline to deploy your models to container registries allows you to do the following in an automated fashion every time you want to release a new version of a model:
- Test the model’s functionality and scan for security issues
- Store and control access to the model image in a persistent, secure, organized, and scalable fashion
- Trace the model image back to its original source code
If configured correctly, this type of automation minimizes the amount of labor required and mitigates the risk of human error throughout the model deployment process. The starting point for the image push process is a model container image successfully built by a CI/CD server. To get up to speed on what it takes to produce such an image, check out the previous posts in this series, which cover responsibly sourcing data, following best practices for model training and versioning, and automating model container builds using CI/CD frameworks.
Leveraging container registries
Containerization is important to ensuring models function properly once they are deployed into production. Containerizing models ensures that they will execute in the same way regardless of infrastructure.
- A container is a running instance of a container image, which packages the minimum requirements needed to run an application: an operating system, application source code, system dependencies, programming language libraries, and a runtime.
- A container repository is a collection of container images with the same name, but with different tags.
- A container registry is a collection of container repositories.
When working with containerized model images, the container registry might be a collection of numerous container repositories, with each repository corresponding to a particular model. Each of these repositories might contain multiple images, one per version of the model, tagged accordingly. There are numerous container registry options, including Amazon Web Services (AWS) Elastic Container Registry (ECR), Microsoft Azure Container Registry, and Google Container Registry.
Automating the deployment of model container images to these container registries using CI/CD yields a number of benefits. Container registries allow you to easily store, secure, and manage model images. By automating the deployment of container images, you can run unit tests to confirm correct model functionality and detect issues early in the deployment process; this includes scanning model images for potential security vulnerabilities. Additionally, if model images are deployed in an automated fashion using CI/CD, each tagged model image in a registry can be traced back to its original source code.
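To make the registry, repository, and tag hierarchy concrete, here is a minimal Python sketch of how a fully qualified image reference is composed and taken apart. It assumes the common `registry/repository:tag` layout; the registry host, model name, and version tag used in the example are hypothetical, and the parser is deliberately simplified (it requires an explicit registry host and tag, and does not handle digests).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRef:
    """A fully qualified container image reference."""
    registry: str    # registry host (hypothetical value in the example below)
    repository: str  # one repository per model
    tag: str         # one tag per model version

    def __str__(self) -> str:
        return f"{self.registry}/{self.repository}:{self.tag}"


def parse_image_ref(ref: str) -> ImageRef:
    """Split 'registry/repository:tag' into its three parts.

    Simplified on purpose: the registry host and the tag must both be present.
    """
    # Everything before the first '/' is treated as the registry host.
    registry, _, remainder = ref.partition("/")
    # The tag follows the last ':' in the remainder.
    repository, _, tag = remainder.rpartition(":")
    if not (registry and repository and tag):
        raise ValueError(f"expected registry/repository:tag, got {ref!r}")
    return ImageRef(registry, repository, tag)


if __name__ == "__main__":
    # Hypothetical model image: registry host, model repository, version tag.
    ref = parse_image_ref("registry.example.com/sentiment-model:1.0.2")
    print(ref.registry, ref.repository, ref.tag)
```

Under this layout, releasing a new version of a model means pushing another tag to the same repository, which keeps every version of that model grouped in one place.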
Pushing models to registries using CI/CD
In the previous blog post in this series, we discussed using CI/CD frameworks such as Jenkins, CircleCI, and GitHub Actions to automate the building, scanning, and testing of model container images. These CI/CD frameworks also offer support for automating the tagging and pushing of model container images to container registries. For some CI/CD frameworks and container registries, there is built-in compatibility, but for others, additional plugins/configurations are required to successfully automate the push process. Although the process differs in certain ways, container image pushes can be automated using most combinations of popular CI/CD frameworks and container registries.
The Modzy data science team implements a similar process for the models we develop, relying on GitHub for version control throughout the model development and containerization processes. Every time code is merged to a model repository’s master branch, CircleCI builds, scans, tests, tags, and pushes a new container image to an AWS ECR registry. In this way, bugs or vulnerabilities can be detected prior to the push of the image to the registry, and each image in each repository within the registry can be traced back to its source code using its tag.
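As an illustration of the tag-and-push step, the sketch below shows one way a CI job could tag an image with the commit it was built from and push it. This is not Modzy's actual pipeline configuration: it assumes the `git` and `docker` CLIs are available on the CI server, that the job has already authenticated to the registry, and that tagging by short commit SHA is the chosen traceability scheme. The function names, registry host, and image names are hypothetical.

```python
import subprocess
from typing import List


def current_commit_sha() -> str:
    """Return the short SHA of the checked-out commit (requires git)."""
    result = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def push_commands(local_image: str, registry: str,
                  repository: str, tag: str) -> List[List[str]]:
    """Build the docker commands that tag a local image and push it."""
    remote = f"{registry}/{repository}:{tag}"
    return [
        ["docker", "tag", local_image, remote],  # point the remote name at the local image
        ["docker", "push", remote],              # upload the image to the registry
    ]


def tag_and_push(local_image: str, registry: str, repository: str,
                 tag: str, dry_run: bool = True) -> None:
    """Run the commands, or only print them when dry_run is True."""
    for cmd in push_commands(local_image, registry, repository, tag):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True fails the CI job if tagging or pushing fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical values; a real job would pass tag=current_commit_sha()
    # and dry_run=False once the scan and test steps have passed.
    tag_and_push("sentiment-model:latest", "registry.example.com",
                 "sentiment-model", tag="abc1234")
```

Using the commit SHA as the tag is one simple way to get the traceability described above: given any image in the registry, the tag identifies the exact commit that produced it.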
What’s next
Now that our model images have been built, scanned, tested, and pushed to a container registry, stay tuned for our next blog post, which will discuss the process of deploying models into production.