
How to Easily Deploy ML Models to Production

by Mikhail Sitnikov, November 20th, 2020

One of the known truths of the Machine Learning (ML) world is that it takes a lot longer to deploy ML models to production than to develop them.¹

The problem of deploying ML models to production is well known. Modern software requires a variety of crucial properties, such as on-demand scaling and high availability. As a result, it can take a lot of effort and time to deploy models to production correctly.

Let’s discuss the different options you have when it comes to deploying ML models. They are presented in order from the most general to the most ML-specific.

1. Hardware / VM

The most direct way to deploy anything is to rent a VM, wrap the model in some kind of server, and leave it running. While extremely straightforward and customizable, this method has numerous drawbacks, such as difficult integration into CI/CD pipelines and isolation problems.
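As a rough illustration, here is a minimal sketch of what "wrapping a model into a server" can look like, using Flask; the model file model.pkl, the /predict route, and the JSON format are all assumptions made for the example, not a prescribed setup:

```python
# minimal_server.py - a minimal sketch of wrapping a model in an HTTP server.
# Assumes a pickled scikit-learn model saved as model.pkl (hypothetical path).
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the model once at startup so every request reuses it.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body like {"features": [[1.0, 2.0, 3.0]]}.
    features = request.get_json()["features"]
    prediction = model.predict(features)
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    # On a rented VM you would typically put gunicorn/nginx in front of this.
    app.run(host="0.0.0.0", port=8080)
```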

2. Containers

It is possible to deploy ML models in Docker containers using Kubernetes or similar orchestration tools. This option provides far more quality-of-life improvements. Models can easily be wrapped into specially designed inference servers such as NVIDIA Triton or TensorFlow Serving (this works for the VM option as well), and it is now even easier to chain models together using highly sophisticated frameworks.
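To make this concrete, here is a minimal sketch of a Dockerfile that containerizes the example Flask server above; the base image and file names are the hypothetical ones carried over from that example:

```dockerfile
# Dockerfile - a minimal sketch of containerizing the example server above.
FROM python:3.9-slim

WORKDIR /app

# Install only what the example server needs.
RUN pip install --no-cache-dir flask scikit-learn

# Copy the model artifact and server code into the image.
COPY model.pkl minimal_server.py ./

EXPOSE 8080
CMD ["python", "minimal_server.py"]
```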

However, this customizability comes at the cost of DevOps complexity and the need to maintain the technologies that make your model run.

3. General purpose serverless platforms

An easy way to simply drop your model onto the cloud is to use serverless PaaS platforms. Here you wrap your model in some preprocessing and postprocessing code.
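A hedged sketch of what such a wrapper typically looks like: the (event, context) entry point follows the common cloud-function pattern (exact signatures vary by provider), and the model path and JSON payload shape are assumptions for the example:

```python
# handler.py - a sketch of a serverless inference function.
import json
import pickle

# Load the model at cold start so warm invocations skip the loading cost.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

def handler(event, context):
    # Preprocessing: parse the request body (assumed to be a JSON payload).
    body = json.loads(event["body"])
    features = body["features"]

    # Inference runs on CPU on general-purpose serverless platforms.
    prediction = model.predict(features)

    # Postprocessing: serialize the result back to JSON.
    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": prediction.tolist()}),
    }
```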

Some platforms provide more flexibility, since you can even wrap your code in a container, while cloud functions make it much easier to deploy, even providing great integrations with their respective cloud services.

This approach is best suited to background task processing, since inference time is relatively high: you are limited to running models on CPU, and the models themselves are commonly stored far away from the processing nodes and may take time to load.

4. ML-focused serverless providers

Now we are seeing a rise of ML-focused serverless providers that host your model and expose it through an API or a set of frameworks. One set of such providers still requires you to rent the underlying compute instances on which your models will run.

Another option is providers that offer a true serverless experience, where you pay only for the time your models are actually running.

In general, ML-focused serverless providers let you separate GPU-intensive computations from CPU-intensive ones while providing on-demand scalability for the former. However, you still have to perform pre- and post-processing in the client application or in cloud functions.
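Here is a sketch of that split, with the lightweight pre- and post-processing on the client and only the inference call going to a hosted model; the endpoint URL, auth header, and payload shape are all hypothetical:

```python
# client.py - a sketch of client-side pre/post-processing around a hosted model.
import requests

# Hypothetical hosted-model endpoint and token, for illustration only.
ENDPOINT = "https://api.example-ml-host.com/v1/models/my-model:predict"
TOKEN = "YOUR_API_TOKEN"

def preprocess(text: str) -> dict:
    # CPU-light preprocessing stays on the client (or in a cloud function).
    return {"inputs": [text.strip().lower()]}

def postprocess(response_json: dict) -> str:
    # Turn raw model output into something the application can use.
    return response_json["outputs"][0]

def predict(text: str) -> str:
    resp = requests.post(
        ENDPOINT,
        json=preprocess(text),
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    return postprocess(resp.json())

if __name__ == "__main__":
    print(predict("Deploy ML models easily"))
```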

That's it

We have reviewed the common options for deploying ML models.

Thank you for reading! Stay tuned for more articles, and feel free to write in the comment section or ask questions.

¹ One of the biggest underrated challenges in machine learning development is deploying the trained models to production, and doing so in a scalable way. One joke I have read about it: "the most common way machine learning gets deployed today is PowerPoint slides :)".

References

  • Adarsh Shah. (June 21, 2020). Challenges Deploying Machine Learning Models to Production.
  • Sambit Mahapatra. (March 17, 2019). Machine Learning Models as Micro Services in Docker.
  • James Le. (March 6, 2020). The 5 Components Towards Building Production-Ready Machine Learning Systems.

