Who is this article for?
This article is for those who have created a machine learning model on a local machine and want to deploy and test it within a short time. It's also for those who are looking for an alternative platform to deploy their machine learning models.

Let's get started! 🚀

“Only when a model is fully integrated with the business systems, we can extract real value from its predictions.” — Christopher Samiullah

There are different platforms that can help you deploy your machine learning model, but for most of them it takes a lot of time and resources to configure the environment and deploy your model.

For example, SageMaker offers popular libraries and ML frameworks, but you still depend on it for new releases. This might mean that you won't be able to deploy your model on time. Let's say the SageMaker platform has scikit-learn v0.24 in its environment and you want to train and deploy your model with scikit-learn v1.0.1. You will not be able to do so until SageMaker upgrades to the new version of scikit-learn (1.0.1).

In this article, you will learn how to use Aibro to deploy your model quickly and easily.
Import the Important Packages
We need to import Python packages to load the data, clean the data, create a machine learning model, and save the model for deployment.
# import important modules
import numpy as np
import pandas as pd
# sklearn modules
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import MultinomialNB # classifier
from sklearn.metrics import (
accuracy_score,
classification_report,
plot_confusion_matrix,
)
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
# text preprocessing modules
from string import punctuation
# text preprocessing modules
from nltk.tokenize import word_tokenize
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
import re #regular expression
# Download dependency
for dependency in (
    "brown",
    "names",
    "wordnet",
    "averaged_perceptron_tagger",
    "universal_tagset",
    "stopwords",  # needed for the stop-word removal step below
):
    nltk.download(dependency)
import warnings
warnings.filterwarnings("ignore")
# seeding
np.random.seed(123)
# load data
data = pd.read_csv("../data/labeledTrainData.tsv", sep='\t')
# show top five rows of data
data.head()
# check the shape of the data
data.shape
# check missing values in data
data.isnull().sum()
id 0
sentiment 0
review 0
dtype: int64
How to Evaluate Class Distribution
We can use the value_counts() method from the Pandas package to evaluate the class distribution in our dataset.
# evaluate sentiment distribution
data.sentiment.value_counts()
1 12500
0 12500
Name: sentiment, dtype: int64
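If you prefer a visual check of the class balance, you can also plot the counts. This is an optional extra that assumes matplotlib is installed; it is not part of the deployment workflow.
# optional: visualize the class balance (assumes matplotlib is installed)
import matplotlib.pyplot as plt
data.sentiment.value_counts().plot(kind="bar", title="Sentiment class distribution")
plt.xlabel("sentiment")
plt.ylabel("number of reviews")
plt.show()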
How to Process the Data
After analyzing the dataset, the next step is to preprocess it into the right format before creating our machine learning model.

The reviews in this dataset contain a lot of unnecessary words and characters that we don't need when creating a machine learning model. We will clean the reviews by removing stopwords, numbers, and punctuation. Then we will convert each word into its base form by using the lemmatization process in the NLTK package.

The text_cleaning() function will handle all the necessary steps to clean our dataset.
stop_words = stopwords.words('english')
def text_cleaning(text, remove_stop_words=True, lemmatize_words=True):
    # Clean the text, with the option to remove stop words and to lemmatize words
    text = re.sub(r"[^A-Za-z0-9]", " ", text)
    text = re.sub(r"\'s", " ", text)
    text = re.sub(r'http\S+', ' link ', text)
    text = re.sub(r'\b\d+(?:\.\d+)?\s+', '', text)  # remove numbers
    # Remove punctuation from text
    text = ''.join([c for c in text if c not in punctuation])
    # Optionally, remove stop words
    if remove_stop_words:
        text = text.split()
        text = [w for w in text if not w in stop_words]
        text = " ".join(text)
    # Optionally, shorten words to their base form
    if lemmatize_words:
        text = text.split()
        lemmatizer = WordNetLemmatizer()
        lemmatized_words = [lemmatizer.lemmatize(word) for word in text]
        text = " ".join(lemmatized_words)
    # Return the cleaned text
    return text
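Before applying the function to the whole dataset, you can sanity-check it on a short made-up review (the exact output depends on your NLTK data, so treat this as illustrative):
# quick sanity check on a made-up review
print(text_cleaning("I loved this movie!!! The acting was great and the plot kept me hooked."))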
#clean the review
data["cleaned_review"] = data["review"].apply(text_cleaning)
#split features and target from data
X = data["cleaned_review"]
y = data.sentiment.values
# split data into train and validate
X_train, X_valid, y_train, y_valid = train_test_split(
X,
y,
test_size=0.15,
random_state=42,
shuffle=True,
stratify=y,
)
# Create a classifier in pipeline
sentiment_classifier = Pipeline(steps=[
('pre_processing',TfidfVectorizer(lowercase=False)),
('naive_bayes',MultinomialNB())
])
# train the sentiment classifier
sentiment_classifier.fit(X_train,y_train)
# test model performance on valid data
y_preds = sentiment_classifier.predict(X_valid)
accuracy_score(y_valid,y_preds)
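Since classification_report was already imported, you can also print per-class precision, recall, and F1 scores to get a fuller picture than accuracy alone:
# detailed per-class metrics on the validation set
print(classification_report(y_valid, y_preds))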
#save model
import joblib
joblib.dump(sentiment_classifier, '../models/sentiment_model_pipeline.pkl')
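Before moving on to deployment, it is worth reloading the saved pipeline and running a quick prediction to confirm that the file works on its own (the sample review below is made up):
# reload the saved pipeline and make sure it still predicts
loaded_model = joblib.load('../models/sentiment_model_pipeline.pkl')
sample_review = text_cleaning("What a waste of two hours, the plot made no sense at all.")
print(loaded_model.predict([sample_review]))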
Step 1: Install the aibro Python library
To install aibro, run the following command in your terminal:
pip install aibro
Step 2: Prepare the Model Repository
The model repository will be formatted in the following structure.
(a) model folder
This folder will contain the model you have created.
(b) data folder
The data folder will have a JSON file that holds an input value. In our case, the input will be a text value (a review), as follows.
{
"data": "I loved it, the kids loved it. It shows them that anything is possible but more especially when you have that one person fighting for you. That one person who believes in you without fail. I appreciated the various life lessons included in the film about being humble and thankful but commanding respect at the same time despite where or what background you come from. Success doesn’t see age, race or gender but sadly opportunity often does. Will Smith doesn’t let the lack of opportunity beat them as a family and the family is a team. The bigger picture is always knowing that there is a team involved in most successful people."
}
Note: Remember there is no restriction on how you want to format your input and output.
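For example, if you wanted the endpoint to score several reviews at once, the data file could hold a list instead of a single string. This is a hypothetical variant; predict.py would then need to loop over the list accordingly.
{
  "data": [
    "I loved it, the kids loved it.",
    "What a waste of two hours."
  ]
}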
(c) predict.py
This Python file should contain two Python functions.
load_model():
This function is responsible for loading the machine learning model from the model folder and returning it. In this tutorial, we will use the joblib package to load the model we have created.
def load_model():
    # load model
    model = joblib.load("model/sentiment_model_pipeline.pkl")
    return model
run():
This function will receive a model as the input and then load the data from the data folder. Finally, it will make predictions and return the result.
def run(model):
    fp = open("data/data.json", "r")
    data = json.load(fp)
    review = text_cleaning(data["data"])
    result = {"data": model.predict([review])}
    return result
Putting the two functions together, the complete predict.py file looks like this:
# import important modules
import json # load data
import joblib # load model
from clean import text_cleaning # function to clean the text
def load_model():
    # load model
    model = joblib.load("model/sentiment_model_pipeline.pkl")
    return model

def run(model):
    fp = open("data/data.json", "r")
    data = json.load(fp)
    review = text_cleaning(data["data"])
    result = {"data": model.predict([review])}
    return result

if __name__ == "__main__":
    run(load_model())
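Before uploading anything, you can run a quick local test from the root of the repository to confirm that the model loads and a prediction runs end to end. This is only a sanity check and is not one of the required repository files:
# local sanity check, run from the root of the model repository
from predict import load_model, run
model = load_model()
print(run(model))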
(d) requirements.txt
Aibro will first need to install the packages required to run your model before deploying the model itself. You can either manually write the packages and their version numbers in requirements.txt or run the following command, which will do the same:
pip list --format=freeze > requirements.txt
nltk==3.6.7
numpy==1.19.1
pandas==1.0.5
scikit_learn==0.23.1
joblib==1.0.0
Note: It is also recommended to use the pipreqs Python package to generate requirements.txt. This is because it will include Python packages based on imports in your project instead of all packages in your environment.
$ pipreqs /home/aibro_project
Successfully saved requirements file /home/aibro_project/requirements.txt
(e) Other Artifacts
You can also include other files or folders that will be used by the predict.py Python file. For example, in the model we have created, we need to clean the input before making a prediction, so the repository also includes a clean.py file that provides the text_cleaning() function imported in predict.py.
# import packages
import nltk
# Download dependency
corpora_list = ["stopwords","names","brown","wordnet"]
for dependency in corpora_list:
    try:
        nltk.data.find('corpora/{}'.format(dependency))
    except LookupError:
        nltk.download(dependency)
taggers_list = ["averaged_perceptron_tagger","universal_tagset"]
for dependency in taggers_list:
    try:
        nltk.data.find('taggers/{}'.format(dependency))
    except LookupError:
        nltk.download(dependency)
tokenizers_list = ["punkt"]
for dependency in tokenizers_list:
    try:
        nltk.data.find('tokenizers/{}'.format(dependency))
    except LookupError:
        nltk.download(dependency)
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
import re #regular expression
from string import punctuation
stop_words = stopwords.words('english')
# function to clean the text
def text_cleaning(text, remove_stop_words=True, lemmatize_words=True):
    # Clean the text, with the option to remove stop words and to lemmatize words
    text = re.sub(r"[^A-Za-z0-9]", " ", text)
    text = re.sub(r"\'s", " ", text)
    text = re.sub(r'http\S+', ' link ', text)
    text = re.sub(r'\b\d+(?:\.\d+)?\s+', '', text)  # remove numbers
    # Remove punctuation from text
    text = ''.join([c for c in text if c not in punctuation])
    # Optionally, remove stop words
    if remove_stop_words:
        text = text.split()
        text = [w for w in text if not w in stop_words]
        text = " ".join(text)
    # Optionally, shorten words to their base form
    if lemmatize_words:
        text = text.split()
        lemmatizer = WordNetLemmatizer()
        lemmatized_words = [lemmatizer.lemmatize(word) for word in text]
        text = " ".join(lemmatized_words)
    # Return the cleaned text
    return text
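Putting everything together, the formatted model repository used in this tutorial should look roughly like this (clean.py is just the name we chose for the extra artifact; any file that predict.py imports will work):
sentiment_model_repo/
├── model/
│   └── sentiment_model_pipeline.pkl
├── data/
│   └── data.json
├── predict.py
├── clean.py
└── requirements.txt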
Once the repository is ready, you can run a deployment in dryrun mode to check that everything is formatted correctly before launching a real inference job:
from aibro import Inference
api_url = Inference.deploy(
artifacts_path="./sentiment_model_repo",
dryrun=True,
)
Note: The formatted model repository is saved at path “./sentiment_model_repo”.
If the dryrun succeeds, you can deploy the model for real by calling Inference.deploy() with the following arguments.
(a) model_name
The model name should be unique among all of your currently active inference jobs. In this example, the model name will be "my_movie_sentiment_classifier".
(b) machine_id_config
This is the ID of the machine that will run our model. For this example we will use "c5.large.od". You can see the entire list of available machines on the Aibro website.
(c) artifacts_path
This will be the path to your formatted machine learning model repository. For this example, the path is "./sentiment_model_repo".
(d) description
You can also add a description of your model deployment.
from aibro import Inference
api_url = Inference.deploy(
model_name = "my_movie_sentiment_classifier",
machine_id_config = "c5.large.od",
artifacts_path = "./sentiment_model_repo",
description="my first inference job",
)
Note: If your inference job is public, {client_id} is filled out with "public". Otherwise, {client_id} should be filled out with one of your client IDs.
In this tutorial, the API URL will be https://api.aipaca.ai/v1/DavisDavid/public/my_movie_sentiment_classifier/predict
Note: The posted data will replace everything in the data folder. Therefore, your posted data should have the same format as whatever you had in the data folder initially.
import requests
import json
review = {"data": "A truly beautiful film that will having you crying with joy and pride. The (few) poor reviews cite a lack of authenticity regarding Richards character and a lack of screen time for the other major family members, including Serena. While I admittedly don’t know exactly the kind of person and father Richard was"}
prediction = requests.post(
    "https://api.aipaca.ai/v1/DavisDavid/public/my_movie_sentiment_classifier/predict",
    data=review,
)
result = prediction.text
print(result)
As you can see, we managed to make a prediction by calling the API, and the model predicts that the review is positive (1).
Once you no longer need the deployment, you can complete the inference job by passing its job ID to Inference.complete():
from aibro.inference import Inference
id = "inf_cd712f4a-4b59-4e44-8787-9c5b5450ff6d"
Inference.complete(job_id=id)