Binary classification - the target can take one of exactly two values.
Multiclass classification - the input is assigned to exactly one class out of more than two classes.
Multilabel classification - the input can be assigned to one or more classes out of many, as sketched below.
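To make the distinction concrete, here is a minimal sketch (not part of the original) of how the target arrays differ across the three settings:

import numpy as np

# Binary: one value per sample, either 0 or 1.
y_binary = np.array([0, 1, 1, 0])

# Multiclass: exactly one class per sample, one-hot encoded.
y_multiclass = np.array([
    [1, 0, 0],  # class 0
    [0, 0, 1],  # class 2
])

# Multilabel: any number of classes per sample, multi-hot encoded.
y_multilabel = np.array([
    [1, 0, 1],  # classes 0 and 2
    [0, 1, 1],  # classes 1 and 2
])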
The business problem is to classify images by their anomaly type. In the provided dataset, each anomaly class is stored in its own folder, which the loading code below relies on.
As a prerequisite, we load all the dependencies:
import os
import tensorflow as tf
import shutil
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.metrics import CategoricalAccuracy
from tqdm.notebook import tqdm
from sklearn.model_selection import KFold
import cv2
import matplotlib.pyplot as plt
import glob
import numpy as np
import pandas as pd
from hyperopt import tpe, hp, fmin, STATUS_OK, Trials
from hyperopt.pyll.base import scope
from hyperopt import space_eval
IMG_SIZE = (224, 224)
K_FOLDS = 5
PATH_TO_DATA = '/content/dataset/'
all_classes = os.listdir(PATH_TO_DATA)
dataset = []
for class_name in all_classes:
    class_images = glob.glob(f"{PATH_TO_DATA}/{class_name}/*")
    class_images = [[img_path, class_name] for img_path in class_images]
    dataset.extend(class_images)
dataset_df = pd.DataFrame(dataset, index=range(len(dataset)), columns=['img_path', 'class'])
train_df, test_df = train_test_split(
    dataset_df,
    test_size=0.2,
    random_state=42
)
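If the anomaly classes are imbalanced, a stratified split keeps their proportions equal in both subsets; this is an optional variant, not part of the original code:

# Optional: stratify on the class column so train and test share
# the same class distribution (useful for rare anomaly types).
train_df, test_df = train_test_split(
    dataset_df,
    test_size=0.2,
    random_state=42,
    stratify=dataset_df['class']
)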
Random search — randomly samples hyperparameter combinations and keeps the best-scoring one.
Grid search — exhaustively evaluates every combination of hyperparameters from a predefined grid (both are contrasted in the sketch below).
Bayesian optimization — adapts the sampling strategy by fitting distributions over the search space to the results observed so far.
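The difference between the first two strategies can be shown in a few lines of plain Python; the toy grid below is purely illustrative:

import itertools
import random

toy_grid = {
    'lr': [1e-4, 1e-3, 1e-2],
    'batch_size': [16, 32, 64],
}

# Grid search: evaluate every combination (3 * 3 = 9 trials).
grid_trials = list(itertools.product(toy_grid['lr'], toy_grid['batch_size']))

# Random search: sample a fixed budget of combinations at random.
random_trials = [
    (random.choice(toy_grid['lr']), random.choice(toy_grid['batch_size']))
    for _ in range(5)
]

For Bayesian optimization we use hyperopt, whose search space for this task is defined next.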
space = {
    "batch_size": hp.choice('batch_size', list(range(8, 128))),
    "act_fn": hp.choice('act_fn', [
        'sigmoid',
        'tanh',
        'relu',
        'relu6'
    ]),
    "model": hp.choice('model', [
        'efficientnet',
        'inception_v3',
        'resnet50'
    ]),
    "opt": hp.choice('opt', [
        'adadelta',
        'adagrad',
        'adam',
        'ftrl',
        'nadam',
        'rmsprop',
        'sgd'
    ]),
    # hp.randint would allow 0 epochs; sample at least one full epoch instead.
    'num_epochs': scope.int(hp.quniform('num_epochs', 1, 5, 1)),
    'lr': hp.uniform('lr', 1e-5, 1e-2),
    "latent_dim": hp.choice('latent_dim', list(range(20, 100)))
}
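Before starting the optimization, it is worth sanity-checking what the space produces; hyperopt can draw a random configuration from it:

from hyperopt.pyll.stochastic import sample

# Draw one random configuration from the search space.
print(sample(space))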
def build_model(config, backbone_model):
    # Attach a pooled classification head to the pretrained backbone.
    x = backbone_model.output
    x = GlobalAveragePooling2D()(x)
    x = Dense(config['latent_dim'], activation=config['act_fn'])(x)
    predictions = Dense(len(all_classes), activation='softmax')(x)
    model = Model(inputs=backbone_model.input, outputs=predictions)
    if config['opt'] == 'adadelta':
        opt = tf.keras.optimizers.Adadelta(learning_rate=config['lr'])
    elif config['opt'] == 'adagrad':
        opt = tf.keras.optimizers.Adagrad(learning_rate=config['lr'])
    elif config['opt'] == 'adam':
        opt = tf.keras.optimizers.Adam(learning_rate=config['lr'])
    elif config['opt'] == 'ftrl':
        opt = tf.keras.optimizers.Ftrl(learning_rate=config['lr'])
    elif config['opt'] == 'nadam':
        opt = tf.keras.optimizers.Nadam(learning_rate=config['lr'])
    elif config['opt'] == 'rmsprop':
        opt = tf.keras.optimizers.RMSprop(learning_rate=config['lr'])
    else:  # 'sgd'
        opt = tf.keras.optimizers.SGD(learning_rate=config['lr'])
    model.compile(
        optimizer=opt,
        loss='categorical_crossentropy',
        metrics=[
            tf.keras.metrics.CategoricalAccuracy(),
            tf.keras.metrics.Precision(),
            tf.keras.metrics.Recall(),
            tf.keras.metrics.AUC()
        ]
    )
    return model
def cross_validation(
    model,
    generator,
    batch_size,
    num_folds,
    num_epochs
):
    cv_scores = []
    kf = KFold(n_splits=num_folds, random_state=None, shuffle=True)
    X = np.array(train_df["img_path"])
    i = 0
    for train_index, test_index in kf.split(X):
        train_data = X[train_index]
        test_data = X[test_index]
        NUM_STEPS = int(len(train_data) / batch_size) + 1
        train_data = train_df.loc[train_df["img_path"].isin(list(train_data))]
        valid_data = train_df.loc[train_df["img_path"].isin(list(test_data))]
        train_gen = generator.flow_from_dataframe(
            dataframe=train_data,
            directory=PATH_TO_DATA,
            x_col='img_path',
            y_col='class',
            class_mode='categorical',
            classes=all_classes,
            target_size=IMG_SIZE,
            color_mode='rgb',
            batch_size=batch_size
        )
        valid_gen = generator.flow_from_dataframe(
            dataframe=valid_data,
            directory=PATH_TO_DATA,
            x_col='img_path',
            y_col='class',
            class_mode='categorical',
            classes=all_classes,
            target_size=IMG_SIZE,
            color_mode='rgb',
            batch_size=batch_size
        )
        hist = model.fit(
            train_gen,
            steps_per_epoch=NUM_STEPS,
            epochs=num_epochs,
            validation_data=valid_gen
        )
        print(hist.history)
        # Keras may auto-suffix metric names (e.g. val_categorical_accuracy_1)
        # when several metric instances exist in one session; pick the matching key.
        if i == 0:
            score_name = 'val_categorical_accuracy'
        else:
            score_name = f'val_categorical_accuracy_{i}'
        cv_scores.append(hist.history[score_name][-1])
        i += 1
    return cv_scores
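The fmin call below minimizes an objective function that this listing does not define. A minimal sketch of what it needs to do, assuming each trial builds a fresh backbone for the sampled config, runs cross_validation, and returns the negated mean validation accuracy as the loss:

def objective(config):
    # Pick the backbone and its preprocessing function for this trial.
    if config['model'] == 'resnet50':
        from tensorflow.keras.applications.resnet50 import ResNet50 as Backbone, preprocess_input
    elif config['model'] == 'efficientnet':
        from tensorflow.keras.applications.efficientnet import EfficientNetB2 as Backbone, preprocess_input
    else:  # 'inception_v3'
        from tensorflow.keras.applications.inception_v3 import InceptionV3 as Backbone, preprocess_input

    generator = ImageDataGenerator(preprocessing_function=preprocess_input)
    backbone = Backbone(weights='imagenet', include_top=False)
    model = build_model(config, backbone)
    cv_scores = cross_validation(
        model,
        generator,
        batch_size=config['batch_size'],
        num_folds=K_FOLDS,
        num_epochs=config['num_epochs']
    )
    # hyperopt minimizes the loss, so negate the mean validation accuracy.
    return {'loss': -np.mean(cv_scores), 'status': STATUS_OK}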
trials = Trials()
best = fmin(
    fn=objective,
    space=space,
    algo=tpe.suggest,
    max_evals=100,
    trials=trials
)
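Once the search finishes, the Trials object keeps a record of every evaluation; for example:

# Loss of the best trial found during the search.
print(trials.best_trial['result']['loss'])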
best_config = space_eval(space, best)
if best_config['model'] == 'resnet50':
    from tensorflow.keras.applications.resnet50 import ResNet50 as BackboneModel
    from tensorflow.keras.applications.resnet50 import preprocess_input as preprocessing_function
elif best_config['model'] == 'efficientnet':
    from tensorflow.keras.applications.efficientnet import EfficientNetB2 as BackboneModel
    from tensorflow.keras.applications.efficientnet import preprocess_input as preprocessing_function
elif best_config['model'] == 'inception_v3':
    from tensorflow.keras.applications.inception_v3 import InceptionV3 as BackboneModel
    from tensorflow.keras.applications.inception_v3 import preprocess_input as preprocessing_function
generator = ImageDataGenerator(
    preprocessing_function=preprocessing_function
)
backbone = BackboneModel(weights='imagenet', include_top=False)
model = build_model(best_config, backbone)
train_gen = generator.flow_from_dataframe(
    dataframe=train_df,
    directory=PATH_TO_DATA,
    x_col='img_path',
    y_col='class',
    class_mode='categorical',
    classes=all_classes,
    target_size=IMG_SIZE,
    color_mode='rgb',
    batch_size=best_config['batch_size']
)
test_gen = generator.flow_from_dataframe(
    dataframe=test_df,
    directory=PATH_TO_DATA,
    x_col='img_path',
    y_col='class',
    class_mode='categorical',
    classes=list(train_gen.class_indices.keys()),
    target_size=IMG_SIZE,
    color_mode='rgb',
    batch_size=best_config['batch_size']
)
NUM_STEPS = int(len(train_df) / best_config['batch_size']) + 1
hist = model.fit(
    train_gen,
    steps_per_epoch=NUM_STEPS,
    epochs=best_config['num_epochs'],
    validation_data=test_gen
)
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns
from PIL import Image
from tqdm.notebook import tqdm
tqdm.pandas()
def draw_confusion_matrix(true, preds):
    conf_matx = confusion_matrix(true, preds)
    sns.heatmap(
        conf_matx,
        annot=True,
        annot_kws={"size": 12},
        fmt='g',
        cbar=False,
        cmap="viridis"
    )
    plt.show()
def convert_to_labels(preds_array):
    preds_df = pd.DataFrame(preds_array)
    predicted_labels = preds_df.idxmax(axis=1)
    return predicted_labels
def preprocess_image(img):
    img = img.convert('RGB')  # guard against grayscale or RGBA inputs
    img = img.resize(IMG_SIZE)
    img = np.array(img)
    img = np.expand_dims(img, axis=0)
    img = preprocessing_function(img)
    return img
def make_prediction(model, img_path):
    image = Image.open(img_path)
    img_preprocessed = preprocess_image(image)
    prediction = np.argmax(model.predict(img_preprocessed), axis=-1)[0]
    return prediction
results = model.evaluate(test_gen, batch_size=best_config['batch_size'])
print('Test loss:', results[0])
print('Test categorical_accuracy:', results[1])
print('Test precision:', results[2])
print('Test recall:', results[3])
print('Test auc:', results[4])
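Indexing results by position is brittle if the metric list ever changes; Keras exposes the metric names, so a name-keyed view is a safer alternative (a small addition, not in the original):

# Map metric names to their values instead of relying on positions.
print(dict(zip(model.metrics_names, results)))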
test_df['class_id'] = test_df['class'].progress_apply(
    lambda x: train_gen.class_indices[x]
)
test_df['prediction'] = test_df['img_path'].progress_apply(
    lambda x: make_prediction(model, x)
)
test_preds_labels = test_df['prediction']
test_images_y = test_df['class_id']
print(classification_report(
    test_images_y,
    test_preds_labels,
    target_names=list(train_gen.class_indices.keys())
))
draw_confusion_matrix(test_images_y, test_preds_labels)
The classification report and confusion matrix help us analyze the model's weaknesses and understand where it can be improved.
mapping = dict(zip(
    train_gen.class_indices.values(),
    train_gen.class_indices.keys()
))
PATH_TO_PHOTO = '/content/dataset/good_quality_photos/000000233771.jpg'
image = Image.open(PATH_TO_PHOTO)
img_preprocessed = preprocess_image(image)
prediction = np.argmax(model.predict(img_preprocessed), axis=-1)[0]
print(mapping[prediction])  # map the predicted index back to a class name