Multi-output Machine Learning — MixedRandomForest
Binary (0/1) output values can indicate a multi-label classification problem. Related multi-output settings include:
Multi-class Classification: Multi-class classification can be treated as a traditional single-output learning paradigm when the output class is represented by an integer encoding. It can also be extended to a multi-output learning scenario if each output class is represented by a one-hot vector (see the encoding sketch after this list).
Fine-grained Classification: Although fine-grained classification outputs use the same vector representation as multi-class classification outputs, the internal structures of the vectors differ: in the label hierarchy, labels under the same parent tend to be more closely related than labels under different parents.
Multi-task Learning: Multi-task learning aims to learn multiple related tasks simultaneously, where each task outputs a single label; learning multiple tasks is thus similar to learning multiple outputs. It leverages the relatedness between tasks to improve model performance. The major difference from multi-output learning is that in multi-task learning different tasks may be trained on different training sets or features, while in multi-output learning the output variables usually share the same training data and features.
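To make the encoding distinction from the multi-class item concrete, here is a small sketch in plain NumPy (not tied to any particular library) contrasting the integer encoding with the one-hot representation:

import numpy as np

labels = np.array([0, 2, 1])      # integer encoding: one scalar output per sample
one_hot = np.eye(3)[labels]       # one-hot encoding: one output per class
# one_hot:
# [[1., 0., 0.],
#  [0., 0., 1.],
#  [0., 1., 0.]]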
In multi-output pattern recognition problems, each instance in the dataset has two or more output values (nominal or real-valued); i.e., the output value is a vector rather than a scalar. Such problems are typically solved in one of two ways: by transforming the problem into several single-output problems and training a separate model for each output (problem transformation), or by adapting a learning algorithm so that a single model predicts all outputs at once (algorithm adaptation).
The first approach, training a separate inductive classifier or regression model per output, can be time-consuming, particularly when training datasets are very large. When multiple models must be trained on the same input data but with different output data, the total training time grows with the number of outputs, making this approach unsuitable for large datasets. This also increases the processing requirements correspondingly.
The second approach, algorithm adaptation, makes it possible to create a model that simultaneously predicts a set of two or more classification labels, regression values, or even joint classification-regression outputs from a single training run. If the prediction tasks are related (i.e., there is correlation or covariance between output values), training a coherent multi-output model can bring increased predictive performance compared to training multiple disjoint models.
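As a toy illustration of the two strategies (synthetic data; scikit-learn's RandomForestClassifier happens to accept 2-D targets natively, so it can stand in for the adaptation approach here):

import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.random.rand(100, 4)
Y = np.random.randint(0, 2, size=(100, 3))    # three binary outputs per instance

# Problem transformation: one model per output column
per_output = [RandomForestClassifier().fit(X, Y[:, j]) for j in range(Y.shape[1])]

# Algorithm adaptation: a single model predicting all outputs jointly
joint = RandomForestClassifier().fit(X, Y)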
In this blog, we discuss a mixed/multi-target Random Forest model that supports "multi-output problems with multiple classification outputs, multiple regression outputs, as well as arbitrary joint classification-regression outputs".
Further, the algorithm provides support for mixed-task multi-task learning: it is possible to train the model on any number of classification tasks and regression tasks simultaneously. The Random Forest predictor lets each individual ensemble member vote for the most probable output according to its learned decision rule. The ensemble members' votes are then tallied and aggregated, using the mode for classification targets and the mean for regression targets, to yield a common ensemble output (a minimal aggregation sketch follows).
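A small sketch of that mode/mean aggregation, for illustration only (plain NumPy; this is not morfist's internal implementation):

import numpy as np

def aggregate_votes(votes, classification):
    # votes has shape (n_trees, n_samples): one prediction per tree per sample
    if classification:
        # mode: the most frequent label wins for each sample
        return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])
    # mean: average the trees' regression predictions
    return votes.mean(axis=0)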
In a multi-output problem containing both classification tasks and regression tasks, solving unrelated joint classification-regression problems need be no harder than training a set of separate classifiers and regressors on the individual tasks. If the tasks are related, however, the algorithm adaptation method can deliver the best predictive performance.
Joint classification-regression trees are built with a tree induction algorithm that simultaneously solves one classification task and one regression task. Much like MT-DT and MRT, the joint classification-regression tree (JCRT) solves multiple simultaneous prediction tasks by modifying the node-split function in the inductive step and marking terminal nodes with appropriate values for each task. Because of the nature of joint classification-regression problems, the modified split function must consider the error of the classification part and the regression part simultaneously.
The split function uses an entropy measure consisting of three parts:
Shannon entropy is computed for the classification part.
A weighted differential entropy is calculated for the regression part.
Since Shannon entropies and differential entropies live in different ranges, a normalization step is applied before combining the two.
Joint classification-regression forests have also been evaluated on spatially structured data in the form of CT scans, where they perform a classification task and a regression task simultaneously.
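One plausible reading of such a three-part criterion, sketched in NumPy (the Gaussian form of the differential entropy and the root-node normalization are assumptions here, not necessarily morfist's exact formulation):

import numpy as np

def shannon_entropy(y_cls):
    # impurity of the classification target
    _, counts = np.unique(y_cls, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def diff_entropy(y_reg):
    # differential entropy of the regression target under a Gaussian assumption
    var = np.var(y_reg) + 1e-12            # guard against zero variance
    return 0.5 * np.log2(2 * np.pi * np.e * var)

def joint_impurity(y_cls, y_reg, h_cls_root, h_reg_root):
    # normalize each entropy by its value at the root node so the two
    # quantities are on a comparable scale before combining them
    return shannon_entropy(y_cls) / h_cls_root + diff_entropy(y_reg) / h_reg_root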
The following examples compare morfist's MixedRandomForest with scikit-learn's forests on a classification dataset, a regression dataset, and a mixed-output dataset:

from morfist import MixedRandomForest, cross_validation
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.model_selection import cross_val_score
import sklearn.datasets as dst
import numpy as np

# Config
n_trees = 20

# Datasets: linnerud (multivariate regression) and digits (classification)
x_reg, y_reg = dst.load_linnerud(return_X_y=True)
x_cls, y_cls = dst.load_digits(return_X_y=True)

# Mixed targets: pair an original target with a derived second target
# (the first linnerud column is used so the sample counts stay consistent)
x_mix_1, y_mix_1 = x_reg, np.vstack([y_reg[:, 0], y_reg[:, 0] < y_reg[:, 0].mean()]).T
x_mix_2, y_mix_2 = x_cls, np.vstack([y_cls, y_cls]).T
For reference, MixedRandomForest's constructor parameters and their defaults:
n_estimators=10,
max_features='sqrt',
min_samples_leaf=5,
choose_split='mean',
class_targets=None
# morfist forest on the digits classification task (target 0 is a class label)
cls_rf = MixedRandomForest(
n_estimators=n_trees,
min_samples_leaf=1,
class_targets=[0]
)
cls_skrf = RandomForestClassifier(n_estimators=n_trees)
cls_scores = cross_validation(
cls_rf,
x_cls,
y_cls,
class_targets=[0],
folds=10
)
scores = cross_val_score(
cls_skrf,
x_cls,
y_cls
)
print('Classification with a single output:')
print('\t morfist (accuracy): {}'.format(cls_scores.mean()))
print('\t scikit-learn (accuracy): {}'.format(scores.mean()))
Results
morfist (accuracy): 0.9632721202003339
scikit-learn (accuracy): 0.9281
# morfist forest on the linnerud multivariate regression task
reg_rf = MixedRandomForest(
n_estimators=n_trees,
min_samples_leaf=5
)
reg_skrf = RandomForestRegressor(n_estimators=n_trees)
reg_scores = cross_validation(
reg_rf,
x_reg,
y_reg,
folds=10
)
scores = cross_val_score(
reg_skrf,
x_reg,
y_reg,
scoring='neg_mean_squared_error'
)
print('Multivariate regression with multiple outputs:')
print('\t morfist (rmse): {}'.format(reg_scores.mean()))
print('\t scikit-learn (rmse): {}'.format(np.sqrt(-scores.mean())))
Results
morfist (rmse): 11.758534341303097
scikit-learn (rmse): 17.79445305492162
# morfist forest on the mixed dataset: target 0 is classification, target 1 regression
mix_rf = MixedRandomForest(
n_estimators=n_trees,
min_samples_leaf=1,
class_targets=[0]
)
mix_scores = cross_validation(
mix_rf,
x_mix_2,
y_mix_2,
folds=10,
class_targets=[0]
)
print('Mixed output on Classification Dataset: ')
print('\t Task 1 (original) (accuracy): {}'.format(mix_scores[0]))
print('\t Task 2 (additional) (rmse): {}'.format(mix_scores[1]))
Results
Mixed output on Classification Dataset:
Task 1 (original) (accuracy): 0.96272
Task 2 (additional) (rmse): 1.0808
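Beyond cross-validation, the model can presumably be used directly through a scikit-learn-style fit/predict interface (assumed here from the cross_validation usage above; check morfist's documentation for the exact API):

# assumption: predict returns one column per target
mix_rf.fit(x_mix_2, y_mix_2)
pred = mix_rf.predict(x_mix_2)
# pred[:, 0] holds the class labels (class_targets=[0]);
# pred[:, 1] holds the regression estimates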
Other Python Libraries
One such library is a Python-based multi-output stream/batch learning framework that can be used within Jupyter notebooks and alongside scikit-learn.
[Figure: multi-output capabilities of different Java and Python libraries]
Multi-output learning problems also differ in the structure of their outputs:
Independent Vector: An independent vector is a vector with independent dimensions, where each dimension represents a particular label that does not necessarily depend on the other labels. This covers tags, attributes, bags-of-words, bags-of-visual-words, hash codes, and the like (see the binarizer sketch after this list).
Distribution: Provides a probability distribution over each dimension, for example identifying the tag with the largest weight.
Ranking: Shows the tags ordered from most important to least important. Example applications include text categorization ranking, question answering, and visual object recognition.
Text: Text outputs can take the form of keywords, sentences, paragraphs, or even whole documents. Applications include document summarization and paragraph generation.
Sequence: A sequence (used in speech recognition and language translation) is usually a series of elements selected from a label set or word set. Each element's prediction depends on past predicted outputs and the present input, and an output sequence often corresponds to an input sequence.
Tree: Outputs are represented in a hierarchical labeled structure, where each output belongs to a label as well as to its ancestors in the tree; this is useful in syntactic parsing.
Image: Outputs can be images consisting of multiple pixel values. A single pixel is predicted from the input and the pixels around it, giving an overall region prediction. Image output applications include super-resolution construction, text-to-image synthesis (generating images from natural language descriptions), and face generation.
Bounding Box: A bounding box locates the objects that appear in an image and is commonly used in object recognition and object detection.
Link: Given a partitioned social network whose edges represent friendships between users, the goal is to predict whether two currently unlinked users will become friends in the future.
Graph: A graph is made up of a set of nodes and edges and is used to model relations between objects, where each object is represented by a node and connected objects are linked by an edge.
Others: Contours and polygons are similar to bounding boxes and can be used to localize objects in an image. In information retrieval, the output can be a list of data objects similar to a given query. In image segmentation, the output is usually a set of segmentation masks for different objects, also used for detecting common saliency across multiple images.
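For instance, the independent-vector case from the list above can be produced with scikit-learn's MultiLabelBinarizer, which turns per-sample tag sets (hypothetical tags here) into an independent binary indicator vector per sample:

from sklearn.preprocessing import MultiLabelBinarizer

tags = [{"python", "ml"}, {"ml"}, {"python", "stats"}]   # hypothetical tag sets
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(tags)
# columns follow mlb.classes_ -> ['ml', 'python', 'stats']
# Y == [[1, 1, 0],
#       [1, 0, 0],
#       [0, 1, 1]]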
On scikit-learn's datasets, the linnerud dataset (multivariate regression) and the digits dataset (classification), MixedRandomForest delivers better accuracy for classification and better RMSE for regression than scikit-learn's single-task forests.
However, MixedRandomForest takes longer to run than scikit-learn, and the accuracy gain on the multi-output classification dataset is only slight. You can read more about the deep learning mechanisms behind Keras's multi-output classification and multi-label classification elsewhere; there are also established methods for evaluating multi-output regression models.