visit
The task was to predict whether a patient would develop Alzheimer's or not. However, the moment I ran the model on new (unseen) data, the accuracy went down to 57% and the excitement was gone 😢.
This was my first experience with overfitting. Overfitting happens when your algorithm can't generalize data patterns because it learns the data too well. When you present new data to an overfitted model, the predictive performance goes down, as in my case.A good model fit, one that can generalize data well, should be simple. This is the idea behind the law of parsimony, often called Occam's razor.
"Don't elaborate the nature of something beyond necessity."The frequent interpretation of this statement is that simple solutions are better than complex solutions.
In other words
complex == bad
, simple == good
. We can apply this thinking to the process of choosing the best model fit. Remember my initial 97% accuracy that went to 57% on the new data? A model that learns all the relationships in the training data becomes too complex. Then, when it comes to predicting outcomes for the new observations, it performs poorly because the model is inflexible and tailored only to the training data. On the other hand, a simple model can have a looser fit on training data, but it can also achieve a good fit on unseen data.Ensemble modeling, among other methods, can prevent overfitting.Ensemble models are models consisting of multiple models or algorithms. Individual models can be combined through various methods such as bagging, boosting, and stacking.
In general, we can differentiate between two types of ensembles. They are either based on 1) many weaker models that are stronger together (e.g. boosting) or 2) fewer well-thought-out models combined via stacking into a metamodel.
You can imagine the weaker models as ants. One ant can't build an anthill, but the whole colony of ants, well, that's a different story. The second group of ensemble methods is more like Avengers. (Yes, it wasn't just a hook in the title.)Every Avenger has a unique skill, and they are superheroes already, but they are stronger together. This is because their individual disadvantages are covered by their teammates.Simplicity doesn't always lead to greater accuracy.Let's go back to the analogy world. When ants are building their anthills, they need every ant they can get. It's similar for the weak learners, although you can still overfit this type of ensemble, so you need to be careful. When Avengers meet for the first time, some of them are overconfident and don't appreciate teamwork. But after they go through challenges together, they change and improve for the better. Ensemble modeling averages out the biases of the individual models or methods. This causes the predictions to be more stable with lower variance. In other words, the predictions based on ensemble modeling are simpler in terms of their statistical characteristics.
To consolidate how great ensemble modeling is, I have an example of their success in making an impact. In 2013, the Centre for Disease Control and Prevention in the U.S. started crowdsourcing forecasting models in a "Predict the Influenza Season Challenge" from research teams across the country.
These seasonal challenges became popular, and the idea developed into a where you can click through individual and combined predictions. The main takeaway was that from all of the individual models, the forecasts from ensemble models were the most accurate. I recommend you have a look.
Read more about ensemble modeling or how to avoid overfitting:
Other resources: