Who Invented Random Forest?
1- Random Forest Original Paper
Random Forest algorithms are very popular today. They can usually be used out of the box, which makes it easy to enter the machine learning field and still make remarkably accurate predictions.
Leo Breiman proposed the Random Forest algorithm in the original paper linked below, and he later worked with Adele Cutler to develop it further. Random Forest builds many decision trees and ensembles them into a more robust, stable and accurate model.
Leo Breiman's original paper, Random Forests, can be found on UC Berkeley's website here.
Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest.
Leo Breiman, Jan-2001
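Random Forest takes advantage of a technique called bagging (bootstrap aggregating). Each tree is trained on a bootstrap sample, which is a set of rows drawn at random with replacement from the training data. The trees' predictions are then combined, by majority vote for classification or by averaging for regression.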
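To make the bagging step concrete, here is a minimal Python sketch. It is not Breiman's code. To keep it short, each "tree" is a one-level decision stump, and the helper names `bootstrap_sample`, `train_stump` and `bagging`, the toy data and the choice of 25 models are all illustrative assumptions. Each model sees its own bootstrap sample, and the ensemble predicts by majority vote.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows uniformly at random *with replacement*."""
    return [rng.choice(rows) for _ in rows]


def train_stump(rows):
    """Hypothetical stand-in base learner: a one-level decision tree (a "stump").

    rows are (features, label) pairs. Every feature/threshold pair is tried and
    the split with the fewest misclassified training rows is kept.
    """
    best = None
    for f in range(len(rows[0][0])):
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= threshold]
            right = [y for x, y in rows if x[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    _, f, threshold, left_label, right_label = best
    return lambda x: left_label if x[f] <= threshold else right_label


def bagging(rows, n_models=25, seed=0):
    """Train n_models stumps, each on its own bootstrap sample; predict by majority vote."""
    rng = random.Random(seed)
    models = [train_stump(bootstrap_sample(rows, rng)) for _ in range(n_models)]

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Toy one-feature dataset (illustrative only).
data = [((1.0,), "a"), ((2.0,), "a"), ((3.0,), "a"),
        ((6.0,), "b"), ((7.0,), "b"), ((8.0,), "b")]
predict = bagging(data)
print(predict((0.5,)), predict((9.0,)))
```

Swapping the stump for a full decision tree gives plain bagged trees. Random Forest additionally randomizes which features each split may consider.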
In addition to bagging, Random Forest relies on random feature selection, an idea Tin Kam Ho developed in her 1998 random subspace method. You can access Ho's impressive original paper here.
Tin Kam Ho has been working with IBM Watson since 2014, according to her personal page on IBM's website.
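The random feature selection mentioned above can be sketched the same way. In Ho's random subspace method each tree works in a randomly chosen subset of features, whereas the variant described in Breiman's abstract draws a fresh random subset at every split. The sketch below shows that per-node variant: it searches only `n_try` randomly chosen features for the split with the lowest weighted Gini impurity. The function names, the toy data, the Gini criterion and the roughly sqrt(p) subset size are assumptions for illustration, not details taken from either paper.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split_random_features(rows, n_try, rng):
    """Search only n_try randomly chosen features for the best split of one node.

    rows are (features, label) pairs. Returns (feature, threshold, weighted_gini),
    or None if none of the sampled features can split the rows.
    """
    n_features = len(rows[0][0])
    candidates = rng.sample(range(n_features), n_try)  # fresh random subset for this node
    best = None
    for f in candidates:
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= threshold]
            right = [y for x, y in rows if x[f] > threshold]
            if not right:
                continue  # the largest value sends every row left: not a real split
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[2]:
                best = (f, threshold, score)
    return best


# Toy dataset with three features (illustrative only).
rows = [((1.0, 5.0, 0.0), "a"), ((2.0, 1.0, 0.0), "a"),
        ((8.0, 4.0, 1.0), "b"), ((9.0, 2.0, 1.0), "b")]
n_try = max(1, int(math.sqrt(len(rows[0][0]))))  # roughly sqrt(p) features per split (assumed)
rng = random.Random(42)
for node in range(3):
    # Each call stands in for a different node and draws its own feature subset.
    print(best_split_random_features(rows, n_try, rng))
```

Because every node draws its own subset, trees grown on the same data end up splitting on different features, which is what makes the trees in the forest differ from one another.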
2- Roots from Decision Trees
Breiman is also a co-author of the 1984 book Classification and Regression Trees, which popularized decision trees in modern machine learning through the CART learning algorithm.
Throughout the second half of the 1990s, Breiman made a series of advances in bagging and related ensembling techniques, leading to the discovery of the Random Forest algorithm in January 2001. His papers on bagging and related ensembling methods from 1996 to 2000 are as follows and can be found on his personal webpage at UC Berkeley's Statistics Department.
- Leo Breiman – 1996: Bagging predictors
- Leo Breiman – 1996: Out-of-bag estimation
- Leo Breiman – 1998: Arcing classifiers
- Leo Breiman – 1998b: Randomizing outputs to increase prediction accuracy
- Leo Breiman – 1999: Using adaptive bagging to debias regressions
- Leo Breiman – 2000: Some infinity theory for predictor ensembles
3- AdaBoost-like Favorable Results
It's also quite likely that Leo Breiman was inspired by AdaBoost in one way or another, and he references it in his original Random Forest paper. AdaBoost was developed by Yoav Freund and Robert E. Schapire in the mid-1990s as a boosting method; the original AdaBoost paper is available here. Boosting is another highly effective ensembling technique that can produce very accurate predictions.
In the abstract of Random Forests, Breiman points out that the algorithm's results compare favorably to those of AdaBoost:
Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Freund and Schapire), but are more robust with respect to noise.
Leo Breiman
4- Random Forest History Summary
In summary, we have covered the discovery of Random Forest and the original 2001 paper in which Leo Breiman introduced it.
We have also seen some of the potential influences behind Breiman's discovery of the Random Forest algorithm, and its relationship to the CART decision tree algorithm, which Breiman co-authored.
It's fascinating to see how much we owe Breiman for decision trees, random forests and bagging techniques in general, as he pioneered many new methods in this field.