Random Forest Regression

RandomForestRegressor is Scikit-Learn's built-in implementation of random forest regression. It predicts continuous values and is one of the most versatile machine learning models available: it often gives accurate results with default settings and offers many hyperparameters for further tuning.

In this Random Forest tutorial, we will demonstrate the basics of implementing a RandomForestRegressor using Python and the machine learning library Scikit-Learn.

How to Construct a RandomForestRegressor?

RandomForestRegressor

You can import RandomForestRegressor from the sklearn.ensemble module and use it in regression projects to predict continuous values with the Random Forest algorithm.

a) Creating RandomForestRegressor Model:

from sklearn.ensemble import RandomForestRegressor
RF = RandomForestRegressor()

b) Training RandomForestRegressor Model:

Once the model is created, the next step is to fit it to the training data; after that, it is ready for prediction.

RF.fit(X_train, y_train)

c) Predicting with RandomForestRegressor Model:

After training, the model is ready for inference. Call the .predict() method on data with the same features as the training data to make predictions.

yhat = RF.predict(X_test)
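The three steps above can be combined into one runnable script. This is a minimal sketch that assumes synthetic data from Scikit-Learn's make_friedman1 generator in place of a real dataset; the 25% test split, random_state=0 and the choice of MAE and R² as metrics are illustrative choices, not requirements.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data stands in for a real dataset (assumption for illustration)
X, y = make_friedman1(n_samples=500, noise=1.0, random_state=0)

# Hold out 25% of the rows to evaluate the model on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# a) Create the model; random_state makes the results reproducible
RF = RandomForestRegressor(random_state=0)

# b) Train it on the training split
RF.fit(X_train, y_train)

# c) Predict continuous values for the held-out rows
yhat = RF.predict(X_test)

print("MAE:", round(mean_absolute_error(y_test, yhat), 3))
print("R^2:", round(r2_score(y_test, yhat), 3))
```

Fixing random_state makes both the split and the forest reproducible between runs; leave it unset if identical results are not needed.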

Random Forests have quite a few parameters that can be tuned. In addition to the decision tree parameters (such as max_depth and min_samples_leaf), which are applied to every tree in the forest, there are parameters specific to the forest itself. Some of these are:

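bootstrap, default=True: train each tree on a bootstrap sample of the rows; if False, every tree uses the whole dataset
oob_score, default=False: estimate generalization performance from out-of-bag samples (requires bootstrap=True)
n_jobs, default=None: number of jobs run in parallel for fit and predict; None means 1 (unless inside a joblib parallel_backend context) and -1 means all processors
verbose, default=0: controls how much progress output is printed while fitting and predicting
warm_start, default=False: if True, reuse the trees from the previous call to fit and add more trees to the ensemble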
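As a sketch of how these forest-specific parameters work together, the example below (again on synthetic make_friedman1 data, an assumption for illustration) reads the out-of-bag estimate enabled by oob_score, parallelizes with n_jobs=-1 (all CPU cores), and then uses warm_start to add 50 more trees to an already fitted forest instead of retraining it from scratch. The specific values are illustrative.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor

# Synthetic data stands in for a real dataset (assumption for illustration)
X, y = make_friedman1(n_samples=500, noise=1.0, random_state=0)

RF = RandomForestRegressor(
    n_estimators=100,
    bootstrap=True,   # each tree sees a bootstrap sample of the rows (default)
    oob_score=True,   # score each row using only trees that did not see it
    n_jobs=-1,        # use all available CPU cores
    random_state=0,
)
RF.fit(X, y)
print("Out-of-bag R^2:", round(RF.oob_score_, 3))

# With warm_start=True, the next fit keeps the existing 100 trees
# and only trains the 50 additional ones
RF.set_params(warm_start=True, n_estimators=150)
RF.fit(X, y)
print("Trees after warm start:", len(RF.estimators_))
```

For a regressor, oob_score_ is an R² computed on out-of-bag predictions, so it gives a validation-style estimate without a separate hold-out set.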

What's the difference between RandomForestRegressor and DecisionTreeRegressor?

Random Forest Tree Size (n_estimators)

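Random Forests are ensemble models built from decision trees. A RandomForestRegressor trains many decision trees, each on a bootstrap sample of the training data (when bootstrap=True), and averages their predictions. Averaging reduces the variance of the individual trees, so a forest is usually more robust and more accurate than a single DecisionTreeRegressor.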
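The averaging can be checked directly. In this sketch (synthetic make_friedman1 data and random_state=0 are assumptions for illustration), the forest's prediction is compared with the mean of the predictions of its fitted trees, which Scikit-Learn stores in the estimators_ attribute, and the forest's held-out R² is compared with that of a single DecisionTreeRegressor.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data stands in for a real dataset (assumption for illustration)
X, y = make_friedman1(n_samples=500, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
forest = RandomForestRegressor(random_state=0).fit(X_train, y_train)

# The forest's prediction is the mean of its individual trees' predictions
per_tree = np.stack([t.predict(X_test) for t in forest.estimators_])
manual_average = per_tree.mean(axis=0)
print("Matches forest.predict:", np.allclose(manual_average, forest.predict(X_test)))

# Compare held-out R^2 of a single tree against the forest
print("Single tree R^2:", round(tree.score(X_test, y_test), 3))
print("Forest R^2:     ", round(forest.score(X_test, y_test), 3))
```

On this dataset the forest scores clearly higher than the single tree; how large the gap is will depend on your data.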

Since decision trees are the building blocks of Random Forests, the two models share many hyperparameters, such as max_depth, min_samples_split and min_samples_leaf. One parameter distinctive to the Random Forest algorithm and its Scikit-Learn implementation is n_estimators.

You can use n_estimators to define the number of trees in the forest. It defaults to 100, meaning 100 trees are built.

The individual trees trained in a Random Forest are also called “estimators”, hence the parameter name n_estimators.

You can adjust it to different values, lowering it if training or prediction speed is a concern. A forest of 10 trees trains and predicts considerably faster than a forest of 100 trees, although it is usually somewhat less accurate.

RF = RandomForestRegressor(n_estimators=100)  # 100 is the default; a smaller value such as 10 gives a faster, usually less accurate forest
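To see the speed/accuracy trade-off for yourself, the sketch below fits forests with 10 and 100 trees on the same synthetic make_friedman1 data (an assumption for illustration) and records the number of fitted estimators, the fit time and the test R² for each. Exact timings depend on your hardware, so treat them as relative rather than absolute.

```python
import time

from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a real dataset (assumption for illustration)
X, y = make_friedman1(n_samples=1000, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

results = {}
for n in (10, 100):
    RF = RandomForestRegressor(n_estimators=n, random_state=0)
    start = time.perf_counter()
    RF.fit(X_train, y_train)
    fit_seconds = time.perf_counter() - start
    # Each fitted tree ("estimator") is stored in RF.estimators_
    results[n] = (len(RF.estimators_), fit_seconds, RF.score(X_test, y_test))
    print(f"n_estimators={n}: {results[n][0]} trees, "
          f"fit {fit_seconds:.2f}s, test R^2 {results[n][2]:.3f}")
```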

You can read more about Random Forest parameters and how to tune them.