kNN Regression
How to Construct?
1- KNeighborsRegressor
We can also use the k-Nearest Neighbors algorithm for regression problems. Scikit-Learn provides a regressor implementation of kNN named KNeighborsRegressor, which can be imported from the sklearn.neighbors module.
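Before walking through the steps, it may help to see what the regressor computes. With its default settings (uniform weights, Euclidean distance via the Minkowski metric with p=2), KNeighborsRegressor predicts the plain average of the targets of the k nearest training samples. The sketch below reimplements that idea in NumPy and compares it with Scikit-Learn; the helper name `knn_predict` and the toy sine data are assumptions for illustration only.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor


def knn_predict(X_train, y_train, X_query, k=5):
    """Hypothetical helper: mean target of the k nearest training samples
    (Euclidean distance, every neighbor weighted equally)."""
    # Pairwise Euclidean distances, shape (n_query, n_train)
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    # Indices of the k closest training samples for each query point
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Average the targets of those neighbors
    return y_train[nearest].mean(axis=1)


# Toy data (assumption): one feature, a sine-shaped target
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(50, 1))
y_train = np.sin(X_train).ravel()
X_query = np.array([[2.5], [7.0]])

manual = knn_predict(X_train, y_train, X_query, k=5)
sklearn_pred = KNeighborsRegressor(n_neighbors=5).fit(X_train, y_train).predict(X_query)
print(manual)
print(sklearn_pred)  # should match the manual result
```

The brute-force distance matrix keeps the sketch short; Scikit-Learn can instead use tree structures to find neighbors faster on larger datasets.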
a) Creating KNeighborsRegressor Model:
from sklearn.neighbors import KNeighborsRegressor
KNR = KNeighborsRegressor()
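The constructor call above relies on default hyperparameters. A short sketch for inspecting them and for setting them explicitly when the model is created (the values 10 and "distance" are arbitrary illustrations, not recommendations):

```python
from sklearn.neighbors import KNeighborsRegressor

# Defaults: 5 neighbors, every neighbor weighted equally
KNR = KNeighborsRegressor()
params = KNR.get_params()
print(params["n_neighbors"], params["weights"])  # 5 uniform

# The same hyperparameters can be passed explicitly at creation time
KNR_custom = KNeighborsRegressor(n_neighbors=10, weights="distance")
print(KNR_custom.get_params()["n_neighbors"])  # 10
```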
Creating the model only sets its hyperparameters; it still has to be fitted before it can make predictions.
b) Training KNeighborsRegressor Model:
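Fitting is the training phase, in which the model is given the training data. For kNN, this mostly means storing the training samples (and, depending on the algorithm parameter, building a search structure such as a KD-tree or ball tree) so that neighbors can be found quickly at prediction time.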
KNR.fit(X_train, y_train)
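The line above assumes X_train and y_train already exist. A self-contained sketch of the training step, where the noisy sine data and the 75/25 train/test split are assumptions made only for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical toy data: a noisy sine curve with one feature
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# Hold out 25% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

KNR = KNeighborsRegressor()
KNR.fit(X_train, y_train)  # stores the training samples for later neighbor searches
print(KNR.n_samples_fit_)  # 150
```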
c) Predicting with KNeighborsRegressor Model:
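After training, the model is ready to make predictions. For each test sample, it predicts an average (unweighted by default) of the target values of its k nearest training samples.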
yhat = KNR.predict(X_test)
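To see what yhat contains and how good it is, here is a sketch that repeats the hypothetical toy setup and scores the predictions with mean squared error and R^2 (the choice of metrics is an assumption; any regression metric works):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical toy data: a noisy sine curve with one feature
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

KNR = KNeighborsRegressor().fit(X_train, y_train)
yhat = KNR.predict(X_test)  # one continuous prediction per test sample

mse = mean_squared_error(y_test, yhat)
r2 = r2_score(y_test, yhat)
print(yhat.shape)  # (50,)
print("MSE:", mse)
print("R^2:", r2)
```

KNR.score(X_test, y_test) returns the same R^2 value, since that is the default score for Scikit-Learn regressors.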
This kNN implementation for regression has a few very useful hyperparameters that can be optimized. Most notably, n_neighbors sets how many neighbors are used for each prediction, and weights controls how much each neighbor counts based on its distance: "uniform" gives every neighbor the same weight, while "distance" gives closer neighbors more influence.
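One common way to optimize these two hyperparameters is a cross-validated grid search. A sketch using GridSearchCV on hypothetical toy data; the candidate values and the 5-fold setting are arbitrary choices:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical toy data: a noisy sine curve with one feature
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# Every combination of these values is evaluated with 5-fold cross-validation
param_grid = {
    "n_neighbors": [1, 3, 5, 10, 20],
    "weights": ["uniform", "distance"],
}
search = GridSearchCV(
    KNeighborsRegressor(), param_grid, cv=5, scoring="neg_mean_squared_error"
)
search.fit(X, y)
print(search.best_params_)
print(search.best_score_)  # negative MSE, so closer to 0 is better
```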
Radius based kNN
2- RadiusNeighborsRegressor
Similar to the kNN classification models, you can also use a radius-based neighbors algorithm for regression, named RadiusNeighborsRegressor. This implementation is a version of the kNN algorithm that selects neighbors within a fixed radius instead of taking a fixed number of neighbors (k).
Using RadiusNeighborsRegressor is very similar to using KNeighborsRegressor, the main difference being the radius parameter in place of the n_neighbors parameter.
The radius parameter (default 1.0) defines the criterion for neighbor relations: for each sample point, the algorithm looks for training samples that lie within the given radius and uses all of them as neighbors.
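To make the radius criterion concrete, the sketch below asks a fitted model which training samples fall inside the radius of each query point, using the radius_neighbors method. The two groups of points are hypothetical data chosen so the neighborhoods are easy to verify by hand.

```python
import numpy as np
from sklearn.neighbors import RadiusNeighborsRegressor

# Toy training data (assumption): two separated groups of points
X_train = np.array([[0.0], [0.5], [0.9], [3.0], [3.2]])
y_train = np.array([1.0, 2.0, 3.0, 10.0, 11.0])
X_query = np.array([[0.4], [3.1]])

RNR = RadiusNeighborsRegressor(radius=1.0).fit(X_train, y_train)

# For each query point: indices of all training samples within distance 1.0
dist, ind = RNR.radius_neighbors(X_query)
for neighbors in ind:
    print(sorted(neighbors.tolist()))
# [0, 1, 2]
# [3, 4]

# Each prediction is the mean target of those neighbors: 2.0 and 10.5
print(RNR.predict(X_query))
```

Note that the two query points end up with a different number of neighbors (three and two), which is exactly what distinguishes this model from a fixed-k regressor.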
The main advantage of the radius-based neighbor regressor is that it gives you control over how neighbors are defined, which is useful when you don't want certain samples to be included in the regression calculations. For example, kNN can be very sensitive to outliers because it always includes k neighbors in the calculation, no matter how distant those neighbors are. With RadiusNeighborsRegressor, outliers are simply excluded when they fall outside the fixed radius you defined.
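The outlier argument can be checked on a tiny hypothetical dataset: three clustered samples plus one distant sample with an extreme target. With k set to 4 (deliberately equal to the training set size here), the distant sample is always a neighbor; with a radius of 1.0 it is excluded.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor, RadiusNeighborsRegressor

# Three clustered samples plus one distant outlier with an extreme target
X_train = np.array([[0.0], [0.1], [0.2], [10.0]])
y_train = np.array([1.0, 1.0, 1.0, 100.0])
X_query = np.array([[0.1]])

# With k=4, the outlier is one of the neighbors, however far away it is
knn = KNeighborsRegressor(n_neighbors=4).fit(X_train, y_train)
# With radius=1.0, the outlier (about 9.9 away) is excluded
rnr = RadiusNeighborsRegressor(radius=1.0).fit(X_train, y_train)

knn_pred = knn.predict(X_query)
rnr_pred = rnr.predict(X_query)
print(knn_pred)  # [25.75]  -> (1 + 1 + 1 + 100) / 4
print(rnr_pred)  # [1.]     -> mean of the three clustered targets
```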
Here is a simple Python code to get started with RadiusNeighborsRegressor:
from sklearn.neighbors import RadiusNeighborsRegressor
RNR = RadiusNeighborsRegressor()
After that, you can fit the model and make predictions with it just like with most other machine learning regressors.
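A full create, fit and predict cycle with RadiusNeighborsRegressor might look like the sketch below. The toy sine data and radius=0.5 are assumptions; the radius only works well if every test point has some training samples within that distance.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import RadiusNeighborsRegressor

# Hypothetical toy data: a noisy sine curve with one feature
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Create: neighbors are all training samples within distance 0.5
RNR = RadiusNeighborsRegressor(radius=0.5)
# Train
RNR.fit(X_train, y_train)
# Predict
yhat = RNR.predict(X_test)

score = RNR.score(X_test, y_test)  # R^2 on the held-out samples
print(score)
```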
KNeighborsRegressor vs RadiusNeighborsRegressor
The main difference between KNeighborsRegressor and RadiusNeighborsRegressor is the criterion that defines neighbor relations, which is then used in the regression calculations.
While KNeighborsRegressor bases its regression calculations on the k nearest neighbors, with k set by the n_neighbors parameter, RadiusNeighborsRegressor bases its regression calculations on the neighbors that fall inside the radius set by the radius parameter.
As a consequence, RadiusNeighborsRegressor is not affected by outlying samples that fall outside the radius.
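The difference in neighbor criteria can be observed directly by counting neighbors per query point: kneighbors always returns exactly k, while radius_neighbors returns however many samples fall inside the radius. The data below (a dense cluster near 0 and sparse samples further out) is hypothetical.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor, RadiusNeighborsRegressor

# Dense cluster near 0 and sparse samples further out (hypothetical data)
X_train = np.array([[0.0], [0.1], [0.2], [0.3], [5.0], [9.0]])
y_train = np.array([1.0, 1.2, 0.9, 1.1, 4.0, 8.0])
X_query = np.array([[0.15], [5.2]])

knn = KNeighborsRegressor(n_neighbors=3).fit(X_train, y_train)
rnr = RadiusNeighborsRegressor(radius=0.5).fit(X_train, y_train)

# k-NN: always exactly 3 neighbors per query, even for the isolated point 5.2
_, knn_ind = knn.kneighbors(X_query)
knn_counts = [len(row) for row in knn_ind]
print(knn_counts)  # [3, 3]

# Radius: as many neighbors as fall inside the radius
_, rnr_ind = rnr.radius_neighbors(X_query)
rnr_counts = [len(row) for row in rnr_ind]
print(rnr_counts)  # [4, 1]
```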
KNeighborsRegressor
- n_neighbors : Instead of a radius, KNeighborsRegressor makes predictions from a fixed number (k) of nearest neighbors.
- outlier effect : kNN is known to be very sensitive to outliers because it includes every sample that ranks among the k nearest neighbors, however far away it is.
RadiusNeighborsRegressor
- radius : This parameter, unique to RadiusNeighborsRegressor, controls neighbor relations based on a fixed distance.
- outlier effect : Since RadiusNeighborsRegressor operates on a fixed radius, it is less sensitive to outliers: an outlier that falls outside the assigned radius is simply excluded from the calculations.
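Excluding distant samples has a flip side worth keeping in mind when choosing the radius: a query point with no training samples inside the radius has no neighbors at all, and Scikit-Learn then predicts NaN for it (a warning is also emitted). A minimal sketch on hypothetical data:

```python
import warnings

import numpy as np
from sklearn.neighbors import RadiusNeighborsRegressor

X_train = np.array([[0.0], [0.5], [1.0]])
y_train = np.array([1.0, 2.0, 3.0])
rnr = RadiusNeighborsRegressor(radius=1.0).fit(X_train, y_train)

# 0.5 has all three samples within radius 1.0; 5.0 has none
X_query = np.array([[0.5], [5.0]])
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the "no neighbors" warning for this demo
    yhat = rnr.predict(X_query)
print(yhat)  # first value is 2.0, second is nan
```

If NaN predictions appear, a larger radius (or scaling the features) is usually needed.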
KNeighborsRegressor Summary
In this k-Nearest Neighbors tutorial, we have seen the steps to create, train and predict with a k-Nearest Neighbors Regressor for regression projects using Scikit-Learn and the Python programming language.
We have also seen the RadiusNeighborsRegressor model, which makes regression predictions from the samples within a fixed radius rather than from a fixed number of nearest neighbors.
We have covered some of the differences between k-Nearest Neighbors and Radius Neighbors algorithms from the regression perspective.