The k-Nearest Neighbors (kNN) algorithm can solve classification as well as regression problems. It is not known to scale well to big data, but there are workarounds for its performance issues, and thanks to its often surprisingly good accuracy and intuitive inner workings, kNN remains a commonly used machine learning algorithm.
In this tutorial we will look at KNeighborsClassifier and how it can be used in Python via the Scikit-learn library.
For classification problems, you can use the K-Nearest Neighbors implementation provided by the KNeighborsClassifier class in the sklearn.neighbors module.
You can create a KNeighborsClassifier with Python and Scikit-learn using the code below:
from sklearn.neighbors import KNeighborsClassifier
KNC = KNeighborsClassifier()
Once the model is initialized, it is ready to be trained on training data.
You can use code similar to the snippet below to train a kNN classifier in Python:
KNC.fit(X_train, y_train)
After the model is trained, it is ready to make classification predictions. You can see the code example below:
yhat = KNC.predict(X_test)
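Putting the steps above together, here is a minimal end-to-end sketch. The Iris dataset, the train/test split settings, and the accuracy check are illustrative choices, not part of the text above:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Illustrative dataset choice: the built-in Iris dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

KNC = KNeighborsClassifier()   # defaults to n_neighbors=5
KNC.fit(X_train, y_train)      # train on the training split
yhat = KNC.predict(X_test)     # predict labels for unseen samples

print(accuracy_score(y_test, yhat))
```

On a small, well-separated dataset like Iris, even the default settings typically score well above 90% accuracy.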
KNeighborsClassifier can be optimized by adjusting its hyperparameters. Tuning offers plenty of options to make kNN implementations more accurate and more efficient. See the related article if you are interested in optimizing kNN models:
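As a quick taste of tuning, a grid search can try several hyperparameter combinations with cross-validation. The parameter grid below is an illustrative sketch, not an exhaustive search:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # illustrative dataset

param_grid = {
    "n_neighbors": [3, 5, 7, 11],        # how many neighbors vote
    "weights": ["uniform", "distance"],  # equal vs distance-weighted votes
    "p": [1, 2],                         # Manhattan vs Euclidean distance
}

search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)  # the best-scoring combination found
```

GridSearchCV refits the model on the full data with the best parameters, so `search` itself can then be used for prediction.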
Additionally, you can use a radius-based neighbors classifier instead of the k-nearest approach. In this case, classification is based on a fixed radius parameter rather than a fixed number of neighbors.
RadiusNeighborsClassifier can be used with a process very similar to the one described for KNeighborsClassifier above, except you initially create a RadiusNeighborsClassifier instance instead of a KNeighborsClassifier.
You can use Python code similar to the following for that:
from sklearn.neighbors import RadiusNeighborsClassifier
RNC = RadiusNeighborsClassifier()
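The fit/predict workflow then looks just like before. The toy one-dimensional data and the radius value of 1.0 below are arbitrary illustrative choices:

```python
from sklearn.neighbors import RadiusNeighborsClassifier

# Toy 1-D training data: two well-separated clusters
X_train = [[0.0], [0.5], [1.0], [5.0], [5.5], [6.0]]
y_train = [0, 0, 0, 1, 1, 1]

# Classify each query point by the labels of all training
# points within a radius of 1.0 around it
RNC = RadiusNeighborsClassifier(radius=1.0)
RNC.fit(X_train, y_train)

print(RNC.predict([[0.2], [5.7]]))  # one point near each cluster
```

Here each query point falls within radius 1.0 of one cluster only, so the first is assigned class 0 and the second class 1.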
The main difference between KNeighborsClassifier and RadiusNeighborsClassifier is what they base their classification on. While the former classifies a point using its k closest neighbors, the latter classifies it using all neighbors within a fixed radius.
This difference has consequences. The k-Nearest Neighbors algorithm will classify every sample point, since it always evaluates the k nearest neighbors regardless of how far away they are. Radius Neighbors, on the other hand, will leave out samples that have no training neighbors within the radius: by default such points cause an error at prediction time, though the outlier_label parameter lets you assign them a dedicated label instead.
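A short sketch of this outlier behavior, using arbitrary toy data: a query point with no training neighbors inside the radius receives the label supplied via outlier_label (here -1), whereas without outlier_label the same prediction would raise an error.

```python
from sklearn.neighbors import RadiusNeighborsClassifier

# Toy training data: a single tight cluster around 0
X_train = [[0.0], [0.5], [1.0]]
y_train = [0, 0, 0]

# outlier_label=-1 marks query points that have no
# training neighbor within the radius (illustrative choice)
RNC = RadiusNeighborsClassifier(radius=1.0, outlier_label=-1)
RNC.fit(X_train, y_train)

# [0.3] is inside the cluster; [10.0] is far outside the radius
print(RNC.predict([[0.3], [10.0]]))
```

The first point is classified normally as 0, while the second comes back as -1, making it easy to filter out unclassifiable samples downstream.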
As a result, each classifier has a few parameters that are unique to it, such as n_neighbors for KNeighborsClassifier and radius and outlier_label for RadiusNeighborsClassifier.
In this kNN tutorial we have seen how to create, train, and predict with a kNN classifier model using Scikit-learn and Python.
Additionally, we have seen an alternative implementation called RadiusNeighborsClassifier, which has certain benefits compared to KNeighborsClassifier when outlier management is critical.
You can find a more practical kNN example below, along with other similar algorithms that are definitely worth exploring.