DBSCAN Clustering
DBSCAN is a density-based clustering algorithm that can find arbitrarily shaped clusters, including non-convex ones. Unlike K-Means or Hierarchical Clustering, it groups points by density, so it can label points in sparse regions as noise instead of forcing them into a cluster.
In this Machine Learning Tutorial we will explore the DBSCAN class in Scikit-Learn and see how it can be used to build a DBSCAN model for clustering.
How to Construct?
DBSCAN
DBSCAN can be imported from Scikit-Learn’s sklearn.cluster module. Using the DBSCAN class, we can then construct a DBSCAN model. You can see the Python code below:
Creating DBSCAN Model:
from sklearn.cluster import DBSCAN
DBS = DBSCAN()
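Calling DBSCAN() with no arguments uses the library defaults. The short sketch below reads them back with get_params and also builds a second model with explicit values; the names default_model and custom_model are only illustrative.

```python
from sklearn.cluster import DBSCAN

# Model built with scikit-learn's default hyperparameters.
default_model = DBSCAN()
params = default_model.get_params()

# The two hyperparameters that matter most in practice:
# eps is the neighborhood radius, min_samples is the number of points
# (including the point itself) needed for a point to count as a core sample.
print(params["eps"], params["min_samples"])

# The same class with both values set explicitly.
custom_model = DBSCAN(eps=0.3, min_samples=10)
print(custom_model)
```

The defaults are rarely a good fit for real data, because a sensible eps depends on the scale of the features.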
After creating the model we can use it for clustering tasks. Since DBSCAN is an unsupervised machine learning algorithm, it needs no labels: we can feed the data directly to the DBSCAN model, and the clusters are computed when the model is fitted.
Clustering with DBSCAN Model:
We can create DBSCAN clusters by calling the fit method on the model we’ve created earlier.
DBS.fit(X)
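The call above assumes that X already exists. As a minimal runnable sketch, the example below generates a synthetic two-moons dataset for X (an assumption made for illustration, since the tutorial does not specify the data) and fits the model on it. It also shows fit_predict, which fits the model and returns the labels in one step.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

# Synthetic data, used only for illustration.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# Scaling matters because eps is a distance in feature space.
X = StandardScaler().fit_transform(X)

DBS = DBSCAN(eps=0.3, min_samples=10)
DBS.fit(X)

# fit_predict runs the same clustering and returns the labels directly.
labels = DBSCAN(eps=0.3, min_samples=10).fit_predict(X)
print(labels[:10])
```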
DBSCAN will produce some useful attributes that can be used to analyze and visualize the clusters that have been created. You can see these attributes in the next section.
What can we get from DBSCAN Model?
Attributes of DBSCAN
After successfully running the DBSCAN algorithm we end up with the attributes below.
- labels_
- components_
- core_sample_indices_
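These attributes describe the clusters that have been created. labels_ holds the cluster label assigned to each sample in the input, with noise points labeled -1. core_sample_indices_ holds the indices of the core samples, and components_ is a copy of those core samples themselves. Its shape is (number of core samples, number of features), where the number of features is the number of columns in the data we fed to the model.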
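# X is your feature matrix of shape (n_samples, n_features)
DBS = DBSCAN(eps=0.3, min_samples=10)
DBS.fit(X)
print(DBS.labels_)               # cluster label per sample; -1 marks noise
print(DBS.components_)           # the core samples themselves
print(DBS.core_sample_indices_)  # indices of the core samples in X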
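Printing the raw arrays is hard to read for anything beyond a handful of points. A rough sketch of how these attributes can be summarized is shown below; the blob dataset and the variable names n_clusters, n_noise and core_mask are assumptions chosen for the example.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Synthetic data, used only for illustration.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.4, random_state=0)

DBS = DBSCAN(eps=0.3, min_samples=10).fit(X)
labels = DBS.labels_

# Noise points carry the label -1, so it is excluded from the cluster count.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))

# Boolean mask marking which samples are core samples.
core_mask = np.zeros_like(labels, dtype=bool)
core_mask[DBS.core_sample_indices_] = True

print("clusters:", n_clusters)
print("noise points:", n_noise)
print("core samples:", int(core_mask.sum()))
print("components_ shape:", DBS.components_.shape)
```

Samples that are in a cluster but not in core_mask are border points: they lie within eps of a core sample without being dense enough to be core samples themselves.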
You can tune the DBSCAN algorithm in a number of ways. In Scikit-Learn, a machine learning model is tuned by adjusting the hyperparameters of its class; for DBSCAN the two most important ones are eps and min_samples. For a comprehensive tutorial on tuning DBSCAN machine learning models you can see the article below:
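Separately from that article, here is a rough sketch of one common heuristic for choosing eps: compute each point's distance to its min_samples-th nearest neighbor, sort those distances, and try values taken from that curve. The synthetic dataset and the percentiles used are arbitrary assumptions for illustration, not recommended settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.neighbors import NearestNeighbors

# Synthetic data, used only for illustration.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

min_samples = 10

# Distance from each point to its min_samples-th nearest neighbor.
# Querying the training data counts each point as its own first neighbor,
# which matches how DBSCAN counts min_samples.
nn = NearestNeighbors(n_neighbors=min_samples).fit(X)
distances, _ = nn.kneighbors(X)
k_distances = np.sort(distances[:, -1])

# Try a few candidate eps values read off the sorted k-distances.
results = []
for q in (50, 75, 90, 95):
    eps = float(np.percentile(k_distances, q))
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = int(np.sum(labels == -1))
    results.append((q, round(eps, 3), n_clusters, n_noise))

for row in results:
    print(row)  # (percentile, eps, clusters found, noise points)
```

Plotting k_distances and looking for the point where the curve bends sharply is the usual way to read this heuristic; the percentile sweep here is just a plot-free stand-in.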
DBSCAN Summary
In this DBSCAN tutorial we have seen how DBSCAN can be implemented using Scikit-Learn and Python.
For a more practical implementation you can see this article: