DBSCAN Advantages
DBSCAN is a fantastic machine learning algorithm that complements other clustering models, such as K-Means and Hierarchical (Agglomerative or Divisive) Clustering, very well. When we look at DBSCAN's working principles, we see that it's an algorithm quite different from both K-Means and Hierarchical Clustering.
While K-Means is useful for partitioning the entire dataset into spherical clusters, and Hierarchical Clustering is useful for obtaining multiple levels of clusters (similar to a tree structure where you can decide the depth of the cut), DBSCAN can create clusters around arbitrary shapes (an irregular, snake-like region or a circular ring, for example) or overlapping clusters, such as a region with a different density sitting inside another shape.
1- Arbitrary Shapes
Probably the biggest selling point of the DBSCAN algorithm is its ability to cluster arbitrary shapes and overlapping regions, thanks to its density-based clustering technique.
This allows clustering in pretty much any shape, whether it's a triangle, an ellipse, a cross, or a completely irregular region, as the sketch below shows.
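Here is a minimal sketch using scikit-learn's DBSCAN on the classic two-moons dataset, where K-Means fails but DBSCAN recovers the crescent shapes. The eps and min_samples values are illustrative guesses, not tuned:

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons

# Two interleaved crescents: a non-spherical shape K-Means can't separate.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

# K-Means splits the moons with a straight boundary; DBSCAN follows
# each crescent because it grows clusters through dense neighborhoods.
print("K-Means clusters:", set(kmeans_labels))
print("DBSCAN clusters:", set(dbscan_labels))
```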
2- Optimization
The DBSCAN algorithm has intuitive parameters which can be tuned to control the clustering outcomes.
Epsilon (eps) defines the radius of the neighborhood scanned around each point, while min_samples defines the minimum number of points that must fall inside that radius for a point to be considered part of the same dense region (a core point). The sketch below shows both in place.
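A short sketch of the two knobs on toy data; the eps and min_samples values here are illustrative, not recommendations:

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Toy data with three dense regions.
X, _ = make_blobs(n_samples=200, centers=3, cluster_std=0.6, random_state=0)

model = DBSCAN(
    eps=0.5,        # radius of the neighborhood scanned around each point
    min_samples=5,  # points required inside that radius (the point itself
                    # included) to mark a point as a core point
).fit(X)

# labels_ holds one cluster label per sample; -1 marks noise.
n_clusters = len(set(model.labels_)) - (1 if -1 in model.labels_ else 0)
print("clusters found:", n_clusters)
```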
3- Simple and Easy
DBSCAN is also easy to understand and use, even though the outcomes can be clusterings of complex-shaped regions. Pretty much all you need to do is decide the epsilon (scanning distance) and the minimum number of samples for a dense region. If you don't have a well-informed initial guess, this can be an iterative process, like most machine learning implementations: you settle on the optimum values as you inspect some initial output, as in the sketch below.
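One way to iterate toward workable parameters, as described above, is to sweep a few candidate eps values and inspect how many clusters and noise points each produces. The candidate range here is an assumption; adjust it to your data's scale:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

for eps in (0.1, 0.2, 0.3, 0.4):
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    # Count clusters, excluding the noise label (-1).
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = int(np.sum(labels == -1))
    print(f"eps={eps}: {n_clusters} clusters, {n_noise} noise points")
```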
4- Outlier Detection
Unlike the K-Means clustering algorithm, DBSCAN won't include every single sample in the final clusters. Samples that don't satisfy the distance and minimum-sample rules are left out as noise points. This working principle can be very useful for detecting outliers as well as leaving unwanted noise out of the data, as the sketch below demonstrates.
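In scikit-learn's DBSCAN, noise points get the label -1, so outliers can be pulled out directly. The data and parameter values here are illustrative:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=[[0, 0], [3, 3]],
                  cluster_std=0.5, random_state=0)
# Append two far-away points to act as obvious outliers.
X = np.vstack([X, [[8.0, 8.0], [-8.0, -8.0]]])

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# Noise points are labeled -1 and never assigned to a cluster.
outliers = X[labels == -1]
print("outliers detected:", len(outliers))
```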
Summary
In this post we covered four advantages of DBSCAN: clustering arbitrary shapes, intuitive parameters to optimize (epsilon and min_samples), ease of use, and built-in outlier detection through noise labeling.