DBSCAN Disadvantages
Like all machine learning algorithms, DBSCAN has a few characteristics that can make it less desirable in certain situations. Two parameters come up repeatedly in the discussion below:
Epsilon (ε): The radius of the neighbourhood that is searched around each point when checking density.
minPoints (n): The minimum number of points (a threshold) that must fall within that neighbourhood for the region to be considered dense.
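To make the two parameters concrete, here is a minimal sketch assuming scikit-learn's DBSCAN implementation, where epsilon is passed as eps and minPoints as min_samples. The toy data is made up purely for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy data (hypothetical): two tight groups of three points and one isolated point.
X = np.array([
    [1.0, 1.0], [1.1, 1.0], [1.0, 1.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
    [9.0, 1.0],
])

# eps is the neighbourhood radius (epsilon).
# min_samples is the density threshold (minPoints); in scikit-learn it counts the point itself.
model = DBSCAN(eps=0.5, min_samples=3)
labels = model.fit_predict(X)

# Points that do not belong to any dense region are labelled -1 (noise).
print(labels)
```

With these settings the isolated point has no neighbours within eps, so it should be reported as noise rather than forced into a cluster.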
1- Limited Scalability
In the worst case, DBSCAN has quadratic, O(N^2), time complexity, for example when neighbourhood queries cannot be accelerated by a spatial index. As a result, DBSCAN gets more and more sluggish as the dataset grows in size and/or dimensionality.
This is particularly troublesome when you have to work with big data or lots of features (high dimensionality). One way to ease performance issues is parallelization: implementations such as scikit-learn's expose an n_jobs parameter that sets the number of parallel jobs.
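The sketch below shows how n_jobs might be set, assuming scikit-learn's DBSCAN. The synthetic dataset, the eps and min_samples values, and the choice of algorithm="kd_tree" are illustrative assumptions, not recommendations. In scikit-learn, n_jobs parallelizes the neighbour search, so the speed-up depends on the data and the hardware, while the resulting labels should not change.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Synthetic data for illustration only.
X, _ = make_blobs(n_samples=2000, centers=4, n_features=2, random_state=42)

# Same model twice: once with a single job, once using all available cores (n_jobs=-1).
serial = DBSCAN(eps=0.5, min_samples=5, algorithm="kd_tree", n_jobs=1).fit(X)
parallel = DBSCAN(eps=0.5, min_samples=5, algorithm="kd_tree", n_jobs=-1).fit(X)

# Parallelism affects only how the neighbour queries are computed, not the clustering.
same = bool((serial.labels_ == parallel.labels_).all())
print("identical labels:", same)
```

Timing both fits on your own data is the most reliable way to check whether the extra jobs actually help.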
See our guide for Tuning DBSCAN.
2- Parameter Work
Although the ability to tune parameters is usually beneficial, having to adjust them and get them right can be cumbersome in some clustering applications.
In particular, the distance parameter (epsilon) and the minimum sample parameter (min_samples) are central to how the model is set up and will often need to be adjusted according to the dataset and project needs.
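To illustrate how sensitive the result can be to epsilon, here is a small sketch that sweeps a few eps values on a synthetic dataset, again assuming scikit-learn. The dataset, the eps values, and the helper summarize are hypothetical choices made for this example.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Synthetic two-moons data for illustration only.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

def summarize(labels):
    """Return (number of clusters, number of noise points) for a label array."""
    n_clusters = len(set(labels.tolist())) - (1 if -1 in labels else 0)
    n_noise = int(np.sum(labels == -1))
    return n_clusters, n_noise

results = {}
for eps in (0.05, 0.2, 1.0):
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    results[eps] = summarize(labels)
    print(f"eps={eps}: clusters={results[eps][0]}, noise points={results[eps][1]}")
```

A very small eps tends to mark many points as noise, while a very large eps merges everything into a single cluster; the useful range in between has to be found for each dataset.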
See our guide for DBSCAN Optimization.
3- No No
DBSCAN does what it does quite well: clustering arbitrary shapes.
DBSCAN Disadvantages Summary
In this article, we’ve discussed the potential drawbacks of the DBSCAN algorithm. Like all algorithms, DBSCAN has its pros and cons. Mastery comes from knowing how to use each algorithm efficiently in different use-case scenarios.
DBSCAN is a very special algorithm with many benefits. You can check out its benefits here: