K-Means Disadvantages
K-Means' advantages by far outweigh its disadvantages. Still, we compiled a small list of K-Means' shortcomings, and knowing these factors can make for an improved practical experience with K-Means and more aligned expectations.
In this article we will elaborate on the disadvantages of K-Means, and we will also share some points on how to address them through tuning.
1- Local Minima
With the K-Means algorithm there is a likelihood of running into the local minima phenomenon: the algorithm can get mathematically stuck at a local minimum of its objective (the within-cluster sum of squares) even though a better solution exists elsewhere.
The consequence can be occasionally wrong clusters.
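To see this in practice, here is a minimal sketch (scikit-learn and the synthetic blob data are our own illustrative choices, not from the original article) that runs K-Means from several single random starts and prints each run's inertia; differing values mean the runs converged to different local minima.

```python
# Illustrative sketch: different random initializations can land
# K-Means in different local minima with different inertia values.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with a known cluster structure (assumed for the demo)
X, _ = make_blobs(n_samples=300, centers=5, random_state=7)

# Run K-Means several times with a single random initialization each;
# inertia (within-cluster sum of squares) can differ from run to run.
for seed in range(5):
    km = KMeans(n_clusters=5, init="random", n_init=1, random_state=seed)
    km.fit(X)
    print(f"seed={seed}  inertia={km.inertia_:.2f}")
```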
2- Results can vary
Depending on which initial centroids are chosen, K-Means can generate varying outputs.
Ending up with different clusters on each run can be annoying if you are relying on K-Means results for your project.
This situation can potentially be mitigated by tuning the K-Means model's parameters.
Please see Tuning K-Means Algorithm for more details.
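As a sketch of what such tuning can look like (the parameter values below are illustrative, not prescriptive), k-means++ initialization, multiple restarts, and a fixed random_state all help stabilize the output:

```python
# Illustrative sketch: two common ways to stabilize K-Means output.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

km = KMeans(
    n_clusters=4,
    init="k-means++",   # spreads initial centroids apart
    n_init=10,          # keep the best of 10 random restarts
    random_state=42,    # makes results reproducible run to run
)
labels = km.fit_predict(X)
print(km.cluster_centers_)
```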
3- Numerical only
K-Means can only cluster datasets with numerical data.
If the data is categorical, K-Means clustering will not work out of the box.
This has implications, of course, as it limits the use cases for the K-Means unsupervised machine learning algorithm.
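One common workaround, sketched below with a small hypothetical DataFrame, is to one-hot encode categorical columns before clustering. Note this is a crude fix: distances between 0/1 indicator columns carry limited meaning, and algorithms designed for categorical data (such as k-modes) may be a better fit.

```python
# Illustrative sketch: one-hot encode a categorical column so that
# K-Means receives purely numerical input. Column names are made up.
import pandas as pd
from sklearn.cluster import KMeans

df = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],
    "size_cm": [10.2, 7.5, 9.8, 6.1],
})

# Convert the categorical column into 0/1 indicator columns
encoded = pd.get_dummies(df, columns=["color"])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(encoded)
print(labels)
```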
4- Normalization Required
K-Means also needs data normalization. If dataset features are not scaled, features with larger numerical ranges will dominate the results, skewing and biasing the clusters.
This is because the K-Means algorithm relies on distance calculations between observation points and centroids to assign samples to clusters.
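A minimal sketch of this (the age/income features are hypothetical) using scikit-learn's StandardScaler, so that both features contribute comparably to the Euclidean distances:

```python
# Illustrative sketch: scale features before K-Means so that no
# single feature dominates the distance calculations.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Unscaled features: income (tens of thousands) dwarfs age (tens)
X = np.array([[25, 40_000], [47, 42_000], [31, 120_000], [52, 118_000]], dtype=float)

# StandardScaler gives each feature zero mean and unit variance
model = make_pipeline(StandardScaler(), KMeans(n_clusters=2, n_init=10, random_state=0))
labels = model.fit_predict(X)
print(labels)
```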
Can't cluster overlapping subgroups.
5- Only Neighboring Clusters
Another potential shortcoming of K-Means is that it will only cluster neighboring subgroups. But what if you have a subgroup nested inside another group? K-Means won't be able to perform such clustering, and you will need another clustering algorithm such as DBSCAN.
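Here is a hedged sketch (make_circles and the eps/min_samples values are our illustrative choices) contrasting the two algorithms on a ring nested inside another ring:

```python
# Illustrative sketch: K-Means vs. DBSCAN on nested circular clusters.
from sklearn.cluster import KMeans, DBSCAN
from sklearn.datasets import make_circles

# One circular cluster enclosed by another
X, y_true = make_circles(n_samples=400, factor=0.4, noise=0.05, random_state=0)

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# K-Means splits the plane roughly in half; the density-based DBSCAN
# can recover each ring as its own cluster.
print("K-Means clusters:", set(kmeans_labels))
print("DBSCAN clusters:", set(dbscan_labels))
```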
Not ideal for anomaly detection
6- Equal treatment for outliers
The K-Means algorithm will assign every sample point to a cluster. This can be troublesome if you are trying to detect anomalies or analyze outliers, because even they will be included in a cluster. For projects involving outliers, DBSCAN can provide a better solution.
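A small illustrative sketch (the injected outlier and the parameter values are assumptions for the demo): K-Means forces an obvious outlier into a cluster, while DBSCAN labels it as noise with the special label -1.

```python
# Illustrative sketch: outlier handling in K-Means vs. DBSCAN.
import numpy as np
from sklearn.cluster import KMeans, DBSCAN
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=2, cluster_std=0.5, random_state=1)
X = np.vstack([X, [[20.0, 20.0]]])  # inject one far-away outlier

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)

print("K-Means label for the outlier:", kmeans_labels[-1])  # a regular cluster id
print("DBSCAN label for the outlier:", dbscan_labels[-1])   # -1 means noise
```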
Usually covers spherical clusters
7- Can't cluster arbitrary shapes
In most cases the K-Means algorithm will end up with spherical clusters, a consequence of how it relies on distance calculations around centroid points. However, in real-world data it's also possible to see arbitrary shapes. Imagine medical data whose clusters form crescent shapes. Such arbitrary shapes can't be identified using K-Means, and again a good alternative is offered by the density-based clustering algorithm DBSCAN.
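A minimal sketch (make_moons stands in for crescent-shaped data; the parameter values are illustrative) comparing the two algorithms on interleaving crescents:

```python
# Illustrative sketch: crescent-shaped clusters defeat K-Means but
# are handled well by the density-based DBSCAN.
from sklearn.cluster import KMeans, DBSCAN
from sklearn.datasets import make_moons

# Two interleaving crescents
X, y_true = make_moons(n_samples=400, noise=0.05, random_state=0)

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# K-Means cuts straight across the crescents; DBSCAN follows each
# crescent's shape and separates them cleanly.
print("DBSCAN found clusters:", set(dbscan_labels))
```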
Simple and useful for clustering but with reservations.
Summary
We have tried to list the potential shortcomings and pitfalls of working with the K-Means unsupervised machine learning algorithm. K-Means is a very simple and fast algorithm for clustering tasks; however, it can be tricky to use because of some of the points listed above.
If you have numerical data without labels, K-Means can surface insights that sometimes seem to come out of nowhere.
If you are not sold on K-Means yet, definitely check out the article where we elaborate on many benefits of working with K-Means Clustering Models: