Random Trees offer the best of both worlds: the reliability, simplicity, and low maintenance of decision trees, combined with the increased accuracy, reduced feature reliance, and better generalization that come from ensembling techniques.
K-Means is a fast algorithm. Its training process amounts to a few iterations of simple centroid updates, and clustering itself is quick.
Aside from runtime performance, K-Means is also scalable, making it suitable for big data and large datasets.
K-Means has O(N*K*P) worst-case time complexity per iteration, where N is the number of samples, K the number of clusters, and P the number of features. In most cases this falls somewhere between linear and quadratic time; when the data is not high-dimensional it is usually closer to linear complexity, which scales very well. You can see the article:
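To see where the O(N*K*P) cost comes from, here is a minimal NumPy sketch of a single K-Means assignment step (the variable names and toy data are illustrative, not part of any library API):

```python
import numpy as np

def assign_step(X, centroids):
    """One K-Means assignment step: each of the N samples is compared
    against each of the K centroids across all P features, which is
    where the O(N * K * P) per-iteration cost comes from."""
    # diffs has shape (N, K, P): every sample against every centroid
    diffs = X[:, None, :] - centroids[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))  # Euclidean distances, shape (N, K)
    return dists.argmin(axis=1)                # nearest centroid per sample

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
labels = assign_step(X, centroids)  # → [0, 0, 1, 1]
```

A full K-Means run simply repeats this assignment step and a centroid-update step until the centroids stop moving.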
K-Means is a pretty simple algorithm to use. In most cases all you have to do is tune the n_clusters parameter and observe how the algorithm and the dataset interact for different centroid counts.
The output is a set of roughly spherical clusters, which are straightforward to derive insights from and interpret in a K-Means study.
There are a few other parameters that can be tuned in K-Means applications. You can see an article here:
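As a sketch of what that tuning looks like in practice (using scikit-learn's KMeans; the parameter values and toy dataset here are illustrative, not recommendations):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy dataset with 4 well-separated blobs (values are illustrative)
X, _ = make_blobs(n_samples=200, centers=4, cluster_std=0.6, random_state=0)

kmeans = KMeans(
    n_clusters=4,      # the main knob: number of centroids
    init="k-means++",  # smarter centroid seeding than pure random
    n_init=10,         # independent restarts; best inertia wins
    max_iter=300,      # cap on iterations per restart
    tol=1e-4,          # convergence tolerance on centroid movement
    random_state=0,    # reproducibility
)
labels = kmeans.fit_predict(X)
print(kmeans.inertia_)  # within-cluster sum of squared distances
```

In many projects only n_clusters needs attention; the remaining parameters mostly trade runtime against the quality and stability of the result.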
The K-Means algorithm will come up with clusters that cover your whole dataset.
This means every single sample will belong to a cluster.
Depending on the application this can be an advantage, although some applications require different clustering characteristics.
For example, outlier detection or anomaly detection requires the algorithm to leave certain "outlier" samples unassigned, and DBSCAN might be a better alternative for those projects.
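The contrast is easy to demonstrate with scikit-learn (toy data and parameter values are illustrative): K-Means assigns every sample, while DBSCAN can flag samples as noise with the label -1:

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans

# Two tight groups plus one far-away point acting as an outlier
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10],
              [50, 50]])  # the outlier

# K-Means assigns *every* sample to a cluster, outlier included
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN can leave samples out: the label -1 marks noise/outliers
db_labels = DBSCAN(eps=2.0, min_samples=2).fit_predict(X)

print(km_labels)  # every sample gets a cluster, no -1 labels
print(db_labels)  # the last sample is marked -1 (noise)
```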
Suitable for non-linearity
4- Advantages of harvesting distance
K-Means will also work well with non-linear data, since it is based on distance calculations between centroids and samples. All that's needed is Euclidean distance, and linearity plays no role in the process.
However, by the same argument, K-Means will only work with numerical data: if the dataset contains categorical features, K-Means won't be able to perform the distance calculations. See:
Popular clustering algorithm
In this article we have seen the advantages of the K-Means algorithm. We discussed how it is fast, simple, and useful for many applications. Thanks to its simplicity and scalability, K-Means remains the most popular clustering algorithm.
We have also briefly covered some cases where K-Means might not be the first choice of machine learning algorithm; you can read more about those points under K-Means Disadvantages.
Having covered these points, K-Means' advantages are best experienced through an example where K-Means is implemented. For that purpose you can check this article, which demonstrates a K-Means implementation: