This week, Yufei Ding, Yue Zhao, Xipeng Shen, Madanlal Musuvathi, and Todd Mytkowicz will be presenting Yinyang K-means at the 2015 International Conference on Machine Learning.
The algorithm guarantees the same results as traditional K-means, but it produces them about an order of magnitude faster.
An abstract of the paper and a PDF download can be accessed at Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup.
Here is a very nice slide deck from Jeff Hammerbacher of Cloudera. It goes over k-means clustering and some enhancements.
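For readers who have not seen k-means written out, here is a minimal sketch of the classic algorithm (often called Lloyd's algorithm) in plain Python. It is not taken from the slide deck or from the Yinyang paper. The function names, the sample points, and the choice to pass the starting centroids in by hand are all assumptions made for illustration; real implementations usually pick starting centroids randomly or with a scheme such as k-means++.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, initial_centroids, max_iters=100):
    """Classic k-means: alternate assignment and centroid update steps.

    initial_centroids is a hypothetical parameter for this sketch; it
    fixes the starting centroids so the run is deterministic.
    """
    centroids = list(initial_centroids)
    k = len(centroids)
    assignments = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Stop once no point changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = mean_point(members)
    return centroids, assignments


# Made-up sample data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels)
```

The assignment step is where the classic algorithm spends most of its time, since every point is compared against every centroid on every pass.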
A few professors from Stanford University have released version 1.1 of their textbook, Mining of Massive Datasets. The book grew out of materials used for a couple of Stanford computer science classes, including large-scale data mining and web mining. The book looks excellent and really focuses on the analysis of data at a large scale. Some people would call this big data. Below is a list of some of the topics covered in the textbook.
- data mining
- recommender systems
- and more
The book is free for download, or available from Cambridge University Press.
Machine Learning: Algorithms that Produce Clusters | Architects Zone.
The above article provides a nice brief overview of 5 clustering algorithms.
- Hierarchical Clustering
- Fuzzy C-Means
- Multi-Gaussian with Expectation-Maximization
- Density-based Cluster
This goes well with a previous post about 6 Machine Learning Algorithms.