Data mining is the process of discovering patterns in data and is therefore also known as Knowledge Discovery from Data (KDD).
This week, Yufei Ding, Yue Zhao, Xipeng Shen, Madanlal Musuvathi, and Todd Mytkowicz will be presenting Yinyang K-means at the 2015 International Conference on Machine Learning. The algorithm guarantees the same results as traditional K-means, but it produces them an order of magnitude faster. An abstract of the paper and a PDF download … Continue reading Yinyang K-Means: A Drop-In Replacement of the Classic K-Means
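For context, here is a minimal sketch of the classic K-means (Lloyd's) loop that the post says Yinyang K-means can replace. It is not the Yinyang algorithm itself, and the function name `kmeans`, its parameters, and the random initialization are illustrative assumptions rather than anything taken from the paper.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Classic (Lloyd's) k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Assumption: the initial centers are k points sampled at random.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center
        # (squared Euclidean distance).
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # no assignment changed, so the centers are stable
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centers, labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centers, labels = kmeans(data, k=2)
    print(centers, labels)
```

The assignment step compares every point against every center on every iteration, which is where most of the running time goes in this naive version.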
A very nice slide deck from Jeff Hammerbacher of Cloudera. It goes over k-means clustering and some enhancements. 20130521mlmeetup from Jeff Hammerbacher
A few professors from Stanford University have released version 1.1 of their textbook, Mining of Massive Datasets. The book was created from materials used for a couple of Stanford computer science classes, including large-scale data mining and web mining. The book looks excellent and focuses on the analysis of data at a large scale. … Continue reading Free Textbook: Mining of Massive Datasets
Machine Learning: Algorithms that Produce Clusters | Architects Zone. The above article provides a nice brief overview of 5 clustering algorithms: K-Means, Hierarchical Clustering, Fuzzy C-Means, Multi-Gaussian with Expectation-Maximization, and Density-based Cluster. This goes well with a previous post about 6 Machine Learning Algorithms.