The Elements of Statistical Learning textbook is available for free. It is a classic, widely used textbook for statistics and machine learning. Here is a far-from-complete list of some of the topics:
- Supervised Learning
- Linear/Logistic Regression
- Model Selection
- Neural Networks
- Support Vector Machines
- Random Forests
- Unsupervised Learning
As you can see, the book is quite extensive.
Note: This book has been available for quite a while, but I realized I had not added a link to it on my blog.
David Barber, Computer Science Professor at University College London, is still offering his textbook, Bayesian Reasoning and Machine Learning, for free. This text looks quite extensive. The website also includes MATLAB code for many of the algorithms in the book.
A few professors from Stanford University have released version 1.1 of their textbook, Mining of Massive Datasets. The book was created from materials used for a couple of Stanford computer science classes, including large-scale data mining and web mining. The book looks excellent and really focuses on the analysis of data at a large scale. Some people would call this big data. Below is a list of some of the topics covered in the textbook:
- data mining
- recommender systems
- and more
The book is free to download, or available from Cambridge University Press.
Data-Intensive Text Processing with MapReduce is a free online (PDF) textbook about text processing on large amounts of data. The 1st edition has been available for a couple of years, and a 2nd edition is in the works. Here is a quick overview of some of the topics:
- Graph Algorithms
- Text Processing
Happy Reading (and Text Processing)!
Previously I mentioned that online statistics learning resources are not abundant.
Well, here is a new online book for learning statistics. It is geared towards programmers, and it looks to be a great fit for people wanting to learn data science. Here is a small excerpt from the Preface:
It emphasizes the use of statistics to explore large datasets.
I have only had time to quickly browse the book, but it looks quite good.
Think Stats: Probability and Statistics for Programmers
(The book has a Creative Commons license, so it is free and OK to download.)