A few professors from Stanford University have released version 1.1 of their textbook, Mining of Massive Datasets. The book grew out of materials used for a couple of Stanford computer science classes, including large-scale data mining and web mining. It looks excellent and focuses on analyzing data at a large scale, which some people would call big data. Below is a list of some of the topics covered in the textbook.
- data mining
- recommender systems
- and more
The book is free for download, or available from Cambridge University Press.
Jeffrey M. Stanton, of Syracuse University’s iSchool, just released an open-source ebook about data science. The book is intended for use in the curriculum of the new Data Science Certificate Program. In particular, it will be used for two courses on analytics and visualization.
The book is available in the iTunes store or as a PDF. See the book website to get your copy.
Previously I mentioned that online resources for learning statistics are not abundant.
Well, here is a new online book for learning statistics. It is geared towards programmers, and it looks to be a great fit for people wanting to learn data science. Here is a small excerpt from the Preface:
It emphasizes the use of statistics to explore large datasets.
I have only had time to quickly browse the book, but it looks quite good.
Think Stats: Probability and Statistics for Programmers
(The book has a Creative Commons license, so it is free to download.)