Yoshua Bengio, Ian Goodfellow and Aaron Courville are writing a deep learning book for MIT Press. The book is not yet complete, but the drafts of the chapters are all available online. The authors are also collecting comments about the chapters before the book goes to press.

The book is divided into three sections:

- Math and Machine Learning Fundamentals
- Modern Deep Neural Networks
- Current Research in Deep Learning

The book is very technical and probably best suited to a graduate-level course. However, if you have the time and interest, a resource like this is highly valuable.

O’Reilly is releasing a free early (unedited) edition of the book Graph Databases. The book is authored by Ian Robinson, Jim Webber, and Emil Eifrém. All three authors work at Neo Technology, the maker of the super-popular Neo4j graph database.

This is an online HTML version of the book Natural Language Processing with Python. The book is a companion to NLTK, a free, open-source toolkit written in Python for Natural Language Processing (NLP).

Cosma Shalizi of the Statistics Department at Carnegie Mellon University is working on a textbook, Advanced Data Analysis from an Elementary Point of View. A copy of the textbook will remain freely available on the website. Since the textbook is still being written, comments are welcome.

The Elements of Statistical Learning textbook is available for free. It is a classic, widely used textbook for statistics and machine learning. Here is a far-from-complete list of some of the topics:

- Supervised Learning
  - Linear/Logistic Regression
  - Regularization
  - Model Selection
  - Trees
  - Neural Networks
  - Support Vector Machines
  - Random Forests
- Unsupervised Learning
  - Clustering

As you can see, the book is quite extensive.

Note: This book has been available for quite a while, but I realized I had not added a link to it on my blog.

David Barber, Computer Science Professor at University College London, is still offering his textbook, Bayesian Reasoning and Machine Learning, for free. The text looks quite extensive. The website also includes MATLAB code for many of the algorithms in the book.

A few professors from Stanford University have released version 1.1 of their textbook, Mining of Massive Datasets. The book grew out of materials used for a couple of Stanford computer science classes, including large-scale data mining and web mining. The book looks excellent and really focuses on the analysis of data at a large scale. Some people would use the word *bigdata*. Below is a list of some of the topics covered in the textbook.

- data mining
- MapReduce
- clustering
- recommender systems
- and more

The book is free for download, or available from Cambridge University Press.

Data-Intensive Text Processing with MapReduce is a free online (PDF) textbook about text processing on large amounts of data. The 1st edition has been available for a couple of years, and a 2nd edition is in the works. Here is a quick overview of some of the topics.

- MapReduce
- Graph Algorithms
- Text Processing

Happy Reading (and Text Processing)!

Previously I mentioned that online statistics learning resources are not abundant.

Well, here is a new online book for learning statistics. It is geared towards programmers, and it looks to be a great fit for people wanting to learn data science. Here is a small excerpt from the Preface:

> It emphasizes the use of statistics to explore large datasets.

I have only had time to quickly browse the book, but it looks quite good.

Think Stats: Probability and Statistics for Programmers

(The book has a Creative Commons license, so it is free and OK to download.)

## Learning To Be A Data Scientist