Free Textbook: Mining of Massive Datasets

A few professors from Stanford University have released version 1.1 of their textbook, Mining of Massive Datasets. The book was developed from materials used in a couple of Stanford computer science classes, including large-scale data mining and web mining. It looks excellent and focuses on analyzing data at a large scale, or what some people would call big data. Below are some of the topics covered in the textbook.

  • data mining
  • map-reduce
  • clustering
  • recommender systems
  • and more

The book is free for download, or available from Cambridge University Press.


Olympic Numbers Infographic

As the Olympics come to a close, here is one more infographic. It has a lot of nice numbers. The section on athletes' caloric intake is fun; Michael Phelps must be eating all the time. There is also a section about Acer, which installed 11,000 computers and 900 servers. Other than competition results, what other data was being collected? I would love to hear more about that. Do you have any idea what other data is collected at the Olympics?

New Web Intelligence and Big Data Coursera Class

Thanks to Ed for leaving this comment yesterday. I have reposted it here because I thought it was so good.

Looks like Coursera added a new data science course entitled “Web Intelligence and Big Data” while nobody was looking! Plus, it starts at the end of the month, for those who can’t wait for the UW Intro to Data Science course to be scheduled.

Here is a link to the Coursera Web Intelligence and Big Data course. The course appears to focus on map-reduce and parallel programming applied to data problems.

Neo4j and Bioinformatics Webinar

Neo Technology, the company behind the graph database Neo4j, is hosting a webinar on Thursday. Pablo Pareja from the Bio4j project will give an overview of bioinformatics and Neo4j, along with some applications.

Bioinformatics can be viewed as data science for biology. Bioinformatics was cool before data science was even a term.

If you are interested in learning more about bioinformatics and graph databases, register for this webinar and start learning.

This will probably be a good series to follow. What machine learning options are available?

The Official Blog of

Hi, I’m Nick the intern. The fine folks at BigML brought me on board for the summer to drink their coffee, eat their snacks, and compare their service to similar offerings from other companies. I have a fair amount of software engineering experience but limited machine learning skills beyond some introductory classes. Before this internship, I had no experience with the services I am going to talk about. Since BigML aims to make machine learning easy for non-experts like me, I believe I am in a great position to give feedback on these types of services. But please, take what I say with a grain of salt. I’ll try to stay impartial, but it’s not easy when BigML keeps dumping piles of money and BigML credits on my doorstep to ensure a favorable outcome.

From my time at BigML, it has become clear that everyone here is a big…
