GitHub Is Cool: They Like Data

Today, GitHub announced the release of its archived public activity data, called the GitHub public timeline. The dataset can be queried with Google BigQuery.

To make things even more awesome, GitHub is also hosting a Data Challenge: play around with the data and create the best visualization possible. You had better start now, because the competition ends May 21st. I am not familiar with Google BigQuery, so this might be a good time to learn it.

This should not surprise anyone. GitHub is always doing cool things, especially for developer-minded people. In case you didn’t know, GitHub is the best place to host your source code.


Large Scale Text Processing with MapReduce: A Free Textbook

Data-Intensive Text Processing with MapReduce is a free online (PDF) textbook about processing large amounts of text. The 1st edition has been available for a couple of years, and a 2nd edition is in the works. Here is a quick overview of some of the topics:

  • MapReduce
  • Graph Algorithms
  • Text Processing
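If you have never seen MapReduce before, the classic first program is word count. Here is a minimal, single-process Python sketch of its three phases (map, shuffle, reduce). It is only an in-memory simulation: the function names are my own hypothetical choices, not an API from the book, and a real framework such as Hadoop would run the mappers and reducers in parallel across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(doc_id: int, text: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every lowercased token in one document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, standing in for the framework's shuffle/sort."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reducer: sum all the partial counts emitted for one word."""
    return word, sum(counts)


def word_count(docs: List[str]) -> Dict[str, int]:
    """Run map over every document, shuffle by key, then reduce each key."""
    intermediate = (
        pair for doc_id, text in enumerate(docs) for pair in map_phase(doc_id, text)
    )
    grouped = shuffle(intermediate)
    return dict(reduce_phase(word, counts) for word, counts in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The end"]
    print(word_count(docs))
```

The nice part of this split is that the mapper and reducer only ever see one record or one key at a time, which is exactly what lets a framework spread the work out over a cluster.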

Happy Reading (and Text Processing)!