Tag Archives: data

Amazon: Era Of Data Centers Ending – Cloud-computing – Infrastructure as a Service – Informationweek

Amazon appears to believe that in about 20 years, nearly all enterprises will run their computing systems in the cloud. I would have to agree with them. This article is worth a look, especially the paragraph about Pinterest running completely on AWS.


Data Visualization Links

I recently ran across the following articles about data visualization.

Good visualizations are an important part of storytelling in data science.

GitHub Is Cool: They Like Data

Today, GitHub announced the release of archived public activity data called the GitHub public timeline. The dataset can be queried via the Google BigQuery tool.

To make things even more awesome, GitHub is also hosting a Data Challenge. The challenge is to play around with the data and create the best visualization possible. You had better start now, because the competition ends May 21st. I am not familiar with Google BigQuery, so this might be a good time to learn.

This should not surprise anyone. GitHub is always doing cool things, especially for developer-minded people. In case you don’t know, GitHub is the best place for hosting your source code.

Large Scale Text Processing with MapReduce: A Free Textbook

Data-Intensive Text Processing with MapReduce is a free online (PDF) textbook about text processing on large amounts of data. The 1st edition has been available for a couple of years, and a 2nd edition is in the works. Here is a quick overview of some of the topics.

  • MapReduce
  • Graph Algorithms
  • Text Processing
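
If you have not seen the MapReduce model before, here is a minimal single-machine sketch of the classic word count in Python. It is my own illustration, not code from the textbook, and the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this example. A real framework such as Hadoop would run the map and reduce steps across many machines.

```python
from collections import defaultdict


def map_phase(doc):
    # Map step: emit a (word, 1) pair for every whitespace-separated token.
    for word in doc.lower().split():
        yield word, 1


def shuffle(pairs):
    # Shuffle step: group all values by key, which a real framework
    # does for you between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Reduce step: collapse the list of counts for one word into a total.
    return key, sum(values)


def word_count(docs):
    # Run map over every document, shuffle, then reduce each group.
    pairs = (pair for doc in docs for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data is everywhere"]))
```

The point of splitting the work this way is that each map call looks at only one document and each reduce call looks at only one word, so both steps can be spread over many workers.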

Happy Reading (and Text Processing)!

Big data: The next frontier for innovation, competition, and productivity | McKinsey & Company

Big Data: The next frontier for innovation, competition, and productivity | McKinsey Global Institute | Technology & Innovation | McKinsey & Company.

This report by McKinsey & Company is frequently referenced, so I thought I should post a link to it. It includes the following quote about the lack of talent to fill Big Data positions.

"By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions."

This projected shortage is why now is a great time to learn to become a data scientist.