Data Scientists Get Ranked

I’ve never had a Google account, and I pay to use e-mail […] I know how much information they can collect, and how much people can learn from your clicks.

That quote is from Sergey Yurgenson, one of Kaggle’s top competitors as listed in the article Data Scientists Get Ranked. The article also lists some of the similarities among the top Kaggle competitors.

Amazon: Era Of Data Centers Ending – Cloud-computing – Infrastructure as a Service – Informationweek

Amazon appears to believe that in about 20 years, nearly all enterprises will run their computing systems in the cloud. I would have to agree with them. This article is worth a look, especially the paragraph about Pinterest running completely on AWS.

Data Visualization Links

I recently ran across the following articles about data visualization.

Good visualizations are an important part of storytelling in data science.

edX – Online Education From Harvard And MIT

Just yesterday, MIT and Harvard University announced a new partnership to offer online education. The goal is to improve learning for students on campus and for others around the globe. Both schools plan to study the results of edX to better understand how students learn and how technology affects learning.

See the official announcement here.

EdX Video Announcement

How will this affect Data Science Learning?

It is too early to know exactly what courses will be offered, but given MIT’s strength in engineering, engineering courses seem like a reasonable expectation. I am guessing (and hoping) that many courses pertinent to data science will be offered by edX. Also, the announcement is most likely MIT and Harvard’s response to Coursera, a company started by two Stanford University faculty members. Obviously, the elite schools do not want to be outdone by each other. In any case, I can only see these new and different methods of education as a good thing. Happy Learning!

The First Ever Data Scientist

It’s probably not who you think. It’s not DJ Patil or Hilary Mason. The first data scientist was Tobias Mayer. Who? Yeah, that’s exactly right, I had never heard of him either. Thankfully, John Rauser, a Data Scientist at Amazon, gave a great talk about this person at Strata New York 2011.

Well, Tobias was an astronomer way back in the mid-1700s. He spent a lot of time observing the libration (wobbling) of the moon, and he came up with the following formula:

\beta - x = y \alpha - z \alpha \sin{\theta}

He could measure x, y and z. Thus he needed to solve for \alpha, \beta and \theta. Given three observations, and therefore three equations, Tobias could solve for the three unknowns. That is when the real problem arose. Tobias had 27 observations instead of 3. He had too much data. This may have been the first known occurrence of big data. For more on Tobias Mayer’s solution, you will need to watch the video below. Hint: he strategically grouped the data.
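To make the grouping hint a little more concrete, here is a rough Python sketch of the idea. It is my own illustration, not Mayer’s actual computation. It assumes the formula is rearranged to x = \beta - y \alpha + z (\alpha \sin{\theta}), which is linear in the three quantities \beta, \alpha and \alpha \sin{\theta}. The 27 equations are split into 3 groups, each group is summed into a single equation, and the resulting 3 equations are solved. The function names, the rule of sorting by y to form the groups, and the synthetic data are all assumptions made for this example; none of the numbers are Mayer’s measurements.

```python
import math
import random


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve the 3x3 linear system a * u = b with Cramer's rule."""
    d = det3(a)
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        solution.append(det3(m) / d)
    return solution


def grouped_fit(obs):
    """Estimate (alpha, beta, theta) from a list of (x, y, z) observations.

    Uses x = beta - y*alpha + z*(alpha*sin(theta)), sums the equations
    within each of 3 groups, and solves the 3 summed equations.
    """
    # Assumed grouping rule: sort by y so the three groups differ a lot.
    ordered = sorted(obs, key=lambda o: o[1])
    size = len(ordered) // 3  # leftover observations, if any, are ignored
    a, b = [], []
    for g in range(3):
        chunk = ordered[g * size:(g + 1) * size]
        a.append([len(chunk),
                  -sum(y for _, y, _ in chunk),
                  sum(z for _, _, z in chunk)])
        b.append(sum(x for x, _, _ in chunk))
    beta, alpha, alpha_sin_theta = solve3(a, b)
    theta = math.asin(alpha_sin_theta / alpha)
    return alpha, beta, theta


def make_synthetic(alpha, beta, theta, n=27, noise=0.0, seed=0):
    """Made-up observations for the demo (not Mayer's data)."""
    rng = random.Random(seed)
    obs = []
    for i in range(n):
        phi = -math.pi / 2 + math.pi * i / (n - 1)
        y, z = math.sin(phi), math.cos(phi)
        x = beta - y * alpha + z * alpha * math.sin(theta) + rng.gauss(0.0, noise)
        obs.append((x, y, z))
    return obs


if __name__ == "__main__":
    # Hypothetical "true" values, chosen only for illustration.
    observations = make_synthetic(alpha=1.5, beta=14.5, theta=0.3, noise=0.01)
    print(grouped_fit(observations))
```

Summing within groups averages out some of the measurement noise, and choosing groups that differ strongly from one another keeps the three summed equations from being nearly identical, which would make the solution unstable.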

Rauser has this to say about why Mayer qualifies as the first data scientist.

As far as I know, the first time in history that someone made a quantitative argument that more data is better.

Rauser doesn’t stop there though. The rest of his talk goes on to explain the path to becoming a data scientist and the necessary skillset. Below are the skills he mentions.

  • Math
  • Engineering
  • Writing
  • Skepticism
  • Curiosity

So, do yourself a favor, and take a few minutes to watch this great talk.

As I watched this video, I kept asking myself the same question. Why have I never seen this video before?