I’ve never had a Google account, and I pay to use e-mail […] I know how much information they can collect, and how much people can learn from your clicks.
by Sergey Yurgenson, one of Kaggle’s top competitors according to the NewYorkTimes.com article Data Scientists Get Ranked. The article also lists some of the similarities among the top Kaggle competitors.
Amazon appears to believe that in about 20 years, nearly all enterprises will run their computing systems in the cloud. I would have to agree with them. This article is worth a look, especially the paragraph about Pinterest running completely on AWS.
I recently ran across the following articles about data visualization.
- iPad for Visualizations – MicroStrategy has created an iPad app that can be used to analyze big data.
- Big Picture of BigData – How visualization can be applied to the 3 Vs of big data (Volume, Velocity, Variety).
- The Explanatory Power of Data Points – Data points are important, but so is a story.
Good visualizations are an important part of the storytelling for data science.
- What types of media store the most data?
- Where are the world’s 10 largest data centers?
Check out this infographic for the answers.
See the source at Mozy
Just yesterday, MIT and Harvard University announced a new partnership to offer online education. The goal is to increase learning for students on-campus and others throughout the globe. Both schools plan to study the results of edX to better understand how students learn and how technology affects learning.
See the official announcement here.
EdX Video Announcement
How will this affect Data Science Learning?
It is too early to know exactly what courses will be offered, but given MIT’s strength in engineering, engineering courses seem likely. I am guessing (and hoping) that edX will offer many courses pertinent to data science. Also, the announcement is most likely MIT and Harvard’s response to Coursera, a company started by two Stanford University faculty members. Obviously, the elite schools do not want to be outdone by each other. In any case, I see these new and different methods of education as a good thing. Happy Learning!
It’s probably not who you think. It’s not DJ Patil or Hilary Mason. The first data scientist was Tobias Mayer. Who? Yeah, that’s exactly right, I had never heard of him either. Thankfully, John Rauser, a Data Scientist at Amazon, gave a great talk about this person at Strata New York 2011.
Well, Tobias was an astronomer way back in the mid 1700s. He spent a lot of time observing the libration (wobbling) of the moon, and he came up with a formula relating the quantities he could observe to the ones he wanted to determine.
He could measure some of the terms directly, so he needed to solve for the remaining unknowns. With measurements from 3 observations, and therefore 3 equations, Tobias could solve for the unknowns. That is when the real problem arose. Tobias had 27 observations instead of 3. He had too much data. This may have been the first known occurrence of big data. For more on Tobias Mayer’s solution, you will need to watch the video below. Hint: he strategically grouped the data.
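The grouping idea, usually called the method of averages, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the linear model with three unknowns, the sine and cosine coefficients, the parameter values, and the rule of grouping by the second coefficient are all made up for the example and are not Mayer's actual equation, data, or grouping. The 27 equations are split into 3 groups of 9, each group is summed into a single equation, and the resulting 3 equations are solved directly.

```python
import math
import random


def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(matrix, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(matrix)
    solution = []
    for col in range(3):
        # Replace one column with the right-hand side, as Cramer's rule requires.
        replaced = [row[:col] + [rhs[r]] + row[col + 1:]
                    for r, row in enumerate(matrix)]
        solution.append(det3(replaced) / d)
    return solution


def method_of_averages(rows, targets, n_groups=3):
    """Reduce many equations to n_groups equations by summing within groups."""
    # Assumption: group equations by the size of their second coefficient.
    order = sorted(range(len(rows)), key=lambda k: rows[k][1])
    size = len(rows) // n_groups
    summed_rows, summed_targets = [], []
    for g in range(n_groups):
        members = order[g * size:(g + 1) * size]
        summed_rows.append([sum(rows[k][j] for k in members) for j in range(3)])
        summed_targets.append(sum(targets[k] for k in members))
    return solve3(summed_rows, summed_targets)


def make_observations(params, noise_sd=0.0, seed=0):
    """Build 27 synthetic equations y = p0 + p1*sin(t) + p2*cos(t) (hypothetical model)."""
    rng = random.Random(seed)
    angles = [-1.3 + 0.1 * k for k in range(27)]
    rows = [[1.0, math.sin(t), math.cos(t)] for t in angles]
    targets = [sum(p * c for p, c in zip(params, row)) + rng.gauss(0.0, noise_sd)
               for row in rows]
    return rows, targets


true_params = [2.0, 1.5, -0.5]  # made-up values, not Mayer's
rows, targets = make_observations(true_params, noise_sd=0.01)
estimate = method_of_averages(rows, targets)
print("true:    ", true_params)
print("estimate:", [round(v, 3) for v in estimate])
```

Summing within each group lets the measurement noise partly cancel, which is why using all 27 observations should give a steadier answer than picking any 3 of them. With `noise_sd=0.0` the sketch recovers the made-up parameters up to floating-point rounding.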
Rauser has this to say about why Mayer qualifies as the first data scientist.
As far as I know, the first time in history that someone made a quantitative argument that more data is better.
Rauser doesn’t stop there though. The rest of his talk goes on to explain the path to becoming a data scientist and the necessary skillset. Below are the skills he mentions.
So, do yourself a favor, and take a few minutes to watch this great talk.
As I watched this video, I kept asking myself the same question. Why have I never seen this video before?