Free Online Strata Conference

The May 2012 Strata conference will be held online this year. It is free, and it takes place tomorrow (May 16, 2012). The only catch is that you must register first. The entire conference is scheduled for the morning, so the format looks quick. Judging from other Strata videos I have seen, I expect the event to be of high quality.

Thanks to DataGeeks-MSP for alerting me to the conference.

Data For Good

We can no longer continue to build development projects without thinking about data, any more than we could without thinking about a budget.

By Jake Porway of DataKind.

This is another good video about the social problems that DataKind is hoping to solve.
Source: Code For America: Big Data For The Public Good

Healthcare, BigData, and Startups

Healthcare is starting to see the value of data science. Here are 2 data science events aimed at generating value for healthcare.

Call for Startups – HealthStartup III on Big Data: This is an event for connecting healthcare, startups, and investors in Europe.
Health 2.0’s Boston Big-Data Code-a-thon: This event starts today. It is a competition to develop an application for the healthcare industry. The application must use big data, and the teams have 2 days.

Be A Data Rat

In this video, Jeff Hammerbacher of Cloudera mentions that good data scientists are “data rats.” Athletes are often considered “gym rats” if they spend a lot of time in the gym, so Jeff believes “data rats” need to spend a lot of time with data. Having a high level of curiosity is very important.

Jeff also teaches an introductory course in Data Science at Berkeley. In the course, he tries to cover 5 skills that are not typically covered in an undergraduate curriculum.

  1. Data Collection and Integration – know how to acquire and integrate data
  2. Visualization Design – not just chart design but entire dashboard design
  3. Large-scale Experimentation – rapidly design and deploy features to be tested
  4. Causal Inference – you don’t get to design the studies, you just deal with the data
  5. Data Products – how to deploy and evaluate a machine learning algorithm

Data Scientists Get Ranked – NYTimes.com

I’ve never had a Google account, and I pay to use e-mail […] I know how much information they can collect, and how much people can learn from your clicks.

By Sergey Yurgenson, one of the top Kaggle competitors listed in the NYTimes.com article Data Scientists Get Ranked. The article also lists some of the similarities among the top Kaggle competitors.

Amazon: Era Of Data Centers Ending – Cloud-computing – Infrastructure as a Service – Informationweek

Amazon appears to believe that in about 20 years, nearly all enterprises will run their computing systems in the cloud. I would have to agree with them. This article is worth a look, especially the paragraph about Pinterest running completely on AWS.


Data Visualization Links

I recently ran across the following articles about data visualization.

Good visualizations are an important part of the storytelling for data science.

Where is the World's Data being Stored?

  • What types of media store the most data?
  • Where are the world’s 10 largest data centers?

Check out this infographic for the answers.


See the source at Mozy

edX – Online Education From Harvard And MIT

Just yesterday, MIT and Harvard University announced a new partnership to offer online education. The goal is to increase learning for students on-campus and others throughout the globe. Both schools plan to study the results of edX to better understand how students learn and how technology affects learning.

See the official announcement here.

EdX Video Announcement

How will this affect Data Science Learning?

It is too early to know exactly which courses will be offered, but given MIT’s strength in engineering, engineering courses seem likely. I am guessing (and hoping) that edX will offer many courses pertinent to data science. The announcement is also most likely MIT and Harvard’s response to Coursera, a company started by 2 Stanford University faculty members. Obviously, the elite schools do not want to be outdone by each other. In any case, I see these new and different methods of education as only a good thing. Happy Learning!