The May 2012 Strata conference will take place online. The cost is zero, and the event is tomorrow (May 16, 2012). The only catch is that you must register first. The entire conference is scheduled to take place in the morning, so the format looks quick. Judging from other Strata videos I have seen, I would guess this will be a high-quality event.
Thanks to DataGeeks-MSP for alerting me to the conference.
We can no longer continue to build development projects without thinking about data, any more than we could without thinking about a budget.
By Jake Porway of DataKind.
This is another good video about the social problems that DataKind is hoping to solve.
Source: Code For America: Big Data For The Public Good
Healthcare is starting to see the value of data science. Here are 2 data science events aimed at generating value for healthcare.
In this video, Jeff Hammerbacher of Cloudera mentions that good data scientists are “data rats.” Athletes are often considered “gym rats” if they spend a lot of time in the gym, so Jeff believes “data rats” need to spend a lot of time with data. Having a high level of curiosity is very important.
Jeff also teaches an introductory course in Data Science at Berkeley. In the course, he tries to cover 5 skills that are not typically covered in an undergraduate curriculum.
- Data Collection and Integration – know how to acquire and integrate data
- Visualization Design – not just chart design but entire dashboard design
- Large-scale Experimentation – rapidly design and deploy features to be tested
- Causal Inference – you don’t get to design the studies, you just deal with the data
- Data Products – how to deploy and evaluate a machine learning algorithm
I’ve never had a Google account, and I pay to use e-mail […] I know how much information they can collect, and how much people can learn from your clicks.
By Sergey Yurgenson, one of Kaggle’s top competitors as listed in this NewYorkTimes.com article, Data Scientists Get Ranked. The article also lists some of the similarities among the top Kaggle competitors.
Amazon appears to believe that in about 20 years, nearly all enterprises will run their computing systems in the cloud. I would have to agree with them. This article is worth a look, especially the paragraph about Pinterest running completely on AWS.
Source: Amazon: Era Of Data Centers Ending (InformationWeek)
I recently ran across the following articles about data visualization.
Good visualizations are an important part of the storytelling for data science.
- What types of media store the most data?
- Where are the world’s 10 largest data centers?
Check out this infographic for the answers.
See the source at Mozy
Just yesterday, MIT and Harvard University announced a new partnership, edX, to offer online education. The goal is to improve learning for students on campus and for others around the globe. Both schools plan to study the results of edX to better understand how students learn and how technology affects learning.
See the official announcement here.
EdX Video Announcement
How will this affect Data Science Learning?
It is too early to know exactly what courses will be offered, but given MIT’s strength in engineering, engineering courses would seem a reasonable guess. I am guessing (and hoping) that edX will offer many courses pertinent to data science. Also, the announcement is most likely a move by MIT and Harvard to compete with Coursera, a company started by two Stanford University faculty members. Obviously, the elite schools do not want to be outdone by each other. In any case, I see these new and different methods of education only as a good thing. Happy Learning!