Kaggle Makes Data Science Fun

The tagline for Kaggle is “We’re making data science a sport.” They have successfully turned data science into a competition. It is fun, and it yields excellent results. There is also a portion of the site dedicated to classroom use, called Kaggle in Class.

Here is how it works. A company that needs some data analyzed can contact Kaggle and host a competition. Then data scientists all over the world can compete to find the best solution. The company benefits from having many algorithms and techniques applied to the same data set, far more than it could run on its own. The contestants benefit from networking, pre-cleaned data, and learning from others. It is a win/win situation. Plus, the winner gets prize money.

Currently, the large featured competition is the Heritage Health Prize. It is a $3,000,000 competition to identify individuals who will be admitted to the hospital in the next year. The contest lasts until April 2013.

This is definitely a site I want to be involved with in the future.  I just wish they could make it a spectator sport as well.

Visually Launches

Yesterday, Visual.ly launched a new way of creating infographics. It claims to be able to “create free customizable infographics in seconds.” I have not tried it yet, but it is on my to-do list.

Natural Language Processing Starts Today

The Coursera Natural Language Processing course officially starts today.  Sign up and start learning.

16 Companies Hiring Data Scientists Right Now

Data Scientist is the hot new job for 2012.  Does this job really exist?  Who hires these people? Are companies currently hiring? The answers are: yes, lots of companies, and yes. I decided to spend last night looking for companies that are currently hiring data scientists.  It did not take long to compile a pretty good list.

Data Scientist Job Openings

Company | Location | Link
Microsoft | Redmond, WA | Microsoft Sr. Data Scientist
Netflix | Los Gatos, CA | Netflix Senior Data Scientist
Kaggle | San Francisco, CA | Kaggle Data Scientist
Greenplum | San Mateo, CA | Greenplum Data Scientist
Last.fm | London | Last.fm Data Scientist
Rackspace | San Antonio, TX | Rackspace Data Scientist
Amazon | Seattle, WA | Amazon Data Scientist/System Architect
Facebook | Menlo Park, CA | Facebook Data Scientist
Twitter | San Francisco, CA | Twitter Data Scientist
LinkedIn | Mountain View, CA | LinkedIn Data Scientist
Cobalt/ADP | Cambridge, MA | Cobalt Data Scientist
eBay/PayPal | San Jose, CA | PayPal Data Scientist
Bunchball | San Jose, CA | Bunchball Data Scientist
A9 | Palo Alto, CA | Principal Engineer/Data Scientist
Acxiom | Little Rock, AR | Acxiom Data Scientist
Trulia | San Francisco, CA | Trulia Data Scientist – Data Science Lab

Do you know of any other companies hiring Data Scientists right now?

Learning Statistics for Data Science

Statistics is a topic that could use more attention from the online community. I would love to see Stanford (or Coursera) offer a free online statistics course, much like the other free courses already available.

I did find a series of YouTube videos by Daniel Judge, a professor in the East Los Angeles College Mathematics Department. The videos start at the very beginning of statistics. I have watched a couple of them, and they seem quite good. Daniel does a nice job of explaining the material. Here is the first video in the series.

Stay tuned to the blog in case other stats options appear online. Also, please leave a comment if you know of some good online statistics resources.

Data Scientist – Career of the Future

This link provides a great infographic about data scientist career opportunities.

5 Low-Profile Big Data Startups

Here are 5 startups that are doing some serious data science. BloomReach and Skytree have already launched this year.

More Free Courses from Stanford

Also this spring, Stanford will be offering two more courses that might benefit a person learning data science.

If you feel these two classes might be a bit too advanced at this point, then here are a couple of more fundamental computer science classes. If you are new to computer science and programming, CS 101 would be a good choice. If you are not as new to computer science, or are a bit rusty on your core algorithms knowledge, then Design and Analysis of Algorithms 1 might be appropriate.

Actually, the courses are no longer being offered by just Stanford. A few other schools have been added, and the courses are now being offered through Coursera. Plus, all the courses are free.

Science will use gaming for crunching data

This is an interesting article about how science will use online gaming to solve some of science’s most difficult questions. Players think they are just solving puzzles.

What Makes a Good Data Scientist?

Jeremy Howard is the Chief Scientist at Kaggle. At the end of this interview from the Strata Conference 2012, he identifies 4 simple traits that a data scientist needs.

  1. Creativity
  2. Open-mindedness
  3. Tenacity
  4. A Good Skillset

Jeremy Howard of Kaggle at Strata 2012

In this brief interview he covers a range of other data science topics:

  • Big Data is an engineering problem
  • Analytics generate value/insight from data
  • Predictive Modeling is about answering a question – build a model to do that
  • Is Data Science about tools or people? – watch the video for Jeremy’s answer
  • And others…

See this previous post for more videos from Strata 2012.

Learning To Be A Data Scientist