Profile Of A Data Scientist – Interview

Watch this excellent video interview with David Dietrich, creator of EMC’s data science curriculum. He talks about his experience helping people make the transition to data science.
David lays out a list of 5 traits of a data scientist.

  • Quantitative
  • Technical
  • Skeptical
  • Communication and Collaboration
  • Creative and Curious

For a diagram of these 5 traits, see this brief writeup about the profile of a data scientist. Also, see the slides of his latest talk at EMC World 2012.

Note: I removed the embedded video because it was set to play automatically.

Kaggle Launches New Products

If you follow this blog, you probably know I am a big fan of Kaggle. Just last week, they announced the launch of two new products.

  1. Kaggle Recruit: In this competition, participants compete not for a cash prize but for a job interview with a specific company. Currently, Facebook is hosting the first such competition.
  2. Kaggle Prospect: In this competition, participants try to come up with the best question to ask. They are presented with various related datasets, and the goal is to find which data science question should be asked of the data. The winner gets a small cash prize, and the winning question becomes a regular Kaggle competition.

What do you think? Are you excited to try out these new competitions?

Why The Bigdata?

I think this infographic sums it up fairly well. It is amazing how much happens on the internet in 60 seconds. Here are just a few examples.

  • 98,000 tweets
  • 695,000 Facebook status updates
  • 13,000 hours of music streamed on Pandora

See the infographic for more examples. It is no wonder the world is having big data problems.

60 Seconds: Things That Happen On Internet Every Sixty Seconds
Infographic by Shanghai Web Designers
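
To put those per-minute figures in perspective, here is a minimal Python sketch that scales them to a full day. It assumes the rates stay constant around the clock, which is a simplification, and the names `PER_MINUTE` and `per_day` are just illustrative.

```python
# Per-minute figures quoted from the infographic above.
PER_MINUTE = {
    "tweets": 98_000,
    "Facebook status updates": 695_000,
    "hours of music streamed on Pandora": 13_000,
}

MINUTES_PER_DAY = 60 * 24  # 1,440 minutes in a day


def per_day(per_minute: int) -> int:
    """Scale a per-minute count to a per-day count, assuming a constant rate."""
    return per_minute * MINUTES_PER_DAY


if __name__ == "__main__":
    for name, count in PER_MINUTE.items():
        print(f"{name}: {per_day(count):,} per day")
```

Under that constant-rate assumption, 98,000 tweets a minute works out to roughly 141 million tweets a day.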

New Data Science Certificate Program

Starting in the fall of 2012, the University of Washington will be offering a certificate in Data Science. The program has two sections: one located in Seattle and the other online. The certificate consists of three separate courses, each lasting approximately 3 months. Thus the program can be completed in 9 months, and the cost is around $3,000.

There are some information sessions later this summer. If you are in Seattle, there is an information session on July 19. If you are interested in the online program, a webinar is scheduled for August 29.

The program content looks quite good. Some of the topics to be covered include Hadoop, NoSQL, machine learning, statistics, graph algorithms, and more. If you are looking to become a data scientist, this just might be the program for you.

I also added this certificate program to my list of colleges offering data science degrees.

Who's Hiring Data Scientists in June 2012?

Well, here is a slide show profiling 8 companies that are doing just that.

8 companies hiring data scientists right now

The list includes some big names like Google and Facebook, but there are some surprises too.

This is a great overview of bias and variance.

Network World Article: Could data scientist be your next job?

Sandra Gittlen wrote a very nice article for Network World titled Could data scientist be your next job? She did an excellent job explaining the problem with defining exactly what a data scientist is. She also interviewed a couple of people leading the push to get universities to provide data scientist training. According to her article and many others on the internet, it is a great time to be learning data science skills, because companies cannot find enough data scientists. Plus, Sandra was kind enough to interview me for the article.

Visualization Of Data Science Twitter Users

This is a fun, interactive visualization of 659 Twitter accounts linked to data science.

Data Science Summit Videos

The videos for the 2012 Data Science Summit organized by Greenplum are now available.

See the videos

Bitmarks: Bitly's Data Science One URL At A Time

Earlier this week, Bitly launched a new bookmarking service. They call links/URLs bitmarks instead of bookmarks. It has a nice Chrome Extension and Bitmarklet. So far, I very much like the service.

So, Why Should You Care?

Well, at its core, Bitly is a data science company. This is just another way for Bitly to collect more URLs. I think that is a good thing. Bitly has huge amounts of data created by collecting lots and lots of small things.

What do you think Bitly is doing with all those URLs? I am not completely sure, but I would bet some of it is really neat. Bitly can already track breaking news in near real-time. I am curious to see whether Bitly can predict the winner of the November presidential election before the news organizations can.

By the way

I have created a Data Science 101 Bitmark Bundle. You are welcome to follow along, although I do not know if there is a way to follow a bundle.