Yesterday, I posted about some traditional strategies to acquire data science skills. Today, I will post a nontraditional strategy.
There is a wealth of data science information available on the internet. With enough personal motivation, a person could learn all the necessary skills online for free (or cheaply). Coursera is probably a great place to start. Other good resources include Udacity, the Kaggle Wiki, and various blogs and websites.
The problem with this approach is knowing exactly what to learn. A course in machine learning is great, but data science is more than just machine learning. How do you know what to learn? It would be really nice to have a collection of data science topics and the associated online training materials.
Would this strategy work for you?
Based upon the popularity of a previous post about a certificate program from the University of Washington, it appears that many people are interested in learning the skills necessary to become a data scientist. Thus, I decided to compile a list of some of the possible learning strategies.
Traditional College Education
The most obvious path would be to study at a traditional college or university. Colleges and universities are starting to notice the demand for data science skills, and many colleges are currently offering programs to prepare someone as a data scientist. This path is safe and predictable. Do the homework, complete the courses, and get the degree or certificate. Most people are familiar with the process, and it offers few surprises. The problems here are the costs, lack of flexibility, and time involved.
Companies are now starting to offer training programs for data science. EMC is leading the way in this category with its data science training program. Cloudera also offers a lot of training related to Hadoop and big data. Wolfram offers data science training with Mathematica. One of the problems with this category is the cost. Another is that companies tend to teach and promote their own products, which may leave the student with numerous gaps across the full data science spectrum.
What are your thoughts about the above approaches? What are the positives and negatives? Also, later this week I will be posting some less-traditional approaches to learning data science.
Watch this excellent video interview with David Dietrich, creator of EMC’s data science curriculum. He talks about his experience helping people transition to becoming data scientists.
David lays out a list of 5 traits of a data scientist.
- Communication and Collaboration
- Creativity and Curiosity
For a diagram of these 5 traits, see this brief writeup about the profile of a data scientist. Also, see the slides of his latest talk at EMC World 2012.
Note: I removed the embedded video because it was set to play automatically.
If you follow the blog, you probably know I am a big fan of Kaggle. Just last week, they announced the launch of 2 new products.
- Kaggle Recruit: In this competition, the participants are not competing for a cash prize but rather for a job interview with a specific company. Currently, Facebook is hosting the first such competition.
- Kaggle Prospect: In this competition, the participants are trying to come up with the best question to ask. Participants are presented with various related datasets, and the goal is to find which data science question should be asked of the data. The winner gets a small cash prize, and the winning question becomes a regular Kaggle competition.
What do you think? Are you excited to try out these new competitions?
I think this infographic sums it up fairly well. It is amazing how much happens on the internet in 60 seconds. Here are just a few examples.
- 98,000 tweets
- 695,000 Facebook status updates
- 13,000 hours of music streamed on Pandora
See the infographic for more examples. It is no wonder the world is having big data problems.
Infographic by Shanghai Web Designers
Starting in the fall of 2012, the University of Washington will be offering a certificate in Data Science. The program has two sections: one located in Seattle and the other online. The certificate consists of three separate courses, each lasting approximately 3 months. Thus, the program can be completed in 9 months, and the cost is around $3000.
There are some information sessions later this summer. If you are in Seattle, there is an information session on July 19. If you are interested in the online program, a webinar is scheduled for August 29.
The program content looks quite good. Some of the topics to be covered include Hadoop, NoSQL, machine learning, statistics, graph algorithms, and more. If you are looking to become a data scientist, this just might be the program for you.
I also added this certificate program to my list of colleges offering data science degrees.