How To Learn Data Science? Part 2

Yesterday, I posted about some traditional strategies to acquire data science skills. Today, I will post a nontraditional strategy.

Internet Based

There is a wealth of data science information available on the internet for free. With enough personal motivation, a person could learn all the necessary skills online for free (or cheaply). Coursera is probably a great place to start. There are also other good sites such as Udacity and the Kaggle Wiki, along with other blogs and websites.

The problem with this approach is knowing exactly what to learn. A course in machine learning is great, but data science is more than just machine learning. How do you know what to learn? It would be really nice to have a collection of data science topics and the associated online training materials.

Would this strategy work for you?

7 thoughts on “How To Learn Data Science? Part 2”

  1. I took ml-class and learned a lot, but I don’t feel like it necessarily helped qualify me as a data scientist to potential employers. Udacity doesn’t offer anything at the moment that even comes close to “big data.” At this point I’d say just diving into Kaggle and trying to pick things up as you go is probably the best bet, but that’s a slow way to gain competency.

    1. Ken,
      Agreed on the ml-class. Udacity is offering a social network analysis class, but they don’t offer much data science other than that. Kaggle is a great way to learn, but it is slow and it can be difficult knowing which competitions to enter first. The internet has so much information, but knowing what information to access and in what order can be problematic. Thanks for commenting.

  2. This is the course I’m currently pursuing, but I would agree that it would be great to have a little more direction as to what Coursera/Udacity/Etc. classes are worth taking.

    I’m taking the Coursera Machine Learning course, and I plan to take Natural Language Processing, Probabilistic Graphical Models, and probably the Computer Vision course in the fall, while supplementing my computer science knowledge with some relevant Udacity and Coursera classes.

    What else is out there?

    1. Good question. When I took the ml-class last fall, random forests were not even discussed. Kaggle claims they are one of the better techniques for many of the competitions. Thus, to learn about random forests, I had to use Google, Wikipedia, YouTube, and a few books.
      Here is another problem with the online courses. They focus mainly on the algorithms once the data has been collected and nicely formatted. Collecting and formatting the data is an important part of data science. Also, Coursera and Udacity only touch on about 2 of the 5 circles from this profile of a data scientist, http://infocus.emc.com/david_dietrich/what-is-the-profile-of-a-data-scientist/
      The courses are really great, but they only cover certain aspects of becoming a data scientist.

  3. Good articles on sources for self-study. I am looking into Data Science to dust off my former machine learning and AI skills from 20 years ago when I graduated from college.

    I am underwhelmed with Udacity. The Intro to Statistics course is super-basic. The Khan-Academy style showing of short video-snippets and then multiple-choice questions may draw some audience in, but I think it’s too simplistic.

    What I have seen from Coursera seems hands-on and geared towards industrial-strength large-scale Hadoop deployments. Not quite sure how interactive this gets.

    Wolfram’s Data Science course is very good, although it obviously builds on the Mathematica platform. However, the abilities to handle sparse matrices, work with distributions, do visualizations, access curated data feeds etc. are second to none. As a learning environment (with affordable student or home licenses) this is excellent.

    I am seriously considering EMC’s data science course, although the price tag is a deterrent. Has anyone here (outside of EMC) achieved this certification? Thoughts?

    Lastly, I have signed up with Kaggle and started looking into my first competition there. Just submitting a solution will teach you more about data handling and algorithms than most of the online courses.

    It’s a bit like with a triathlon: You can read books and watch or listen to the pros. But nothing gets you the feel of this sport quite like competing and finishing in a real race.

    1. Yes, learning data science is somewhat tricky right now. I have not taken the EMC course. I am glad you like the Wolfram materials. That is good to know. Kaggle is really fun, but it can be difficult to know exactly what to learn in order to improve your model. The forums sometimes help with that. Thanks for commenting.
