A while back, James Kobielus wrote the article Data Scientist: Consider the Curriculum. It contains one of the best descriptions of a data science curriculum I have seen. The article also includes a list of algorithms and modeling techniques that a data scientist should know. Below is the list from the article.
- linear algebra
- basic statistics
- linear and logistic regression
- data mining
- predictive modeling
- cluster analysis
- association rules
- market basket analysis
- decision trees
- time-series analysis
- machine learning
- Bayesian and Monte Carlo statistics
- matrix operations
- text analytics
- principal components analysis
- experimental design
- unsupervised learning
- constrained optimization
The list looks almost overwhelming.
Do you think anything is missing from the list?
After creating the lists Top 5 Data Science Gals and Top 5 Data Science Guys, I noticed a couple of commonalities.
- 7 of the 10 (assuming Drew finishes) have PhDs
- 5 of the 10 are from the New York City area
Did you notice anything I missed?
- LinkedIn: They turn data into products better than anyone else.
- Facebook: If you are the type of person who loves to analyze people’s lives, there is no better place.
- Twitter: Duh, it’s Twitter. Lots of data and lots of possibilities.
- Cloudera: Cloudera is a successful Hadoop-based startup. Build tools and explore huge datasets for a variety of industries.
- Kaggle: If optimizing algorithms and really diving into the data to get every last ounce of information is your thing, then Kaggle is it. Plus, there is nowhere else you will get to work on so many important problems in such a wide range of domains. Unfortunately, Kaggle is not currently hiring any data scientists, but they most likely will be seeking more in the future.
There are many other companies hiring data scientists. Where would you like to be a data scientist?
Coursera has some excellent courses coming up in 2013. Here are some potential curriculum paths for someone looking to learn data science.
Either sequence requires or recommends some basic programming experience. If you are unfamiliar with programming, you still have a couple of weeks to get familiar with some basic programming concepts. Some good places to start would be either Coursera’s Computer Science 101 or Codecademy’s Python tutorial.
Data Science Curriculum #1
If you are new to programming, this would be the recommended sequence. The first course focuses on programming.
Data Science Curriculum #2
Neither of the Coursera machine learning courses (Stanford or U of Washington) is scheduled for 2013, but either of them would be a great (maybe necessary) follow-up course. Hopefully, one of those courses will be starting in July or shortly thereafter.
After completing one of the above sequences combined with a machine learning course, a person should be skilled enough to begin doing useful data science work. (Note: A new job as a data scientist is not guaranteed, but the courses won’t hurt your chances.) Plus, Coursera offers numerous other classes that could be taken at a later time to increase depth in certain areas of data science (Natural Language Processing, Image Processing, and more).
Happy Learning in 2013!
If you are interested in more ways to learn data science, please check out Data Science 201, coming in 2013.
This is a very quick and informative video about data science. What is data science? What makes a good data scientist?
DJ Patil does an excellent job answering both those questions.
Here are his answers for what makes a good data scientist:
- Storytelling
When telling friends and family that I blog about data science, I am frequently asked to explain more. I usually respond with an answer similar to this:
You know the world is generating huge amounts of data every day due to financial transactions, medical records, social networks, and other internet uses. Data Science aims to make better decisions based upon that data. Here are some possibilities. What type of people buy TVs in October? Which patients will get better with this new drug? Who are some other people that you probably already know?
Data Science is all about answering these types of questions with real data instead of assumptions.
I think this explanation could use some refinement. What am I leaving out? What should I remove? How do you explain data science to other people (preferably non-technical or non-data people)?
This is a very good read about data science at Engine Yard. It covers the following topics:
- What is a data scientist?
- What does a data scientist do?
- What are the technologies?
- Realities of being a data scientist
Data Science at Engine Yard | Engine Yard Blog.
Michael Koploy wrote 3 Secrets for Aspiring Data Scientists about what it takes to enter a career as a data scientist. He lays out 3 steps:
- Sharpen Your Scientific Saw – Hone your math and science skills
- Learn the Language of Business – Data Scientists need to explain the data in business terms
- Keep Adding to Your Technical Toolbelt – Learn all the tools you can (NoSQL, Excel, Hadoop,…)
The article is a nice read. http://blog.softwareadvice.com/articles/bi/3-career-secrets-for-data-scientists-1101712/
Hilary Mason gives another great talk, titled Machine Learning for Hackers. The video is worth watching. Enjoy!
DJ Patil and Josh Elman, both of Greylock Partners, give an insightful talk at LeWeb London 2012. The most important part was the introduction of the Data Scientific Method.
Data Scientific Method
- Start with a Question
- Leverage your current data
- Create features and run tests
- Analyze the results and draw insights
- Let the data frame a conversation
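The steps above can be walked through in a few lines of Python. This is only a rough sketch under loud assumptions: the purchase records, the field names, the helper `average_buyer_age`, and the question itself (are October TV buyers younger than June TV buyers?) are all made up for illustration and do not come from the talk.

```python
from statistics import mean

# Hypothetical purchase records, invented purely for this example.
purchases = [
    {"month": 10, "item": "tv", "age": 34},
    {"month": 10, "item": "tv", "age": 41},
    {"month": 10, "item": "laptop", "age": 23},
    {"month": 6, "item": "tv", "age": 52},
    {"month": 6, "item": "tv", "age": 48},
    {"month": 6, "item": "laptop", "age": 30},
]


def average_buyer_age(records, item, month):
    """Return the mean age of buyers of `item` in `month`, or None if there are none."""
    ages = [r["age"] for r in records if r["item"] == item and r["month"] == month]
    return mean(ages) if ages else None


# 1. Start with a question: are October TV buyers younger than June TV buyers?
# 2. Leverage your current data: the `purchases` list above.
# 3. Create features and run tests: compute one feature (mean age) per group.
october_age = average_buyer_age(purchases, "tv", 10)
june_age = average_buyer_age(purchases, "tv", 6)

# 4. Analyze the results: compare the two groups.
difference = june_age - october_age

# 5. Let the data frame a conversation: report the numbers, not a verdict.
print(f"October TV buyers average {october_age}, June buyers {june_age} (gap {difference}).")
```

With only six made-up rows nothing can be concluded. A real analysis would need far more data and a proper statistical test before drawing insights.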