A while back, James Kobielus wrote the article “Data Scientist: Consider the Curriculum”. It contains one of the best descriptions of a data science curriculum I have seen. The article also includes a list of algorithms and modeling techniques that a data scientist should know. Below is the list from the article.

linear algebra

basic statistics

linear and logistic regression

data mining

predictive modeling

cluster analysis

association rules

market basket analysis

decision trees

time-series analysis

forecasting

machine learning

Bayesian and Monte Carlo statistics

matrix operations

sampling

text analytics

summarization

classification

principal components analysis

experimental design

unsupervised learning

constrained optimization

The list looks almost overwhelming.
Do you think anything is missing from the list?

Mechanistic Analysis

Thanks for the comment. I am not even familiar with the topic. I will have to google it and read up about it.

Thanks,

Ryan

Nice post!

I’m not entirely familiar with everything in this post, but I have a feeling that some of the topics are subsets of other topics in the list.

For example, shouldn’t “Unsupervised Learning” be part of “Machine Learning”? The same may hold for “Linear and Logistic Regression”, which is part of Statistics/Machine Learning.

So I’m wondering: if we grouped the topics you’ve listed, which major topics would encompass all the others?

Ahmad,

That is a good point, but I think it is also beneficial to know the exact topics and not just the broad categories. Anyhow, the list is a good goal to shoot for.

Thanks for commenting.

Ryan

Reblogged this on A Data Head's Diary and commented:

This is a great checklist of topics for all aspiring data scientists to become proficient in. This list of skills can reasonably be attained within a master’s degree program in Statistics, Computer Science or Operations Research.

What strikes me is that this curriculum consists only of courses on actual data analysis techniques and methods. Does this imply that a data analyst does not need some domain knowledge (which is something different from being a domain expert)?

I am not sure if you are talking about my post or the original article. My post does list just the data analysis methods. The original article mentions an “Applications and outcomes” section that briefly covers domain knowledge. However, a lot of domain knowledge is difficult to cover in a classroom curriculum. I do think that domain knowledge is important to a good data scientist.