A Data Science Curriculum

This is not intended to map to a set of college courses; it is a list of the skills a data scientist needs. For a definition of data scientist, see this previous post.

Mathematics

  • Calculus – rarely used directly in data science work, but needed to understand the statistics and machine learning
  • Matrix Operations

Statistics

  • Regression – Linear and Logistic
  • Bayesian Statistics
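To make the regression item a little more concrete, here is a minimal sketch in plain Python, assuming a single input feature and no outside libraries. `fit_linear` uses the closed-form least-squares solution (slope = covariance / variance). `fit_logistic` fits a logistic model by batch gradient descent on the log loss. The function names, learning rate, and step count are illustrative choices, not taken from any particular library.

```python
import math


def fit_linear(xs, ys):
    """Ordinary least squares for one feature; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, steps=1000):
    """Logistic regression for one feature, fitted by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean log loss: mean((p - y) * x) for w, mean(p - y) for b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    print(fit_linear([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
    w, b = fit_logistic([-2, -1, 1, 2], [0, 0, 1, 1])
    print(sigmoid(w * 2 + b) > 0.5)  # True
```

In practice you would reach for R's `lm`/`glm` or a similar package. Writing the small version once, though, shows where the matrix operations and calculus from the math section come in.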

Tools

  • Hadoop
  • R – stats
  • Octave – machine learning
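Hadoop's core programming model is MapReduce: a map step emits key/value pairs, the framework groups the pairs by key, and a reduce step combines each group. Below is a minimal in-process sketch of that flow for the classic word-count example. It is plain Python, not Hadoop's API; `map_phase`, `shuffle`, and `reduce_phase` are hypothetical names for illustration. A real job would run these steps across a cluster, and tools like Pig and Hive generate such jobs for you.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Emit a (word, 1) pair for each word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine all the counts emitted for one word.
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop runs MapReduce", "Pig and Hive run on Hadoop"]
    print(word_count(docs))
```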

Computing

  • Basic Programming – Java, C/C++, and Python seem to be good language choices
  • Machine Learning
  • Database Knowledge – not limited to just relational databases

Communication

  • Data Visualization – how to make data look good: maps, graphs, etc.
  • Presentation – storytelling; be comfortable explaining data to others
  • Writing

Do you have anything to add/remove from the list?

6 thoughts on “A Data Science Curriculum”

  1. Also critical:
    Business process design: how to design the info and data flows into and out of data systems.

    Metadata and derived data design, computation, storage and retrieval.

  2. I enjoyed browsing the list! I'm putting Hadoop on my list of things to learn, and I've just started toying with Octave.

    Similar to what Craig said, data collection (experimental design / sampling) is a helpful topic in a curriculum and could be added.

    1. David,
      Hadoop is on my short list of things to learn as well. I hope to try some stuff out and post a bit to the blog. You will probably find Octave quite easy to pick up since you are very familiar with R, though Octave is closer to Matlab.

  3. Hadoop encompasses multiple subprojects nowadays. While understanding the general concepts of Hadoop (MapReduce, clusters) is important, working with a higher-level project such as Pig or Hive makes the transition into Hadoop much easier.
