A Data Science Curriculum

This is not intended to map to a set of college courses. Instead, it is a list of the skills a data scientist needs. For a definition of data scientist, see this previous post.

Mathematics

  • Calculus – not directly important to data science, but necessary for understanding statistics and machine learning
  • Matrix Operations
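Here is a small illustration of the matrix operations item: a minimal pure-Python sketch of transpose and matrix multiplication. The function names `transpose` and `matmul` are hypothetical. In practice a library such as NumPy would do this work, but writing it out shows what each entry of a product actually is.

```python
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    """Swap rows and columns: element (i, j) moves to (j, i)."""
    return [list(row) for row in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (m x n) matrix by an (n x p) matrix."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    b_cols = transpose(b)  # columns of b, so each can be zipped with a row of a
    # Entry (i, j) is the dot product of row i of a and column j of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


X = [[1.0, 2.0],
     [3.0, 4.0]]
print(matmul(X, transpose(X)))  # [[5.0, 11.0], [11.0, 25.0]]
```

Products such as X times its transpose show up directly when fitting regression models, which is one reason this item sits alongside calculus.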

Statistics

  • Regression – Linear and Logistic
  • Bayesian Statistics
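As a concrete taste of two statistics items, the sketch below fits a one-predictor linear regression by ordinary least squares and applies Bayes' rule to a yes/no hypothesis. It covers only the linear case, and the function names and example numbers are hypothetical, chosen for illustration.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum of centered cross-products / sum of centered squares of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def posterior(prior: float, likelihood: float, false_positive_rate: float) -> float:
    """Bayes' rule for hypothesis H given evidence E:
    P(H|E) = P(E|H) P(H) / (P(E|H) P(H) + P(E|not H) P(not H))."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0), i.e. y = 1 + 2x
# A test that is 90% sensitive with a 5% false-positive rate, for a 1% base rate:
print(round(posterior(0.01, 0.9, 0.05), 3))  # 0.154
```

The Bayes example shows why the prior matters: even with a fairly accurate test, a positive result here leaves the hypothesis more likely false than true.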

Tools

  • Hadoop
  • R – stats
  • Octave – machine learning
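Hadoop's core processing model is MapReduce. The single-process Python sketch below imitates the map, shuffle, and reduce phases for the classic word-count example. It does not use Hadoop's actual Java API, and all function names are hypothetical. On a real cluster, the framework distributes the map and reduce calls across machines and performs the shuffle itself.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all counts emitted for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


print(word_count(["the cat sat", "the cat ran"]))
# {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}
```

Because each mapper sees only its own line and each reducer sees only one key, both phases can run in parallel. That independence is what lets Hadoop scale the same logic across a cluster.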

Computing

  • Basic Programming – Java, C/C++, and Python seem to be good language choices
  • Machine Learning
  • Database Knowledge – not limited to just relational databases

Communication

  • Data Visualization – how to make data look good: maps, graphs, etc.
  • Presentation – storytelling; being comfortable explaining data to others
  • Writing

Do you have anything to add to or remove from the list?



Comments

6 responses to “A Data Science Curriculum”

  1. Craig

    Also critical:
    Business process design – how to design the information and data flows into and out of data systems.

    Metadata and derived data design, computation, storage and retrieval.

    1. Ryan Swanstrom

      Good points. Being able to use and process the data is very important, especially when real-time analysis is involved.

  2. David Diez

    I enjoyed browsing the list! I’m putting Hadoop on my list of things to learn, and I’ve just started toying with Octave.

    Similar to what Craig said, data collection (experimental design / sampling) is a helpful topic in a curriculum and could be added.

    1. Ryan Swanstrom

      David,
      Hadoop is on my short list of things to learn as well. I hope to try some stuff out and post a bit to the blog. You will probably find Octave quite easy to pick up since you are very familiar with R, although Octave is more like MATLAB.

  3. Ivan Brusic

    Hadoop encompasses multiple subprojects nowadays. While understanding the general concepts of Hadoop (MapReduce, clusters) is important, working with a higher-level project such as Pig or Hive makes the transition into Hadoop much easier.

    1. Ryan Swanstrom

      That is true. Thanks for sharing.
