The Data Stack: A Structured Approach

This is a great post about the data stack at Factual.

With the large increases in college tuition and the ever-increasing amount of information available on the internet, it is no wonder many people are trying to learn new skills on their own. Data science is one of the disciplines for which many people are turning to the internet to acquire the necessary skills. The problem is knowing exactly where to find the best material.

If you have the necessary background in math, statistics, and computer science, then it is a good time to learn some skills specific to data science. Coursera recently launched a course devoted to the subject, titled Introduction to Data Science. The course is being taught by Bill Howe of the University of Washington’s eScience Institute. I believe this course is an excellent place to start, and I am very excited about it.

Here is a list of other materials that could be helpful for learning data science.

- http://datascienc.es – A course at the University of California, Berkeley
- Any Machine Learning or Data Analysis course from Coursera
- Free Data Science eBook

Many aspects of computer science are fundamental to data science. A good data scientist has to be able to extract, transform, and manipulate large amounts of data, and computer programming is the main tool for such operations. Here are several resources to help you learn the fundamentals of computer science.

If you are not familiar with computer programming, this list is a good place to start.

- Udacity Introduction to Computer Science
- Coursera Computer Science 101
- Coursera Learn to Program: The Fundamentals
- Codecademy – A wonderful interactive site for learning to program

- Udacity Algorithms
- Udacity Design of Computer Programs
- Coursera Learn to Program: Crafting Quality Code
- Coursera Algorithms Part 1 and Part 2
- Coursera Design and Analysis of Algorithms Part 1 and Part 2

Stack Overflow is a great site for getting answers to your programming questions. It is good for beginners as well as more advanced programmers. Also, if you start writing a lot of code, GitHub is a great place to store that code.

Statistics is an important component of data science. Here is a list of free statistics resources available online. All of these are fairly introductory, but I am guessing more advanced topics will be coming from these same organizations.

- Khan Academy Statistics
- Coursera Statistics One by Princeton – I am guessing a Statistics Two will be offered as well
- Udacity – Intro to Statistics

In addition to the free resources online, there are other options as well.

- Statistics.com – courses are about $400-$500 but programs lead to certificates
- Most local colleges offer courses in statistics

What other resources are available for learning statistics?

Math is one of the key building blocks of data science. While you cannot do a lot of data science with just calculus and linear algebra, both topics are essential for more advanced topics in data science such as machine learning, algorithms, and advanced statistics. Here are some freely available resources for learning both topics.

- Khan Academy Calculus
- Coursera – Calculus (Single Variable) – multi-variable calculus is sure to be coming soon

The following two courses from Coursera may be good for a person learning to think mathematically.

O’Reilly and data scientist DJ Patil just released a new free report titled Data Jujitsu: The Art of Turning Data Into Product. If you are interested in building data products, the report is excellent and definitely worth your time.

The report provides a definition for a data product. It then covers a process for taking an idea from concept to reality. The main point is to use some *shortcuts* and get the product out fast. Then if people like the product, and only then, spend some time really enhancing the algorithms.
