Beyond the title, no more explanation is needed.
Here is a list of the steps in a data analysis, taken from the Coursera Data Analysis course.
- Define the question
- Define the ideal dataset
- Define what data you can access
- Obtain the data
- Clean the data
- Exploratory data analysis
- Statistical prediction
- Interpret results
- Challenge results
- Write up results
- Create reproducible code for others to recreate
Update: A couple of commenters have suggested adding the following two steps.
- Missing Value Analysis
- Outlier management
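For readers wondering what the two suggested steps might look like in practice, here is a minimal sketch using only the Python standard library. The dataset, the column names (`age`, `income`), and the helper names `missing_value_report` and `iqr_outliers` are all made up for illustration, and the 1.5 × IQR rule is just one common convention for flagging outliers, not something prescribed by the course.

```python
from statistics import quantiles


def missing_value_report(rows, columns):
    """Count missing (None) entries for each column."""
    return {col: sum(1 for row in rows if row.get(col) is None) for col in columns}


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the data
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical records; None marks a missing value.
rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 48000},
    {"age": 29, "income": 50000},
    {"age": 41, "income": None},
    {"age": 38, "income": 51000},
    {"age": 45, "income": 53000},
    {"age": 31, "income": 55000},
    {"age": 52, "income": 400000},
]

# Missing value analysis: how many gaps does each column have?
missing = missing_value_report(rows, ["age", "income"])

# Outlier management: flag suspicious incomes among the non-missing values.
incomes = [row["income"] for row in rows if row["income"] is not None]
outliers = iqr_outliers(incomes)

print(missing)
print(outliers)
```

Whether a flagged value should be dropped, capped, or kept is a judgment call that depends on the question being asked; the sketch only identifies candidates.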
What do you think? Is anything missing?
This is a wonderful talk by Max DeMarzi (he has a very informative blog as well). If you are new to NoSQL or Graph Databases, I highly recommend this video.
One comment stuck out for me:
You’re never gonna run out of nodes when you get to half a trillion…
That is a really big number, but I wonder how many years that statement will stand. If you have any thoughts, please leave a comment.
Even the NFL is getting into data analysis these days.
Personal note: Like many American children, I grew up dreaming of playing professional football in the NFL. Also, like many American children, that dream did not come true. Maybe now I could try to make the NFL as a data scientist. I wonder if they have fall training camp for the analytics department. If so, sign me up.
This list of types of data analysis is ordered by level of difficulty, from least to most difficult.
- Descriptive: simply describe the data; common for census-type data
- Exploratory: find relationships that were not known beforehand; useful for defining future studies; remember that correlation does not imply causation
- Inferential: use a small sample of data to say something about a larger population; the most common goal of statistical analysis
- Predictive: use data on some objects to predict values for another object; it is important to measure the right variables and to use as much data as possible
- Causal: find out what happens to one variable when you force another variable to change; usually requires a randomized study; this is the gold standard of data analysis
- Mechanistic: understand the exact changes in variables that lead to changes in other variables for individual objects; typically found in engineering and the physical sciences; data analysis can be used to infer the parameters if the equations are known
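To make the difference between the first and third items a little more concrete, here is a small sketch in Python. The sample values are invented, the function names `describe` and `mean_confidence_interval` are my own, and the confidence interval uses a normal approximation as a simplifying assumption (a t-based interval would usually be more appropriate for a sample this small).

```python
from statistics import NormalDist, mean, stdev


def describe(sample):
    """Descriptive analysis: summarize the data you have, nothing more."""
    return {
        "n": len(sample),
        "mean": mean(sample),
        "stdev": stdev(sample),
        "min": min(sample),
        "max": max(sample),
    }


def mean_confidence_interval(sample, confidence=0.95):
    """Inferential analysis: estimate a range for the population mean.

    Uses a normal approximation, which is an assumption of this sketch.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(sample) / len(sample) ** 0.5
    center = mean(sample)
    return center - half_width, center + half_width


# Hypothetical sample of heights in centimeters.
heights = [168.0, 172.5, 181.0, 175.2, 169.8, 177.4, 183.1, 171.6, 174.9, 179.3]

summary = describe(heights)
low, high = mean_confidence_interval(heights)

print(summary)
print(f"95% interval for the population mean: {low:.1f} to {high:.1f}")
```

The descriptive summary only says something about these ten values, while the interval is a statement about the larger population they were drawn from, which is why it needs extra assumptions.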
This list comes from information presented in the first week of the Coursera Data Analysis class.