Here is the list of **Steps to Data Analysis** from the Coursera Data Analysis course.

- Define the question
- Define the ideal dataset
- Define what data you can access
- Obtain the data
- Clean the data
- Exploratory data analysis
- Statistical prediction
- Interpret results
- Challenge results
- Write up results
- Create reproducible code so others can recreate the analysis
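To make the middle of the list a bit more concrete, here is a minimal sketch of the obtain, clean, explore, and predict steps using only the Python standard library. The toy dataset, the column names (`hours`, `score`), and the helper functions are all made up for illustration; they are not from the course, and a real analysis would likely use a proper data library.

```python
import csv
import io
import statistics

# Hypothetical toy data standing in for "the data you can access".
RAW_CSV = """hours,score
1,52
2,55
3,61
,70
4,64
5,69
"""


def obtain(text):
    """Obtain the data: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Clean the data: drop rows with empty fields, convert the rest to floats."""
    cleaned = []
    for row in rows:
        if all(value.strip() for value in row.values()):
            cleaned.append({key: float(value) for key, value in row.items()})
    return cleaned


def explore(rows):
    """Exploratory data analysis: mean and standard deviation per column."""
    summary = {}
    for key in rows[0]:
        values = [row[key] for row in rows]
        summary[key] = (statistics.mean(values), statistics.stdev(values))
    return summary


def fit_line(rows, x_key, y_key):
    """Statistical prediction: least-squares fit of y = intercept + slope * x."""
    xs = [row[x_key] for row in rows]
    ys = [row[y_key] for row in rows]
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


rows = clean(obtain(RAW_CSV))
summary = explore(rows)
intercept, slope = fit_line(rows, "hours", "score")

print(f"rows kept after cleaning: {len(rows)}")
print(f"score changes by about {slope:.2f} per extra hour")
```

Keeping each step in its own function also helps with the last item on the list: someone else can rerun the whole script and get the same numbers.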

Update: A couple of commenters have suggested adding the following two steps.

- Missing Value Analysis
- Outlier management
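As a rough illustration of what these two steps might look like in practice, here is a small sketch in plain Python. It counts missing values per column and flags outliers with the common 1.5 × IQR rule. The records, the column names, and the choice of the IQR rule are my own assumptions for the example; the commenters did not name a specific method.

```python
import statistics

# Hypothetical records; None marks a missing value.
records = [
    {"age": 34, "income": 10},
    {"age": None, "income": 12},
    {"age": 29, "income": 11},
    {"age": 41, "income": 13},
    {"age": 38, "income": 12},
    {"age": 30, "income": 95},
    {"age": 36, "income": 11},
    {"age": 33, "income": None},
]


def missing_counts(rows):
    """Missing value analysis: count None entries per column."""
    counts = {}
    for row in rows:
        for key, value in row.items():
            counts[key] = counts.get(key, 0) + (1 if value is None else 0)
    return counts


def iqr_outliers(values, k=1.5):
    """Outlier management: return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


missing = missing_counts(records)
incomes = [row["income"] for row in records if row["income"] is not None]
outliers = iqr_outliers(incomes)

print("missing values per column:", missing)
print("income outliers:", outliers)
```

Whether a flagged value gets dropped, capped, or kept is a judgment call that depends on the question being asked, so the sketch only reports the candidates.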

What do you think? Is anything missing?

What is probably missing:

1) Missing Value Analysis

2) Outlier management

Those would be good additions.

Thanks for the comment.

Ryan

Does “Challenge the results” include model validation, i.e., checking whether the assumptions of the model are met?

I would say model validation falls under statistical prediction, but I could also see it falling under challenging the results. Either way, it is important and needs to happen somewhere.
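For what it is worth, here is one simple way model validation could look, wherever it ends up in the list. This is only a sketch with made-up data: it fits a straight line on part of the data, then looks at the residuals on the training points and the error on held-out points. The data, the every-fourth-point split, and the helper names are assumptions for illustration, and a real check of model assumptions would go further (residual plots, for example).

```python
import math
import statistics

# Hypothetical data that is roughly linear.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]


def fit_line(x_values, y_values):
    """Least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = statistics.mean(x_values), statistics.mean(y_values)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_values, y_values))
    sxx = sum((x - x_bar) ** 2 for x in x_values)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals(x_values, y_values, intercept, slope):
    """Observed minus predicted value for each point."""
    return [y - (intercept + slope * x) for x, y in zip(x_values, y_values)]


def rmse(errors):
    """Root mean squared error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hold out every fourth point (indices 3 and 7) for validation.
test_idx = set(range(3, len(xs), 4))
train_x = [x for i, x in enumerate(xs) if i not in test_idx]
train_y = [y for i, y in enumerate(ys) if i not in test_idx]
test_x = [x for i, x in enumerate(xs) if i in test_idx]
test_y = [y for i, y in enumerate(ys) if i in test_idx]

intercept, slope = fit_line(train_x, train_y)
train_res = residuals(train_x, train_y, intercept, slope)
test_res = residuals(test_x, test_y, intercept, slope)

print(f"training RMSE: {rmse(train_res):.3f}")
print(f"held-out RMSE: {rmse(test_res):.3f}")
```

If the held-out error were much larger than the training error, or the residuals showed a clear pattern, that would be a reason to challenge the results.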

Thanks for commenting.

Ryan

Ryan,

How would one relate the earlier post “Levels of Data Analysis” to this post? I guess my question is: are the steps mentioned above valid for one or more levels? I am just trying to get the bigger, complete picture.

Harsha,

That is a great question. Maybe I will put up a blog post with my thoughts on how the two lists fit together.

Thanks,

Ryan

I realized I hadn’t responded to this yet. I don’t think it is worth its own post, so I will just leave my thoughts here. I would say the “Levels of Data Analysis” map onto steps 6, 7, and 8 above (exploratory data analysis, statistical prediction, and interpreting results), and possibly step 9 (challenging the results). How does that sound?