Free Natural Language Processing Book

Natural Language Processing for the Working Programmer

Beyond the title, no more explanation is needed.

11 Steps to Data Analysis

Here are the steps to data analysis, as listed in the Coursera Data Analysis course.

  1. Define the question
  2. Define the ideal dataset
  3. Define what data you can access
  4. Obtain the data
  5. Clean the data
  6. Exploratory data analysis
  7. Statistical prediction
  8. Interpret results
  9. Challenge results
  10. Write up results
  11. Create reproducible code for others to recreate

Update: A couple of commenters have suggested adding the following two steps.

  1. Missing Value Analysis
  2. Outlier management
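
As a rough illustration of what those two extra steps might involve, here is a minimal Python sketch. Everything in it is an assumption made for the example: the sample rows are made up, the helper names missing_value_report and iqr_outliers are hypothetical, and flagging outliers with Tukey's 1.5 × IQR rule is only one of several common choices, not something the course prescribes.

```python
import math
import statistics


def missing_value_report(rows, columns):
    """Count missing entries (None or NaN) per column."""
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    return {c: sum(is_missing(r.get(c)) for r in rows) for c in columns}


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up sample data, for illustration only.
rows = [
    {"age": 34, "income": 52000.0},
    {"age": None, "income": 48000.0},
    {"age": 29, "income": float("nan")},
    {"age": 41, "income": 51000.0},
    {"age": 38, "income": 950000.0},
    {"age": 45, "income": 47000.0},
    {"age": 31, "income": 53000.0},
    {"age": 27, "income": 49500.0},
]

# Step 1: how many values are missing in each column?
missing = missing_value_report(rows, ["age", "income"])

# Step 2: which of the non-missing incomes look like outliers?
incomes = [r["income"] for r in rows if not math.isnan(r["income"])]
outliers = iqr_outliers(incomes)

print(missing)
print(outliers)
```

What to do with the flagged values (drop, impute, or investigate) depends on the question being asked, which is arguably why these steps deserve their own place in the list.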

What do you think? Is anything missing?

Nice GraphDB and NoSQL Talk

This is a wonderful talk by Max DeMarzi (he has a very informative blog as well). If you are new to NoSQL or Graph Databases, I highly recommend this video.

One comment stuck out for me:

You’re never gonna run out of nodes when you get to half a trillion…

That is a really big number, but I wonder how many years that statement will stand. If you have any thoughts, please leave a comment.

ChiSC: Max DeMarzi – Is Your Problem a Graph Problem? from 8th Light on Vimeo.

Buffalo Bills to start advanced analytics department

Even the NFL is getting into data analysis these days.

Personal note: Like many American children, I grew up dreaming of playing professional football in the NFL. And, as for most of them, that dream did not come true. Maybe now I could try to make the NFL as a data scientist. I wonder if they have a fall training camp for the analytics department. If so, sign me up.

Levels of Data Analysis

The list is ordered by level of difficulty, from least to most difficult.

  • Descriptive: describe the data without drawing further conclusions; common for census-type data.
  • Exploratory: find relationships that were not known beforehand; useful for defining future studies. Remember that correlation does not imply causation.
  • Inferential: use a small sample to say something about a larger population; the most common goal of statistical analysis.
  • Predictive: use data from some objects to predict values for another object; it is important to measure the right variables and to use as much data as possible.
  • Causal: find out what happens to one variable when you force another variable to change; usually requires a randomized study. This is the gold standard of data analysis.
  • Mechanistic: understand the exact changes in variables that lead to changes in other variables for individual objects; typical of engineering and the physical sciences. Data analysis can be used to infer the parameters if the equations are known.
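
To make the difference between the descriptive and inferential levels in the list above concrete, here is a small Python sketch. The sample values are made up, the function names describe and infer_mean are hypothetical, and the interval uses a normal approximation (z = 1.96) as a simplifying assumption; with a sample this small, a t-based interval would be the more careful choice.

```python
import math
import statistics

# Made-up measurements, for illustration only.
sample = [4.1, 5.0, 4.7, 5.3, 4.9, 5.6, 4.4, 5.2]


def describe(values):
    """Descriptive: summarize the data we actually have."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


def infer_mean(values, z=1.96):
    """Inferential: approximate 95% interval for the population mean.

    Assumes a normal approximation; z=1.96 is an illustrative choice.
    """
    mean = statistics.mean(values)
    stderr = statistics.stdev(values) / math.sqrt(len(values))
    return (mean - z * stderr, mean + z * stderr)


summary = describe(sample)
interval = infer_mean(sample)

print(summary)
print(interval)
```

The first function only says something about these eight numbers; the second makes a claim about the population they were drawn from, which is why it needs extra assumptions.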

This list comes from information presented in the first week of the Coursera Data Analysis class.