Andrew Ng’s wonderful Coursera course on machine learning starts today. It is not too late to sign up.

# Tag Archives: coursera

# Levels of Data Analysis

The list is ordered according to the level of difficulty.

- **Descriptive**: just describe the data, common for census type of data
- **Exploratory**: find relationships that were not clear beforehand, useful for defining future studies, remember *correlation does not imply causation*
- **Inferential**: use a small dataset to say something about a larger population, most common goal of statistical analysis
- **Predictive**: use data from some object to predict something (values) for another object, important to measure the right values and to use as much data as possible
- **Causal**: what happens to one variable when you force another variable to change, usually requires a randomized study, this is the *gold standard* of data analysis
- **Mechanistic**: understanding the exact changes in variables that lead to changes in other variables for individual objects, typically from engineering and physical sciences, data analysis can be used to infer the parameters if the equations are known

This list comes from information presented in the first week of the Coursera Data Analysis class.
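
To make the first four levels a bit more concrete, here is a minimal R sketch. It is only an illustration under my own assumptions: the choice of the built-in `mtcars` dataset, the variable names, and the train/test split are mine, not from the course.

```r
# Built-in dataset: fuel economy and specs for 32 cars (an assumed example).
data(mtcars)

# Descriptive: just describe the data you have.
desc <- summary(mtcars$mpg)

# Exploratory: look for relationships (correlation does not imply causation).
r <- cor(mtcars$mpg, mtcars$wt)

# Inferential: use the sample to say something about a larger population,
# here a 95% confidence interval for the mean mpg.
ci <- t.test(mtcars$mpg)$conf.int

# Predictive: fit a model on some cars, then predict mpg for other cars.
train <- mtcars[1:24, ]
test  <- mtcars[25:32, ]
fit   <- lm(mpg ~ wt, data = train)
pred  <- predict(fit, newdata = test)
```

Causal and mechanistic analysis are left out on purpose: the first usually needs a randomized study, and the second needs known equations, so neither fits in a few lines on an observational dataset.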

# Data Analysis at Coursera

The Coursera Data Analysis course started yesterday. This course would be an excellent follow-up to the Computing for Data Analysis course. For a bit more about the course, check out this video explaining the content. The course consists of lectures, quizzes, and some data analysis assignments. There is still plenty of time to sign up and start analyzing.

# Data Analysis Landscape

Jeff Leek, instructor of the upcoming Coursera Data Analysis course, wrote up a nice blog post, The Landscape of Data Analysis, explaining the topics to be covered in the course. The topics look good. He also made a video explaining how data science fits in with other disciplines such as computer science, medicine, statistics, and so on. The video is short (less than 5 minutes), so it is definitely worth the time.

# Computing For Data Analysis Week 1 Overview

Week 1 of the Computing for Data Analysis course focused mostly on getting R and RStudio installed, then moved on to some of the basics of the R language. Here are some of the topics:

- History of R
- How to get help: `help()`
- Data types in R
  - numeric (real numbers)
  - character (strings)
  - integer (counting numbers)
  - complex (imaginary)
  - logical (TRUE/FALSE)
- Groupings of data
  - vector (all the same data type): `v <- c(1.4, 2.5, 1.7)` or `v <- 1:10`
  - list (NOT all the same data type): `lst <- list("a", 3.5, TRUE, "word", 4+5i)`
  - matrix (2-dimensional vector): `m <- matrix(1:20, nrow=4, ncol=5)`
- Factor is for categorical data: `f <- factor(c("big","small","big","big"))`, then `table(f)`
- Missing values
  - NaN (Not a Number): `is.nan()`
  - NA (Not Available): `is.na()`
- Reading/Writing data
  - `d <- read.table("file.txt")`
  - `d <- read.csv("file.csv")`
  - `write.table(d, "outFile.txt")`
- Better reading of data
  - `initial <- read.csv("data.csv", nrows=10)`
  - `classes <- sapply(initial, class)`
  - `fullData <- read.csv("data.csv", nrows=2000, colClasses=classes)`
- The `str()` function for displaying information about the structure of an object
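
Here is a small, self-contained sketch that pulls a few of the topics above together: factors, missing values, and the two-pass trick for reading data. The temporary file, its column names, and its contents are made up for illustration, so treat this as one possible way to try the ideas rather than the course's own code.

```r
# Factor: categorical data, with counts per level via table().
f <- factor(c("big", "small", "big", "big"))
counts <- table(f)

# Missing values: is.na() is TRUE for both NA and NaN,
# while is.nan() is TRUE only for NaN.
x <- c(1, NA, NaN)
na_flags  <- is.na(x)
nan_flags <- is.nan(x)

# Write a small example CSV to a temporary file (hypothetical data).
path <- tempfile(fileext = ".csv")
example <- data.frame(
  id    = 1:50,
  size  = rep(c("big", "small"), 25),
  score = seq(0.5, 25, by = 0.5)
)
write.csv(example, path, row.names = FALSE)

# Two-pass read: peek at the first rows to learn the column classes,
# then pass them as colClasses so R does not have to guess for the full file.
initial  <- read.csv(path, nrows = 10)
classes  <- sapply(initial, class)
fullData <- read.csv(path, colClasses = classes)

# Display the structure of the result.
str(fullData)
```

One caveat with the two-pass approach: the classes are inferred from the first few rows only, so a column that looks like integers at the top but contains decimals or text further down will cause the full read to fail or misbehave. Peeking at more rows reduces that risk.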

If you hurry, there still might be time to enroll in the class and finish the homework for full credit. Week 1 was not too intensive.

# Coursera Computational Methods for Data Analysis Started

The Coursera class Computational Methods for Data Analysis started yesterday. There is still plenty of time to enroll in the class.

This course assumes a good familiarity with calculus, linear algebra, and some basic programming. Thus, if your math background is weak or needs a refresher, you may not want to take this course. However, if you have a solid math background, the course dives right into Fourier analysis. The course topics look good, and image analysis is one of the central themes of the course. Matlab ($99 for the student edition) is the recommended software; however, Octave (free) is acceptable.

# Computing for Data Analysis Starts Today

Coursera’s Computing for Data Analysis starts today. Enroll now and start learning R and data analysis.

# Top 5 Data Startups

- **Kaggle**: They make data science a sport, enough said.
- **DataKind**: DataKind may not technically be a startup because it is a nonprofit, but they are doing cool stuff. They match nonprofit organizations with people who love to analyze data and create visualizations.
- **Cloudera**: They call themselves *“The Platform for Big Data”*. They are working hard to make Hadoop easier to use.
- **Coursera**: Coursera is an education startup, but with two computer science professors as founders, you can bet they are crunching a lot of data about how people learn.
- **BigML**: They are trying to make machine learning available to everyone. Machine Learning as a Service!

# In 2013, Learn Data Science via Coursera (a curriculum)

Coursera has some excellent courses coming up in 2013. Here are some potential curriculum paths for someone looking to learn data science.

### Prerequisites

Either sequence requires/recommends some basic programming experience. If you are unfamiliar with programming, you still have a couple of weeks to get familiar with some basic programming concepts. Some good places to start would be either Coursera’s Computer Science 101 or Codecademy’s Python tutorial.

### Data Science Curriculum #1

If you are new to programming, this would be the recommended sequence. The first course focuses on programming.

| Course | Start Date | Completion Date |
|---|---|---|
| Computing for Data Analysis | Jan. 2, 2013 | Jan. 25, 2013 |
| Data Analysis | Jan. 22, 2013 | Mar. 15, 2013 |
| Introduction to Data Science | April 2013 | June 2013 |

### Data Science Curriculum #2

| Course | Start Date | Completion Date |
|---|---|---|
| Computational Methods for Data Analysis | Jan. 7, 2013 | Mar. 15, 2013 |
| Introduction to Data Science | April 2013 | June 2013 |

### Additional Courses

Neither of the Coursera machine learning courses (Stanford or U of Washington) is scheduled for 2013, but either of them would be a great (maybe necessary) follow-up course. Hopefully, one of those courses will be starting in July or shortly thereafter.

After completing one of the above sequences combined with a machine learning course, a person should be skilled enough to begin doing useful data science work. (Note: A new job as a data scientist is not guaranteed, but the courses won’t hurt your chances.) Plus, Coursera offers numerous other classes that could be taken at a later time to increase depth in certain areas of data science (Natural Language Processing, Image Processing, and more).

Happy Learning in 2013!

If you are interested in more ways to learn data science, please check out Data Science 201, coming in 2013.

# Coursera Announces College Credit

Yesterday, Coursera announced that students will soon be able to earn college credits for some of the courses. See the blog post with the college credit announcement.