Tag Archives: coursera

Levels of Data Analysis

The list is ordered from least to most difficult.

  • Descriptive: describe the data; common for census-type data
  • Exploratory: find relationships that were not clear beforehand; useful for defining future studies (remember, correlation does not imply causation)
  • Inferential: use a small sample of data to say something about a larger population; the most common goal of statistical analysis
  • Predictive: use data from some objects to predict values for other objects; it is important to measure the right variables and to use as much data as possible
  • Causal: determine what happens to one variable when you force another variable to change; usually requires a randomized study; the gold standard of data analysis
  • Mechanistic: understand the exact changes in variables that lead to changes in other variables for individual objects; typical of engineering and the physical sciences, where data analysis can be used to infer the parameters if the equations are known

This list comes from information presented in the first week of the Coursera Data Analysis class.
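The first four levels can be illustrated in R, the language taught in Computing for Data Analysis. This is a minimal sketch, not course material: it uses R's built-in mtcars data set purely as an assumed example. Causal and mechanistic analyses are left out because they need a randomized experiment or known equations, which observational data like this cannot supply.

```r
# Sketch only: mtcars (32 cars, built into R) is an arbitrary example data set.

# Descriptive: summarize the data without claims beyond it.
desc <- summary(mtcars$mpg)
print(desc)

# Exploratory: look for relationships worth studying further.
# A strong correlation does not show that weight causes low mileage.
r <- cor(mtcars$wt, mtcars$mpg)
print(r)

# Inferential: treat the 32 cars as a sample and estimate a population
# quantity, here a 95% confidence interval for mean mpg.
ci <- t.test(mtcars$mpg)$conf.int
print(ci)

# Predictive: fit a model on observed cars, then predict mpg for a new car
# weighing 3,000 lbs (wt is measured in 1,000 lbs).
fit <- lm(mpg ~ wt, data = mtcars)
pred <- predict(fit, newdata = data.frame(wt = 3.0))
print(pred)
```

The same data supports several levels; what changes is the question asked and how far the conclusion is allowed to reach.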

Data Analysis at Coursera

The Coursera Data Analysis course started yesterday. It would be an excellent follow-up to the Computing for Data Analysis course. For a bit more about the course, check out this video explaining the content. The course consists of lectures, quizzes, and some data analysis assignments. There is still plenty of time to sign up and start analyzing.

Data Analysis Landscape

Jeff Leek, instructor of the upcoming Coursera Data Analysis course, wrote a nice blog post, The Landscape of Data Analysis, explaining the topics the course will cover. The topics look good. He also made a video explaining how data science fits in with other disciplines such as computer science, medicine, and statistics. The video is short (less than 5 minutes), so it is definitely worth watching.

Computing For Data Analysis Week 1 Overview

Week 1 of the Computing for Data Analysis course focused mostly on installing R and RStudio, then moved on to some basics of the R language. Here are some of the topics:

  • History of R
  • How to get help: help()
  • Data types in R
    • numeric (real numbers)
    • character (strings)
    • integer (counting numbers)
    • complex (complex numbers, e.g. 4+5i)
    • logical (TRUE/FALSE)
  • Groupings of data
    • vector (all the same data type)
      v <- c(1.4, 2.5, 1.7)
      v <- 1:10
    • list (NOT all same data type)
      lst <- list("a", 3.5, TRUE, "word", 4+5i)
    • matrix (2-dimensional vector)
      m <- matrix(1:20, nrow=4, ncol=5)
  • Factor is for categorical data
    f <- factor(c("big","small","big","big"))
    table(f)
  • Missing Values
    • NaN (Not a Number), tested with is.nan()
    • NA (Not Available), tested with is.na(); note that is.na() also returns TRUE for NaN
  • Reading/Writing data
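    d <- read.table("file.txt")    # whitespace-delimited text file
    d <- read.csv("file.csv")      # comma-separated file with a header row
    write.table(d, "outFile.txt")  # write the data frame d to a file
  • Reading data faster
    initial <- read.csv("data.csv", nrows=10)    # read only the first 10 rows
    classes <- sapply(initial, class)            # store the class of each column
    fullData <- read.csv("data.csv", nrows=2000, colClasses=classes)  # known classes speed up the full read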
  • The str() function for displaying information about the structure of an object
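Several of the points above can be checked with a short script. This is a minimal sketch: the variable names are hypothetical, and the built-in mtcars data set written to a temporary CSV file is an assumption standing in for data.csv, so the colClasses trick runs without an external file.

```r
# Atomic data types: class() reports how R stores each value
types <- sapply(list(1.4, 5L, "word", 4+5i, TRUE), class)
print(types)  # "numeric" "integer" "character" "complex" "logical"

# Vectors hold a single type, so mixing types coerces everything to character
mixed <- c(1.4, "a")
print(class(mixed))

# Matrices are filled column by column by default
m <- matrix(1:20, nrow = 4, ncol = 5)
print(dim(m))   # 4 5
print(m[1, 2])  # 5, the first element of the second column

# Factors store categorical data as levels; table() counts each level
f <- factor(c("big", "small", "big", "big"))
print(levels(f))
print(table(f))

# NA vs NaN: is.na() is TRUE for both, is.nan() only for NaN
v_miss <- c(1, NA, NaN)
print(is.na(v_miss))   # FALSE TRUE TRUE
print(is.nan(v_miss))  # FALSE FALSE TRUE

# Faster reading: learn column classes from a few rows, then reuse them.
# mtcars written to a temporary file stands in for a real data.csv here.
path <- tempfile(fileext = ".csv")
write.csv(mtcars, path, row.names = FALSE)

initial <- read.csv(path, nrows = 10)        # read a small sample of rows
classes <- sapply(initial, class)            # "numeric" or "integer" per column
full <- read.csv(path, colClasses = classes) # reuse the classes for the full read
str(full)
# Caveat: classes guessed from 10 rows can make the full read fail if later
# rows differ (e.g. an integer-looking column with decimals further down).
```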

If you hurry, there might still be time to enroll in the class and finish the homework for full credit. Week 1 was not too intensive.

Coursera Computational Methods for Data Analysis Started

The Coursera class Computational Methods for Data Analysis started yesterday. There is still plenty of time to enroll in the class.

This course assumes good familiarity with calculus, linear algebra, and basic programming. If your math background is weak or needs a refresher, you may not want to take this course. If you do have a solid math background, the course jumps right into Fourier analysis. The course topics look good, and image analysis is one of the central themes. MATLAB ($99 for the student edition) is recommended, but Octave (free) is acceptable.

Computing for Data Analysis Starts Today

Coursera’s Computing for Data Analysis starts today. Enroll now and start learning R and data analysis.

Top 5 Data Startups

  1. Kaggle: They make data science a sport; enough said.
  2. DataKind: DataKind may not technically be a startup because it is a nonprofit, but they are doing cool stuff. They match nonprofit organizations with people who love to analyze data and create visualizations.
  3. Cloudera: They call themselves “The Platform for Big Data”. They are working hard to make Hadoop easier to use.
  4. Coursera: Coursera is an education startup, but with two computer science professors as founders, you can bet they are crunching a lot of data about how people learn.
  5. BigML: They are trying to make machine learning available to everyone. Machine learning as a service!

In 2013, Learn Data Science via Coursera (a curriculum)

Coursera has some excellent courses coming up in 2013. Here are some potential curriculum paths for someone looking to learn data science.

Prerequisites

Either sequence requires or recommends some basic programming experience. If you are unfamiliar with programming, you still have a couple of weeks to learn some basic programming concepts. Good places to start would be Coursera’s Computer Science 101 or Codecademy’s Python tutorial.

Data Science Curriculum #1

If you are new to programming, this is the recommended sequence, because the first course focuses on programming.

Course                          Start Date      Completion Date
Computing for Data Analysis     Jan. 2, 2013    Jan. 25, 2013
Data Analysis                   Jan. 22, 2013   Mar. 15, 2013
Introduction to Data Science    April 2013      June 2013

Data Science Curriculum #2

Course                                     Start Date     Completion Date
Computational Methods for Data Analysis    Jan. 7, 2013   Mar. 15, 2013
Introduction to Data Science               April 2013     June 2013

Additional Courses

Neither of the Coursera machine learning courses (Stanford or University of Washington) is scheduled for 2013, but either would be a great (maybe necessary) follow-up course. Hopefully, one of them will start in July or shortly thereafter.

After completing one of the above sequences combined with a machine learning course, a person should be skilled enough to begin doing useful data science work. (Note: A new job as a data scientist is not guaranteed, but the courses won’t hurt your chances.) Plus, Coursera offers numerous other classes that could be taken at a later time to increase depth in certain areas of data science (Natural Language Processing, Image Processing, and more).

Happy Learning in 2013!

If you are interested in more ways to learn data science, please check out Data Science 201, coming in 2013.

Coursera Announces College Credit

Yesterday, Coursera announced that students will soon be able to earn college credits for some of the courses. See the blog post with the college credit announcement.

Coursera Adds 17 New Universities

Just announced: Coursera adds 17 new universities, including Columbia and Brown, as well as a few international universities.

A few notable courses for data science are: a new machine learning course from the University of Washington, Linear Algebra from Brown, and Natural Language Processing by Michael Collins from Columbia.

Check the following pages to see which other courses are now available.