New York University is offering a Large Scale Machine Learning course starting later this month. This is NOT a MOOC, so it is not open to everyone. However, the lecture videos will be posted, and possibly the other class handouts as well. This is not an introductory course, so knowledge of machine learning is a prerequisite. The course is being taught by John Langford of Microsoft Research and Yann LeCun of NYU.

Jeff Leek, instructor of the upcoming Coursera Data Analysis course, wrote up a nice blog post, The Landscape of Data Analysis, explaining the topics to be covered in the course. The topics look good. He also made a video explaining how data science fits in with other disciplines such as computer science, medicine, and statistics. The video is short (less than 5 minutes), so it is definitely worth the time.

Not only is the topic interesting, but the concept of breaking the global population down into 100 people is brilliant. This infographic is easily understandable, and it conveys a whole lot of information in a clean and concise manner. For more about where the data came from, see the 100 People page.

Week 1 of the Computing for Data Analysis course focused mostly on getting R and RStudio installed, and then on some of the basics of the R language. Here are some of the topics:

History of R

How to get help: help()

Data types in R

numeric (real numbers)

character (strings)

integer (whole numbers)

complex (numbers with real and imaginary parts)

logical (TRUE/FALSE)
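
As a quick illustration of the five types listed above, here is a minimal sketch that asks R for the class of one value of each type. The list names and the variable `types` are made up for this example and are not from the course.

```r
# One example value per basic type (names are illustrative only)
values <- list(
  numeric   = 3.14,    # real number
  character = "hello", # string
  integer   = 5L,      # the L suffix asks for an integer
  complex   = 4+5i,    # real part 4, imaginary part 5
  logical   = TRUE
)

# class() reports the type of each value
types <- sapply(values, class)
print(types)

# A whole number typed without the L suffix is still stored as numeric
class(5)   # "numeric"
class(5L)  # "integer"
```

Note that a plain `5` is numeric; the `L` suffix is what makes it an integer.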

Groupings of data

vector (all the same data type)
v <- c(1.4, 2.5, 1.7)
v <- 1:10

list (NOT all same data type)
lst <- list("a", 3.5, TRUE, "word", 4+5i)

matrix (2-dimensional vector)
m <- matrix(1:20, nrow=4, ncol=5)

factor (categorical data)
f <- factor(c("big","small","big","big"))
table(f)

Missing Values

NaN (Not a Number), tested with is.nan()

NA (Not Available), tested with is.na()
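
The difference between the two tests is easy to miss, so here is a small sketch. The vector `x` and the result names are made up for illustration. The key point is that is.na() is TRUE for both NA and NaN, while is.nan() is TRUE only for NaN.

```r
# A vector with one NA and one NaN (0/0 produces NaN)
x <- c(1, NA, 0/0, 4)

na_flags  <- is.na(x)   # FALSE  TRUE  TRUE FALSE  (NA and NaN both count)
nan_flags <- is.nan(x)  # FALSE FALSE  TRUE FALSE  (only the NaN)

# Most summary functions return a missing value unless told to drop them
m_all   <- mean(x)                # missing, because x contains NA/NaN
m_clean <- mean(x, na.rm = TRUE)  # 2.5, the mean of 1 and 4

print(na_flags)
print(nan_flags)
print(m_clean)
```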

Reading/Writing data
d <- read.table("file.txt")
d <- read.csv("file.csv")
write.table(d, "outFile.txt")
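
To see reading and writing work together, here is a hedged sketch of a round trip through a temporary CSV file. The data frame contents and the names `d_out`, `d_in`, and `path` are invented for this example. Note that the write functions take the data object first and the file name second.

```r
# A small made-up data frame to write out
d_out <- data.frame(id = 1:3, size = c("big", "small", "big"))

# Use a temporary file so the example does not touch your working directory
path <- tempfile(fileext = ".csv")

# Write the data, leaving out the row-name column
write.csv(d_out, path, row.names = FALSE)

# Read it back in; the header row becomes the column names
d_in <- read.csv(path)
print(d_in)

# Clean up the temporary file
unlink(path)
```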

This course assumes a good familiarity with calculus, linear algebra, and some basic programming. Thus, if your math background is weak or needs a refresher, you may not want to take this course. However, if you have a solid math background, the course jumps right into Fourier analysis. The topics look good, and image analysis is one of the central themes. MATLAB ($99 for the student edition) is the recommended software, but Octave (free) is acceptable.

The Elements of Statistical Learning textbook is available for free. It is a classic, widely used textbook for statistics and machine learning. Here is a far-from-complete list of some of the topics:

Supervised Learning

Linear/Logistic Regression

Regularization

Model Selection

Trees

Neural Networks

Support Vector Machines

Random Forests

Unsupervised Learning

Clustering

As you can see, the book is quite extensive.

Note: This book has been available for quite a while, but I realized I had not added a link to it on my blog.