Programmer’s Guide to Data Mining – A free ebook

Ron Zacharski is currently writing a data mining book, A Programmer’s Guide to Data Mining. The book is targeted at programmers who want to know when and how to apply recommendation engines and other data mining techniques. The book is still in the writing phase, but I can say the first couple of chapters are excellent. The book will always be available for free download.

If you are a programmer looking to add recommendations to a website, I highly suggest taking a look at this book.

First Steps to Data Analysis in R

This post contains notes from the Coursera Data Analysis Course.

Here are some basic R commands that should be useful for obtaining data and looking at it in R. Ideally, these commands are useful for steps 4, 5, and 6 of the 11 Steps to Data Analysis.

Load the data and just look at it


download.file('http://location.com', 'localfile.csv')
data <- read.csv('localfile.csv')
dim(data)
names(data)
quantile(data$column)
hist(data$column)
head(data)
summary(data)
str(data)
unique(data$column)
length(unique(data$column))
table(data$column)  # count of how many times each value appears in the column
table(data$column1, data$column2)

any(data$column < 100)
all(data$column > 100)

colSums(data)
colMeans(data, na.rm=T)
rowMeans(data, na.rm=T)
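
To try these commands without downloading anything, here is a minimal sketch on a small made-up data frame. The data frame and its columns (city, score, age) are my own hypothetical stand-ins for the result of read.csv, and the numeric_cols helper is just for illustration.

```r
# Hypothetical toy data frame standing in for read.csv('localfile.csv')
data <- data.frame(
  city  = c("A", "B", "A", "C", "B", "A"),
  score = c(90, 120, 150, NA, 110, 95),
  age   = c(23, 35, 41, 29, NA, 52)
)

dim(data)      # number of rows and columns
names(data)    # column names
str(data)      # type and first values of each column

table(data$city)             # how often each city appears
length(unique(data$city))    # number of distinct cities

# any() and all() can return NA when the answer depends on a missing value,
# so na.rm = TRUE is used here to ignore the NA in score
any(data$score < 100, na.rm = TRUE)
all(data$score > 100, na.rm = TRUE)

# colSums(), colMeans() and rowMeans() need numeric input,
# so keep only the numeric columns first
numeric_cols <- data[, sapply(data, is.numeric), drop = FALSE]
colSums(numeric_cols, na.rm = TRUE)
colMeans(numeric_cols, na.rm = TRUE)
rowMeans(numeric_cols, na.rm = TRUE)
```

Note that calling colSums or colMeans directly on a data frame that contains a text column stops with an error, which is why the sketch filters down to numeric columns before summing.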

Look for missing values


is.na(data$column)
sum(is.na(data$column))
table(data$column, useNA="ifany")
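
The missing value checks can be sketched on a small example as well. The vector x and the data frame df below are hypothetical and only there to show what each command returns.

```r
# Hypothetical vector with two missing values
x <- c(3, NA, 7, 3, NA, 10)

is.na(x)         # logical vector, TRUE where a value is missing
sum(is.na(x))    # TRUE counts as 1, so this is the number of NAs
mean(is.na(x))   # proportion of values that are missing

# useNA = "ifany" adds an NA entry to the counts only when NAs are present
counts <- table(x, useNA = "ifany")
counts

# Combining is.na() with colSums() counts the NAs in every column at once
df <- data.frame(x = x, y = c(1, 2, NA, 4, 5, 6))
na_per_column <- colSums(is.na(df))
na_per_column
```

The last command is handy on wider data sets, since it gives one missing value count per column instead of checking each column by hand.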

For more information on any R command, type ? followed by the command name in the R console. For example, if you want to know more about the dim command, type ?dim