edX will be offering Foundations of Data Analysis from the University of Texas at Austin. The course starts November 4, 2014. Here is a list of topics:

- Tutorials on using R
- Descriptive Statistics
- Statistical Models (Regression)
- Inferential Stats

SlideRule is a new startup that aims to be an online learning hub. One section of the site lets experts create “learning paths” for a topic. Well, Claudia Gold, a data scientist at Airbnb, created a learning path for data science titled **Data Analysis Learning Path**. It covers the topics, timelines, resources, and links needed to acquire the skills of a data scientist.

Happy Learning.

Until yesterday, I had never heard the term **work-force science**. Essentially, work-force science is data analysis applied to human resources, and it makes more sense than the old gut-feeling approach. If you want to know more, see this excellent New York Times article, “Big Data, Trying to Build Better Workers.” The following quote sums up one of the key findings.

An applicant’s work history is not a good predictor of future results.

The following types of data analysis are listed in order of increasing difficulty.

- **Descriptive**: just describe the data; common for census-type data.
- **Exploratory**: find relationships that were not clear beforehand; useful for defining future studies. Remember, *correlation does not imply causation*.
- **Inferential**: use a small dataset to say something about a larger population; the most common goal of statistical analysis.
- **Predictive**: use data from some objects to predict values for another object; it is important to measure the right values and to use as much data as possible.
- **Causal**: find out what happens to one variable when you force another variable to change; usually requires a randomized study. This is the *gold standard* of data analysis.
- **Mechanistic**: understand the exact changes in variables that lead to changes in other variables for individual objects; typical of engineering and the physical sciences, where data analysis can be used to infer the parameters if the equations are known.
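To make the first few categories more concrete, here is a minimal R sketch. It uses R's built-in `mtcars` dataset, chosen purely for illustration (it is not from the course), to contrast a descriptive summary, an inferential confidence interval, and a simple predictive model. The regression is only predictive; it says nothing causal about weight and fuel economy.

```r
# Built-in dataset (illustration only): fuel economy (mpg) and
# weight (wt, in 1000 lbs) for 32 cars
data(mtcars)

# Descriptive: just describe the data we have
desc <- summary(mtcars$mpg)
print(desc)

# Inferential: treat the 32 cars as a sample and estimate the
# population mean mpg with a 95% confidence interval (one-sample t-test)
ci <- t.test(mtcars$mpg)$conf.int
print(ci)

# Predictive: fit a linear model on the observed cars, then predict mpg
# for a hypothetical new car weighing 3,000 lbs (wt = 3.0)
fit <- lm(mpg ~ wt, data = mtcars)
pred <- predict(fit, newdata = data.frame(wt = 3.0))
print(pred)
```

The same model could not answer a causal question such as "what happens to mpg if we add weight to a given car?"; that would require a randomized study.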

This list comes from information presented in the first week of the Coursera Data Analysis class.

Cosma Shalizi of the Statistics Department at Carnegie Mellon University is writing a textbook, *Advanced Data Analysis from an Elementary Point of View*. A copy will remain freely available on his website, and since the textbook is still a work in progress, comments are welcome.

Data analysis is performed in many different fields and on many different types of data. Most fields call it something different. The following list comes straight from Jeff Leek’s Data Analysis Coursera class.

- **Biostatistics**: medical data
- **Data Science**: data from web analytics
- **Machine learning**: data in computer science/computer vision
- **Natural language processing**: data from texts
- **Signal processing**: data from electrical signals
- **Business analytics**: data on customers
- **Econometrics**: economic data

The type of analysis is very similar across all of these fields, but what separates data science and machine learning from the others is the 3 V's of big data: data science and machine learning deal with a greater Volume, Variety, and Velocity (the speed at which new data appears) of data. Because storing massive amounts of data is cheaper and easier than ever before, I think the other fields are beginning to realize the potential of big data. Signal processing in particular is becoming a big data area, since electrical sensors are everywhere.

What are your thoughts? Do you see any real differences in the data analysis performed for the data types above?

The Coursera Data Analysis course started yesterday. It would be an excellent follow-up to the Computing for Data Analysis course. For a bit more about the course, check out this video explaining the content. The course consists of lectures, quizzes, and some data analysis assignments. There is still plenty of time to sign up and start analyzing.

Jeff Leek, instructor of the upcoming Coursera Data Analysis course, wrote a nice blog post, The Landscape of Data Analysis, explaining the topics the course will cover. The topics look good. He also made a video explaining how data science fits in with other disciplines such as computer science, medicine, and statistics. The video is short (less than 5 minutes), so it is definitely worth the time.

Week 1 of the Computing for Data Analysis course focused mostly on getting R and RStudio installed, then covered some basics of the R language. Here are some of the topics:

- History of R
- How to get help: `help()`
- Data types in R
  - numeric (real numbers)
  - character (strings)
  - integer (counting numbers)
  - complex (imaginary numbers)
  - logical (TRUE/FALSE)
- Groupings of data
  - vector (all elements the same data type): `v <- c(1.4, 2.5, 1.7)` or `v <- 1:10`
  - list (elements NOT all the same data type): `lst <- list("a", 3.5, TRUE, "word", 4+5i)`
  - matrix (2-dimensional vector): `m <- matrix(1:20, nrow=4, ncol=5)`
- Factors are for categorical data: `f <- factor(c("big","small","big","big"))`, then `table(f)` counts each level
- Missing values
  - NaN (Not a Number): `is.nan()`
  - NA (Not Available): `is.na()`
- Reading/writing data
  - `d <- read.table("file.txt")`
  - `d <- read.csv("file.csv")`
  - `write.table(d, "outFile.txt")`
- Reading large data more efficiently: read a few rows, record each column's class, then pass those classes to the full read
  - `initial <- read.csv("data.csv", nrows=10)`
  - `classes <- sapply(initial, class)`
  - `fullData <- read.csv("data.csv", nrows=2000, colClasses=classes)`
- The `str()` function for displaying information about the structure of an object
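The Week 1 topics above are easy to try at the R console. The following is a minimal sketch (my own illustration, not course material) that checks a few data types, counts factor levels, shows how `NA` and `NaN` differ, and applies the read-a-few-rows-first trick to a small temporary CSV. The data frame contents and column names are made up for the example.

```r
# Data types: class() reports how R stores each value
print(sapply(list(1.4, "a", 5L, 4+5i, TRUE), class))

# A vector holds one type; mixing types coerces everything (here to character)
v <- c(1.4, "a")
print(class(v))

# A factor stores categorical data; table() counts each level
f <- factor(c("big", "small", "big", "big"))
counts <- table(f)
print(counts)

# NA = missing ("Not Available"); NaN = undefined result ("Not a Number")
x <- c(1, NA, 0/0, 4)
print(is.na(x))   # TRUE for both NA and NaN
print(is.nan(x))  # TRUE only for NaN

# Write a small CSV to a temporary file (hypothetical example data)
tmp <- tempfile(fileext = ".csv")
df <- data.frame(id = 1:5, score = c(2.5, 3.1, 4.0, 1.2, 3.3),
                 group = c("a", "b", "a", "b", "a"))
write.csv(df, tmp, row.names = FALSE)

# Faster reading for big files: read a few rows, record each column's class,
# then pass those classes so read.csv does not have to guess them
initial <- read.csv(tmp, nrows = 3)
classes <- sapply(initial, class)
fullData <- read.csv(tmp, colClasses = classes)
str(fullData)
```

On a five-row file the speedup is invisible; the payoff comes with large files, where supplying `colClasses` (and an upper bound via `nrows`) saves R from inspecting every column to guess its type.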

If you hurry, there might still be time to enroll in the class and finish the homework for full credit. Week 1 was not too intensive.

The Coursera class Computational Methods of Data Analysis started yesterday. There is still plenty of time to enroll in the class.

This course assumes good familiarity with calculus, linear algebra, and basic programming, so if your math background is weak or needs a refresher, you may not want to take it. If you do have a solid math background, be ready: the course starts right in on Fourier analysis. The course topics look good, and image analysis is one of its central themes. MATLAB ($99 for the student edition) is recommended; however, Octave (free) is acceptable.