Videos for Learning R

All of the videos from the Computing for Data Analysis Coursera course are available on YouTube. If you are interested in learning R or just need a refresher on some of the topics, these videos are a great resource.

  • Week 1: installing R, data types, reading/writing files
  • Week 2: functions, apply, sapply, other *apply functions
  • Week 3: plotting, simulation, graphics, lattice
  • Week 4: plotting, regular expressions, and date/time

Free Data Analysis Textbook

Cosma Shalizi of the Statistics Department at Carnegie Mellon University is writing a textbook, Advanced Data Analysis from an Elementary Point of View. A copy of the textbook will remain freely available on the website. Since the textbook is still being written, comments are welcome.

Data Analysis by Data Type

Data analysis is performed in many different fields and on many different types of data, and each field tends to call it something different. The following list comes straight from Jeff Leek’s Data Analysis Coursera class.

Name of Data Analysis by Data Type

The type of analysis is very similar across fields, but what separates data science and machine learning from the others is the 3 V’s of big data: they deal with greater Volume, Variety, and Velocity (the speed at which new data appears). Because storing massive amounts of data is cheaper and easier than ever before, I think the other fields are beginning to realize the potential of big data. Signal processing is definitely becoming a big data area, because electrical sensors are everywhere.

What are your thoughts? Do you see any real differences in the data analysis performed for the data types above?

Syracuse Free Online Data Science Course

The Syracuse University iSchool will be hosting a free, open online Introduction to Data Science course. The course will be built around Professor Jeff Stanton’s data science ebook. If you are interested, please hurry, because enrolment is limited to the first 500.

Hans Rosling: The Joy of Stats

Hans Rosling does an excellent job of showing how “not boring” statistics can be. This is a great, informative video about statistics. It was originally posted at The Joy of Stats.

2013 Year of Statistics

The International Year of Statistics (Statistics2013)

Yes, 2013 is the International Year of Statistics, and a video was made to mark it.

Data Analysis at Coursera

The Coursera Data Analysis course started yesterday. This course would be an excellent follow-up to the Computing for Data Analysis course. For a bit more about the course, check out this video explaining the content. The course consists of lectures, quizzes, and some data analysis assignments. There is still plenty of time to sign up and start analyzing.

Data Science: The Paper that Started it All

Although Tobias Mayer may be known as the first data scientist, he did not coin the term data science. According to Wikipedia, the first use of the term data science was in 2001.

Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics was published in the April 2001 issue of the International Statistical Review. The author was William S. Cleveland, currently a Professor of Statistics at Purdue University.

The paper proposes a new field of study named data science. It then goes on to list and explain 6 technical focus areas for a university data science department.

  1. Multidisciplinary Investigations
  2. Models and Methods for Data
  3. Computing with Data
  4. Pedagogy
  5. Tool Evaluation
  6. Theory

For the most part, the paper is still relevant. I did find a couple of good quotes from the paper that deserve comment.

The primary agents for change should be university departments themselves.

That did not happen. The driving agents for change in the data science field have been some of the newer technology/web companies such as LinkedIn, Twitter, and Facebook (none of which even existed in 2001).

…knowledge among computer scientists about how to think of and approach the analysis of data is limited, just as the knowledge of computing environments by statisticians is limited. A merger of the knowledge bases would produce a powerful force for innovation.

I think this statement still applies today. The world is just starting to realize the benefits of merging knowledge from computer science and statistics. There is much more work to do. Fortunately, businesses and universities are working to address the merger.

Have you seen the paper before? What are your thoughts on it?

Data Science is on Wikipedia

Wikipedia has a page for Data Science.

Pizza Delivery: A Video Infographic

This is a video infographic about pizza delivery in Manhattan. It is another good way to make data tell a story.