Hans Rosling does an excellent job of showing how “not boring” statistics can be. This is a great informative statistics video. It was originally posted at The Joy of Stats.

# Tag Archives: statistics

# 2013 Year of Statistics

Yes, 2013 is the International Year of Statistics. Thus a video was made.

# Data Science: The Paper that Started it All

Although Tobias Mayer may be known as the first data scientist, he did not coin the term *data science*. According to Wikipedia, the first use of the term *data science* was in 2001.

**Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics** was published in the April 2001 edition of the International Statistical Review. The author was William S. Cleveland, currently a Professor of Statistics at Purdue University.

The paper proposes a new field of study named *data science*. It then goes on to list and explain 6 technical focus areas for a university data science department.

- Multidisciplinary Investigations
- Models and Methods for Data
- Computing with Data
- Pedagogy
- Tool Evaluation
- Theory

For the most part, the paper is still relevant. I did find a couple of good quotes from the paper that deserve comment.

“The primary agents for change should be university departments themselves.”

That did not happen. The driving agents for change in the data science field have been some of the newer technology/web companies such as LinkedIn, Twitter, and Facebook (none of which even existed in 2001).

“…knowledge among computer scientists about how to think of and approach the analysis of data is limited, just as the knowledge of computing environments by statisticians is limited. A merger of the knowledge bases would produce a powerful force for innovation.”

I think this statement still applies today. The world is just starting to realize the benefits of merging knowledge from computer science and statistics. There is much more work to do. Fortunately, businesses and universities are working to address the merger.

Have you seen the paper before? What are your thoughts on it?

# Elements of Statistical Learning Textbook (Free)

The Elements of Statistical Learning textbook is available for free. It is a classic, widely used textbook for statistics and machine learning. Here is a far-from-complete list of some of the topics:

- Supervised Learning
  - Linear/Logistic Regression
  - Regularization
  - Model Selection
  - Trees
  - Neural Networks
  - Support Vector Machines
  - Random Forests
- Unsupervised Learning
  - Clustering

As you can see, the book is quite extensive.

Note: This book has been available for quite a while, but I realized I had not added a link to it on my blog.

# Enterprise Software Doesn't Have to Suck: Software engineer's guide to getting started with data science

If you know how to code and want to learn data science, this post has some enlightening material.

# Data Science Links from Recent Days

- Computer scientists discover statistics and find it useful – Ever wonder why computer scientists are getting all the attention for data science? Well, computer scientists stole ideas from statistics. Read the article and it will make more sense.
- Top 3 Myths About Data Science – Here is a highlight of the myths:
  - Data science is a field for mathematical geeks.
  - Learning a tool is the equivalent of learning data science.
  - Data scientists will be replaced by artificial intelligence soon.
- The Big Data Fallacy And Why We Need To Collect Even Bigger Data – More data is not always better, because more data does not necessarily mean more information. Read this for a good description of data vs. information vs. insights.
- MLbase – A distributed machine learning system. Here is an academic paper about the system.
- Predictive Analytics and Machine Learning: An Overview (PDF) – This is a very nice slide deck from IBM.

# Learn R for Free at Code School

Code School is offering a course titled **Try R**. The course is completely free and can be completed online with the interactive tutorial. You will learn by doing. If you have been looking to learn R or need a quick refresher, this is probably a very good option.

# Why Data Science Is Hard To Teach At School

Cathy O’Neil provides some great details about why it is difficult to teach data science in college. She also mentions that some of the best people to lead data science programs are probably not publishing papers; they are working in industry.

# Free Bayesian and Machine Learning Textbook

David Barber, Computer Science Professor at University College London, is still offering his textbook, Bayesian Reasoning and Machine Learning, for free. This text looks quite extensive. The website also includes MATLAB code for many of the algorithms in the book.

# Free Bayesian Statistics Textbook

Think Bayes by Allen B. Downey is another free book available from Green Tea Press. Allen B. Downey is a computer science professor at Olin College. The book is currently available in PDF or HTML. The book is not yet complete, so it may contain some errors.