Tag Archives: R

Learn to Analyze Big Data with R – Free Course

R is a hugely popular language among data scientists and statisticians. One of the difficulties with open-source R is its memory constraint: all the data must be loaded into memory, typically as a data.frame. Microsoft addresses this problem with the RevoScaleR package of Microsoft R Server. Just launched this week is an edX course, Analyzing Big Data with Microsoft R Server.

According to the syllabus:

Upon completion, you will know how to use R for big-data problems.

Full Disclosure: I work at Microsoft, and the course instructor, Seth Mottaghinejad, is one of my colleagues.


Free Stats book for Computer Scientists

Professor Norm Matloff from the University of California, Davis has published From Algorithms to Z-Scores: Probabilistic and Statistical Modeling in Computer Science, an open textbook that approaches statistics from a computer science perspective. Dr. Matloff has been a professor of both statistics and computer science, so he is well suited to write such a textbook. It would be a good choice for a statistics course aimed primarily at computer scientists. The book uses the R programming language and builds the foundations of probability before moving on to statistics.

Introduction to Microsoft R Open (Webinar)

Tomorrow, January 28, 2016, David Smith will present a webinar titled Introduction to Microsoft R Open. David is the R Community Lead at Microsoft. The webinar will discuss:

  • Introduction to R
  • History of R
  • Enhancements of Microsoft R Open (Microsoft’s enhanced distribution of open-source R)
  • CRAN Time Machine
  • Reproducible Data Analysis

If you are looking to get started with R or get more from R, this webinar will be worth your time.

Plus, the webinar is the first in a series of Microsoft webinars focused on R.


Full Disclosure: I work for Microsoft, and I will be helping (in a very minimal capacity) with the webinar.

Data Science Wars: R vs. Python

The great team over at DataCamp, an online site for learning R, has put together another wonderful infographic. This time, the topic is Data Science Wars (R versus Python). This has been a rather hot topic for quite some time. I even wrote about the debate back in 2013 in R vs Python, The Great Debate.

DataCamp did an amazing job. Honestly, it is impressive that they were able to pack so much information into a single infographic. Some of the topics covered are:

  • History
  • Who uses the language?
  • Community
  • Purpose of the language
  • Popularity
  • And way more great stuff

Enough about the description. Have a look for yourself. It is packed with great arguments for your next “R vs Python” debate.

R vs Python for data analysis

Learn Data Science Online with DataCamp

If you are looking to get started in the field of data science in 2014, then DataCamp just might be the site for you. DataCamp, formerly DataMind, provides a tutorial for interactive data analysis in the browser. The data analysis is taught using R.

The DataCamp platform provides:

  1. Courses to learn data science
  2. A platform to create new courses

If you are familiar with Codecademy, DataCamp follows much the same model, but for data analysis instead of programming. This is definitely a site to watch in 2014.

The Interactive Data Analysis Tutorial

Interactive Data Analysis Tutorial

DataCamp Profile

DataCamp profile

R vs Python, The Great Debate

Recently I have seen blogs/articles claiming Python is the best choice for data science and R is the new language for business. Honestly, both articles are truthful and good. Both Python and R are good. Why do we have to choose? Let’s use both.

Here is my opinion. I prefer R to Python when performing exploratory data analysis. R has so many packages for every possible statistical technique. The plots, although not beautiful by default, are quick and easy to create. However, I prefer Python when I need to pull data from an API or build a software system or website. Python is more than just a statistical analysis tool; it is a complete programming language. I might even end up using Java for a project in the near future.

There does not have to be a clear winner or one single language to use. Use the best tool for the job and get on with your data science. In the end, the world cares more about what you produced than whether you used R, Python, or something else.