Tag Archives: python

Data Science Wars: R vs. Python

The great team over at DataCamp, an online site for learning R, has put together another wonderful infographic. This time, the topic is Data Science Wars (R versus Python). This has been a hot topic for quite some time. I even wrote about the debate back in 2013 in R vs Python, The Great Debate.

DataCamp did an amazing job; it is impressive how much information they packed into a single infographic. Some of the topics covered are:

  • History
  • Who uses the language?
  • Community
  • Purpose of the language
  • Popularity
  • And way more great stuff

Enough about the description. Have a look for yourself. It is packed with great arguments for your next “R vs Python” debate.

R vs Python for data analysis

DataQuest – Free Browser-based Learning for Data Science

DataQuest is a recently launched online data science learning platform for Python. The site consists of a gamified series of missions that increase in difficulty as your skills progress. Here are a few other features of the site:

  • Sample Code
  • Live, Interactive Browser-based Coding Environment
  • Step by Step Instructions
  • Instant Feedback
  • Helpful Forums for Q&A

The site is still under development and the founder, Vik Paruchuri, is looking for help developing more content and missions for the site. If that is something of interest to you, get in touch with Vik via the DataQuest website.

Using Python to Collect data from the Twitter API


Have you ever wished you could create your own dataset from the vast data available from Twitter?

Well, wish no longer. I used the Sense Platform to Create a Dataset of Users from the Twitter API. Feel free to use this example to create your own datasets. Another great thing about Sense: you can not only collect your data with it, but also use R or Python to do your data analysis. The analysis will have to wait for another day.

Last week, I blogged about Sense: Data Science Platform of the Future. It is an excellent tool for running your data science analysis.

R vs Python, The Great Debate

Recently I have seen blogs/articles claiming Python is the best choice for data science and R is the new language for business. Honestly, both articles are truthful and good. Both Python and R are good. Why do we have to choose? Let’s use both.

Here is my opinion. I prefer R to Python when performing exploratory data analysis. R has so many packages for every possible statistical technique. The plots, although not beautiful by default, are quick and easy to create. However, I prefer Python when I need to pull data from an API or build a software system or website. Python is more than just a statistical analysis tool; it is a complete programming language. I might even end up using Java for a project in the near future.

There does not have to be a clear winner or one single language to use. Use the best tool for the job and get on with your data science. In the end, the world cares more about what you produced than whether you used R or Python or something else.

SciPy 2013 Videos

The videos from the SciPy 2013 conference are all available online.

See all the videos at the Lanyrd SciPy 2013 Conference Directory.

Here are a few of my other favorite videos:

Online Regular Expression Helpers

Do you ever have a regular expression you want to verify? If so, here are 3 quick sites to help you do that.

Regular Expression Editors Online

If you are unsure what regular expressions are, see Regular-Expressions.info for a tutorial.
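If you would rather check a pattern locally, the same kind of verification can be sketched with Python's built-in re module. This is only a minimal illustration: the date pattern, the sample strings, and the check_pattern helper are made up for the example and do not come from any of the sites above.

```python
import re

# Hypothetical pattern for illustration: a simple YYYY-MM-DD date.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

def check_pattern(pattern, samples):
    """Map each sample string to True if the whole string matches the pattern."""
    return {s: pattern.fullmatch(s) is not None for s in samples}

# Made-up sample inputs: one that should match and two that should not.
results = check_pattern(DATE_PATTERN, ["2013-07-01", "2013-7-1", "not a date"])

for text, matched in results.items():
    print(f"{text!r}: {'match' if matched else 'no match'}")
```

Trying a pattern against a handful of strings that should match and a handful that should not is essentially what the online helpers do, just with a friendlier interface.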

Probabilistic Programming and Bayesian Methods for Hackers Online Book

Probabilistic Programming and Bayesian Methods for Hackers is an open source online book. The book is developed with IPython, so it can be read in a variety of formats: web, PDF, or locally with IPython installed.

Also, contributions are welcome via the GitHub repository for the book (or you can email the authors).

This is the first IPython project I have really looked at, and IPython looks very promising.

Plot.ly, a New Online Graph Tool

Plot.ly is a new site that allows for web-based plotting of graphs. The site allows a user to upload data, create a number of plots, and even write Python code to generate custom graphs. The site also has numerous export options for the graphs as well as options for sharing them via social networks.

Below is an example graph via a sharable image link.

I have not had a lot of time to play around with the site, but it looks very impressive. I think there are a lot of possibilities for Plot.ly. First, I could see it used for data analysis in the cloud. Also, I could see it used for sharing plots between researchers or for publishing extra graphs to go along with publications.

Can you think of some other uses for Plot.ly?

A Couple Good Python Resources

In just the past month, a couple of great resources for learning Python have been created.

  1. Getting started with Python: Tips, Tools and Resources – If you are new to Python, this is a great place to start. It contains a brief description and links to books, tutorials, and MOOCs.
  2. Getting Started With Python for Data Scientists – This focuses more on tools specifically for data science.

Together, these links should provide all the resources necessary to begin doing some data science with the Python language.

Free Textbook and Toolkit: Natural Language Processing with Python

This is an online HTML version of the book Natural Language Processing with Python. The book is a companion for NLTK, which is a free, open source toolkit, written in Python, for Natural Language Processing (NLP).