The profile of a data scientist is changing slightly as the profession becomes more established. Data Science 365 conducted a study to determine some of the characteristics of a “typical data scientist.” The infographic below covers a wealth of information, from programming languages used to educational backgrounds to locations. It is definitely worth a look to understand the attributes of a data scientist in 2019.
Google has recently released a Jupyter Notebook platform called Google Colaboratory. You can run Python code in a browser, share results, and save your code for later. It currently does not support R code.
The great team over at DataCamp, an online site for learning R, has put together another wonderful infographic. This time, the topic is Data Science Wars (R versus Python). This has been a rather hot topic for quite some time. I even wrote about the debate back in 2013, R vs Python, The Great Debate.
DataCamp did an amazing job. Honestly, it is impressive how much information they were able to pack into a single infographic. Some of the topics covered are:
- Who uses the language?
- Purpose of the language
- And way more great stuff
Enough about the description. Have a look for yourself. It is packed with great arguments for your next “R vs Python” debate.
DataQuest is a recently launched online data science learning platform for Python. The site consists of a gamified series of missions that increase in difficulty as your skills progress. Here are a few other features of the site:
- Sample Code
- Live, Interactive Browser-based Coding Environment
- Step by Step Instructions
- Instant Feedback
- Helpful Forums for Q&A
The site is still under development and the founder, Vik Paruchuri, is looking for help developing more content and missions for the site. If that is something of interest to you, get in touch with Vik via the DataQuest website.
Have you ever wished you could create your own dataset from the vast data available from Twitter?
Well, wish no longer. I used the Sense Platform to Create a Dataset of Users from the Twitter API. Feel free to use this example to create your own datasets. Another great thing about Sense is that you can not only collect your data with it, but also use R or Python to do your data analysis. The analysis will have to wait for another day.
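If you want a feel for the data-shaping step, here is a minimal Python sketch. It is not the Sense example itself and it does not call the Twitter API. It assumes you already have user records as dictionaries, and it uses made-up sample values. The field names (`id`, `screen_name`, `followers_count`, `location`) are assumed to match the Twitter user object, and `users_to_rows` and `write_csv` are hypothetical helper names chosen for illustration.

```python
import csv
import io

# Hypothetical sample records. The field names are assumed to follow the
# Twitter user object; the values are made up for illustration only.
SAMPLE_USERS = [
    {"id": 1, "screen_name": "example_user_a", "followers_count": 120,
     "location": "Somewhere", "extra_field": "ignored"},
    {"id": 2, "screen_name": "example_user_b", "followers_count": 45},
]

# Columns to keep in the dataset (an assumption; pick whatever you need).
FIELDS = ["id", "screen_name", "followers_count", "location"]


def users_to_rows(users, fields=FIELDS):
    """Keep only the chosen fields; missing values become empty strings."""
    return [{field: user.get(field, "") for field in fields} for user in users]


def write_csv(rows, handle, fields=FIELDS):
    """Write the rows to an open text handle as CSV with a header line."""
    writer = csv.DictWriter(handle, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Write to an in-memory buffer so the sketch leaves no files behind.
    buffer = io.StringIO()
    write_csv(users_to_rows(SAMPLE_USERS), buffer)
    print(buffer.getvalue())
```

Writing to a CSV keeps the dataset easy to load later from either R or Python.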
Last week, I blogged about Sense: Data Science Platform of the Future. It is an excellent tool for running your data science analysis.
Recently, I have seen blog posts and articles claiming that Python is the best choice for data science and that R is the new language for business. Honestly, both articles are truthful and good. Both Python and R are good. Why do we have to choose? Let’s use both.
Here is my opinion. I prefer R to Python when performing exploratory data analysis. R has so many packages for every possible statistical technique. The plots, although not beautiful by default, are quick and easy to create. However, I prefer Python when I need to pull data from an API or build a software system or website. Python is more than just a statistical analysis tool; it is a complete programming language. I might even end up using Java for a project in the near future.
There does not have to be a clear winner or one single language to use. Use the best tool for the job and get on with your data science. In the end, the world cares more about what you produced than whether you used R, Python, or something else.
The videos from the SciPy 2013 conference are all available online.
See all the videos at the Lanyrd SciPy 2013 Conference Directory.
Here are a few of my other favorite videos: