The Data Science Industry: Who Does What

The fine folks at DataCamp, a great site for learning data science right in your browser, have come up with another great infographic. This time it compares some of the many job titles in the data science field.

The infographic lays out the roles and skills needed for the following job titles. Note: not every one of these roles is likely to be confused with a data scientist, but each of them can be important when completing an entire data science project.

  • Data Scientist
  • Data Analyst
  • Data Architect
  • Data Engineer
  • Statistician
  • Database Administrator
  • Business Analyst
  • Data & Analytics Manager

Want a Quick Jupyter Notebook?

If you have been hearing about Jupyter (formerly IPython) and have not tried it out, here are a couple of quick, free, and easy options for giving it a try. No installation is needed, and there is no account setup. Just visit a link.

Easy Jupyter Notebook

Try Jupyter and tmpnb are two projects for instantly getting a Jupyter notebook with just a simple URL. Tmpnb was created by Rackspace for Nature, and Try Jupyter is a demo from the main Jupyter website. I believe both projects use the same open source code found on GitHub. They might even be two URLs to the same infrastructure.

The major limitation is that you cannot come back to your notebook later (which is not a problem if you host the Jupyter notebook on your own). The notebooks die after some time of inactivity, but you can always create a new one. For more on the design decisions, see the Rackspace blog post, How did we serve more than 20,000 IPython notebooks for Nature readers? or join the open source project on GitHub.

If you have been wishing to try out Jupyter, it cannot get much easier than these options.


Dat – Version Controlled Data

Dat is an open source project focusing on data storage. In particular, the project wants to version control data. What is version control? In short, it tracks the history of changes to something (typically source code files or documents). Dat takes the idea a bit further: the data is versioned at the row level, not the file level. Plus, it is built for collaboration among teams.
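To make "versioned at the row level" a bit more concrete, here is a minimal Python sketch of the general idea. It is not Dat's API or its storage format; the RowVersionedTable class and its put and get methods are hypothetical names made up for this illustration. The point is only that each row carries its own history, so changing one row does not create a new version of the whole file.

```python
from dataclasses import dataclass, field


@dataclass
class RowVersionedTable:
    """Toy table (hypothetical, not Dat) with a separate history per row key."""

    history: dict = field(default_factory=dict)  # row key -> list of row versions

    def put(self, key, row):
        """Store a new version of one row and return its version number."""
        versions = self.history.setdefault(key, [])
        versions.append(dict(row))  # copy so later edits do not alter history
        return len(versions)

    def get(self, key, version=None):
        """Return the latest version of a row, or a specific earlier one."""
        versions = self.history[key]
        return versions[-1] if version is None else versions[version - 1]


table = RowVersionedTable()
table.put("alice", {"city": "Boston"})
table.put("bob", {"city": "Denver"})
table.put("alice", {"city": "Chicago"})  # only alice's row gains a version

latest_alice = table.get("alice")
first_alice = table.get("alice", version=1)
bob_versions = len(table.history["bob"])
```

With file-level versioning, the third put would have produced a new version of the entire dataset. Here only the "alice" row has two versions, while the "bob" row still has one.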

Use the online tutorial to learn more.

Dat is currently in beta. This is going to be a very interesting project to watch. I can see many great use cases.

Analytics vs Data Science

The lines between analytics and data science can be blurry. Different companies might call the same position by two different names, but at their core, the two fields do have some differences.
Below is an infographic from the faculty of the Online MS in Analytics at American University. I think the infographic is accurate.

In my opinion, a true data scientist should spend more time creating and programming new algorithms, while a business analyst should spend more time applying existing algorithms.

A couple of notes
  1. Years of education are not much different, but the academic disciplines are very different. Data scientists tend to have degrees with more rigorous mathematical training. For me, this is the biggest differentiator.
  2. It appears financial institutions prefer business analysts, while the government and colleges prefer data scientists.
  3. Surprisingly, business analyst jobs are projected to grow faster than data scientist jobs (27% to 15%). I am not sure I totally agree with that!

Know Of Other Differences?

Please leave a comment.

Brought to you by American University’s Analytics@American, a master's in business analytics

Understanding Machine Learning: From Theory to Algorithms (Free Book Download)

Understanding Machine Learning: From Theory to Algorithms is written by Shai Shalev-Shwartz, Associate Professor at the School of Computer Science and Engineering at The Hebrew University, Israel, and Shai Ben-David, Professor in the School of Computer Science at the University of Waterloo, Canada. The book looks very thorough. Below is just a sampling of the topics covered.

  • Bias-Complexity Tradeoff
  • Model Selection
  • Support Vector Machines
  • Decision Trees
  • Neural Networks
  • Clustering
  • Dimensionality Reduction
  • Feature Selection and Generation
  • Advanced Theory
  • And LOTS LOTS more….

Happy Learning!

Software Engineering Podcasts for Data Science

If you are a former software engineer looking to gain some data science skills, here is a list of podcasts that will most likely interest you.

Software Engineering Daily

A nice podcast that just ran a series of episodes about data science.

Software Engineering Radio

A great software engineering podcast. Here are a couple of topics related to data science.

Enjoy some listening while you are on the train, plane, or bus, or in the car.

Learning To Be A Data Scientist