Tag Archives: bigdata

Data Scientist vs Data Engineer

As the field of data science continues to grow and mature, it is nice to begin seeing some distinction in the roles of a data scientist. A new job title gaining popularity is the data engineer. In this post, I lay out some of the distinctions between the 2 roles.

Data Scientist vs Data Engineer Venn Diagram


Data Scientist

A data scientist is responsible for pulling insights from data. It is the data scientist's job to pull data, create models, create data products, and tell a story. A data scientist should typically interact with customers and/or executives. A data scientist should love scrubbing a dataset for more and more understanding.

The main goal of a data scientist is to produce data products and tell the stories of the data. A data scientist would typically have stronger statistics and presentation skills than a data engineer.

Data Engineer

Data engineering is more focused on the systems that store and retrieve data. A data engineer is responsible for building and deploying storage systems that can adequately handle an organization's data needs. Sometimes the need is to ingest fast, real-time data streams. Other times it is to store massive amounts of large video files. Still other times it is to serve a very high volume of reads.
In other words, a data engineer needs to build systems that can handle the 3 Vs of big data (volume, velocity, and variety).

The main goal of a data engineer is to make sure the data is properly stored and available to the data scientist and others who need access. A data engineer would typically have stronger software engineering and programming skills than a data scientist.


It is too early to tell if these 2 roles will ever have a clear distinction of responsibilities, but it is nice to see a little separation of responsibilities for the mythical all-in-one data scientist. Both of these roles are important to a properly functioning data science team.

Do you see other distinctions between the roles?

International School of Engineering Programs Beginning Soon

I recently received the following information.

International School of Engineering is announcing their 3rd batch of live e-Learning certificate programs starting 4-Sep-2013 in “Engineering Big Data with R and Hadoop Ecosystem” and “Essentials of Applied Predictive Analytics” (http://goo.gl/kHckP).

These programs helped Engineers and Managers transform into Hadoop Developers/Data Scientists, get industry certifications, revolutionize their workspace and establish exciting careers.


• Taught by experts who are Carnegie Mellon, Johns Hopkins and Stanford University alumni with Fortune 50 experience
• Applied and interactive classes
• Classes ranked among the top 1% and 5% of all classes in the world in Piazza
• 1/3rd the cost of other similar programs
• 95% Success with Cloudera and EMC2

For details visit http://goo.gl/bPJEF

For any queries mail us at elearning@insofe.edu.in or call us at +91 9502334561/2/3

10 Big Data Best Practices

10 Big Data Implementation Best Practices

This is a great article and list of topics to remember when working on big data projects. Here is the list.

  1. Gather business requirements before gathering data
  2. Implementing big data is a business decision not IT
  3. Use Agile and Iterative Approach to Implementation
  4. Evaluate data requirements
  5. Ease skills shortage with standards and governance
  6. Optimize knowledge transfer with a center of excellence
  7. Embrace and plan your sandbox for prototype and performance
  8. Align with the cloud operating model
  9. Associate big data with enterprise data
  10. Embed analytics and decision-making using intelligence into operational workflow/routine

See the original article, 10 Big Data Implementation Best Practices, for details.

Big Data Journal: 5 articles to highlight

The inaugural issue of Big Data was published a few weeks ago. The journal is excellent. The articles are relevant, readable, and free. In the first issue, most of the articles were not highly technical (meaning there were not a lot of equations or algorithms). I would like to highlight just 5 of the articles (feel free to read the others as well).

  1. Making Sense of Big Data – A nice brief discussion of the term big data and some goals for the journal.
  2. Big Data For Development – This is an introduction to United Nations Global Pulse, an initiative to use data to better understand human well-being.
  3. Broad Data: Exploring the Emerging Web of Data – This article is all about dealing with the explosion of open data becoming available.
  4. Data Science and Its Relationship to Big Data and Data-Driven Decision Making – The title is pretty self-explanatory. The article points out 7 fundamental concepts of data science.
  5. Educating the Next Generation of Data Scientists – This is a roundtable discussion all about data science and data science education.