What does a data scientist do?

This is one of the better descriptions I have seen of what a data scientist does:

They must find interesting, novel, and useful insights about the real world in the data. And they must turn those insights into products and services, and deliver those products and services at a profit.

Notice that data scientists don’t just need to find insights in data. They also need to create profitable products from those insights. I often feel that data products are not seen as being as important as improving the machine learning algorithms, but the data products really are the end goal.

The quote comes from the Harvard Business Review article “To Work with Data, You Need a Lab and a Factory.”

3 thoughts on “What does a data scientist do?”

  1. Hi – Thx for sharing. Interesting description, but I would comment that the “at a profit” part doesn’t need to be true. I can fully imagine data scientists developing products or services for, say, a non-profit organization or as a result of a hackathon.

    Rather, I would argue that when a product or service is delivered, it needs to be delivered in an end-user-friendly way. Granted, depending on who the end user is, the ‘level of friendliness’ can differ (e.g., a developer may be more than happy with an API, while a less technical end user might look for an easy, graphical user interface).

    So, maybe the last part of the last sentence should be something like “… deliver those products and services in a usable way”. Not sure whether the word ‘usable’ will cut it, but I hope you get the idea 🙂

    Best Regards,

    1. That is a very good point. I do like the term “usable”, but I would not limit it to an end user. I could see the output of a data product being the input for some other product, in which case no end user might even know about it. Anyhow, I totally agree that “at a profit” is probably not correct.

      Thanks for the enlightening comment.

      1. Hi Ryan,

        Your comment about the ‘end user’ is spot on! As a matter of fact, I believe that most of the (big) data that is used now, or will be used in the future, will actually be consumed by a system or machine rather than by a human end user. This is especially true given the world we are living in, or soon will be living in, where meshed ecosystems communicate with each other by means of API sets and data feeds. Think about the Internet of Things, for instance.

        In that regard, the word ‘usable’ becomes even more important: usable from an end user perspective (at the end of the day we want to DO something useful with the data, otherwise who cares?) and usable as output/input vectors between, say, machines/systems.

        You might even argue that the term ‘end user’ can have multiple meanings, depending on the context. An end user can be a human being (and that’s what most people consider an end user, I think), but it can also be a system or machine. If Machine A provides a Data Feed FA to Machine B, then Machine B is an end user of FA. Machine B can then process FA and either provide another Data Feed FB to another machine (which becomes the end user of FB) or display the results in, say, a graph for a human end user to read.

        In all these cases, the data has to be usable by every end user involved, according to that particular end user’s perspective and needs (i.e., for human end users: readable and understandable graphs, reports, …; for machines: predefined, agreed-upon data formats and API sets).

        Anyway, I am digressing 🙂
