Help set the standards for a Data Scientist

The field of data science is moving fast. Many people claim to be data scientists, yet their knowledge, experience, and backgrounds can be very different. Different is not bad. However, there are few standards around what exactly a data scientist is.

Sticking with this week’s theme of “What is a Data Scientist”, an organization called the Initiative for Analytics and Data Science Standards (IADSS) has kicked off a global research study. The study aims to gain insight into the analytics profession in industry and to support the development of standards for analytics role definitions, required skills, and career advancement paths. This will help set some industry standards, which in turn could support the healthy growth of the analytics market.

If you want to be a part of this initiative and help collectively define industry standards, I encourage you to take part in the research. The survey takes approximately 5 minutes, and answers will be kept anonymous. More details are provided on the introduction pages of the Data Science Industry Standards Research Survey.

Currently, over 12 million users on LinkedIn claim to have data science and analytics capabilities. The field could use some standards around different roles and necessary skills.

4 Steps to Finding Your Data

So, you have identified a fascinating new problem to solve with data. You correctly started with a problem and not the data. It seems both beneficial and interesting. Now where do you get the data? Here are 4 steps (in order) for how to find data.

1. Existing Data

The best place to start is the data you currently have. What data does your organization currently collect? How can you get access to that? Start there.

2. Open Data

Then look for industry-specific open data (data that is freely available). Many industries publish data monthly or yearly. Data is also frequently available from government-funded research. If industry-specific data is not available, what other related data is openly available? It is often beneficial to augment your existing data with open data. Here are some lists of open data: Open Data, Part 1 and Open Data, Part 2. There are also many others available.
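To make "augment your existing data with open data" a bit more concrete, here is a minimal sketch in Python. It assumes both datasets share a common key (a ZIP code here). The function name `augment`, the field names, and the sample values are all made up for illustration; they are not from any real dataset.

```python
def augment(internal_rows, open_rows, key):
    """Attach open-data fields to each internal row that shares the same key."""
    # Index the open data by the shared key for quick lookup.
    lookup = {row[key]: row for row in open_rows}
    merged = []
    for row in internal_rows:
        extra = lookup.get(row[key], {})
        # Internal values win if both sources define the same field.
        merged.append({**extra, **row})
    return merged


# Made-up sample data: internal order counts and an open demographic table.
sales = [{"zip": "68102", "orders": 120}, {"zip": "99999", "orders": 4}]
census = [{"zip": "68102", "median_income": 52000}]

augmented = augment(sales, census, "zip")
```

Rows without a match in the open data are kept unchanged, so nothing from your own data is lost. In practice, a library such as pandas would likely handle this kind of join for larger datasets.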

3. API

Next, explore the opportunity of using an API to access data. Many applications have existing API access. An API (Application Programming Interface) allows a person to write some computer code to pull machine-readable data from an existing system. Some are freely available; others have associated costs. Many make the data available in near real-time. There are numerous APIs from which you can pull data, so check with some of your existing applications. APIs are available for weather, stocks, news, social media, web analytics, and many more.
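As a rough idea of what "some computer code" looks like, here is a minimal sketch using only Python's standard library. The endpoint URL, the query parameters, and the shape of the JSON response are all hypothetical; a real API will document its own URL, parameters, authentication, and response format.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint, for illustration only; substitute a real API's URL.
BASE_URL = "https://api.example.com/v1/observations"


def build_url(base_url, params):
    """Attach query parameters to the endpoint URL."""
    return f"{base_url}?{urlencode(params)}"


def parse_records(payload):
    """Turn a JSON response body into a list of (date, value) tuples.

    Assumes a made-up response shape:
    {"results": [{"date": ..., "value": ...}, ...]}
    """
    data = json.loads(payload)
    return [(row["date"], row["value"]) for row in data.get("results", [])]


def fetch_records(params, timeout=10):
    """Call the API over HTTP and return the parsed records."""
    with urlopen(build_url(BASE_URL, params), timeout=timeout) as response:
        return parse_records(response.read().decode("utf-8"))
```

Calling something like `fetch_records({"city": "Omaha", "days": 2})` would only work against a real service that returns data in the assumed shape. Many APIs also require an API key, which is typically passed as a parameter or a request header.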

4. Create The Data

The last resort is to begin the creation of data. An obvious choice is to create a survey. Be careful because surveys can be trickier than initially thought. You often do not get good representation and the result is biased data. Another way to collect data is to change your application to begin collecting the desired information. You may even have to build a new application. Sometimes an entire process needs to be created or modified to include methods to collect the data. This last step usually takes the longest and costs the most money.