Tag Archives: standards

Help set the standards for a Data Scientist

The field of data science is moving fast. Many people claim to be data scientists, yet their knowledge, experience, and backgrounds can be very different. Different is not bad. However, there are few standards defining what exactly a data scientist is.

Sticking with this week’s theme of “What is a Data Scientist”, an organization called the Initiative for Analytics and Data Science Standards (IADSS) has kicked off a global research study. The study aims to gain insight into the analytics profession in industry and to support the development of standards for analytics role definitions, required skills, and career advancement paths. Those standards, in turn, could support the healthy growth of the analytics market.

If you want to be a part of this initiative and help collectively define industry standards, I encourage you to take part in the research. The survey takes approximately 5 minutes, and responses will be kept anonymous. More details are provided on the introduction pages of the survey: Data Science Industry Standards Research Survey.

Currently, over 12 million users on LinkedIn claim to have data science and analytics capabilities. The field could use some standards around different roles and necessary skills.

NIST defines Big Data and Data Science

The National Institute of Standards and Technology (NIST) is attempting to create standards for Big Data. It just released the NIST Big Data Interoperability Framework, a large set of documents aimed at creating standards for everything in big data, from definitions to architectures.

Big Data Definitions

In case you are wondering, and I know you are, here are the two key definitions. The framework includes many more.

Big Data consists of extensive datasets – primarily in the characteristics of volume, variety, velocity, and/or variability – that require a scalable architecture for efficient storage, manipulation, and analysis.

Data science is the empirical synthesis of actionable knowledge from raw data through the complete data lifecycle process.

Don’t like the definitions? Great, NIST would love to hear your opinions/comments. Comments are being collected until May 21, 2015.

The NIST Big Data Interoperability Framework is a massive work consisting of seven volumes. All are open for comments.

  1. Definitions
  2. Taxonomies
  3. Use Case & Requirements
  4. Security and Privacy
  5. Architectures White Paper Survey
  6. Reference Architecture
  7. Standards Roadmap

The process for submitting a comment appears rather old-school (hint: NIST, GitHub might be a good place to collect comments/edits), but it is not difficult.