NIST defines Big Data and Data Science

The National Institute of Standards and Technology (NIST) is attempting to create standards for Big Data. They just released the NIST Big Data interoperability framework, which is a huge set of documents aimed at creating standards around everything in big data from definitions to architectures.

Big Data Definitions

In case you are wondering, and I know you are, here are two of the definitions. The framework includes many more.

Big Data consists of extensive datasets – primarily in the characteristics of volume, variety, velocity, and/or variability – that require a scalable architecture for efficient storage, manipulation, and analysis.

Data science is the empirical synthesis of actionable knowledge from raw data through the complete data lifecycle process.

Don’t like the definitions? Great, NIST would love to hear your opinions/comments. Comments are being collected until May 21, 2015.

The NIST Big Data interoperability framework is a massive work consisting of 7 volumes. All are open for comments.

  1. Definitions
  2. Taxonomies
  3. Use Case & Requirements
  4. Security and Privacy
  5. Architectures White Paper Survey
  6. Reference Architecture
  7. Standards Roadmap

The process to submit a comment appears rather old-school (hint to NIST: GitHub might be a good place to collect comments and edits), but it is not difficult.

School without Water, Electricity or Toilets

Sound appealing? Probably not! Unfortunately, this is the sad reality for many children in Sub-Saharan Africa. Even worse, this reality applies only to those children lucky enough to attend school at all. In the world today, there are 58 million out-of-school children, and 43% of those children will never start attending school.

UIS LeftBehind

FFunction, a Montreal-based data visualization studio, and UNESCO Institute for Statistics (UIS) recently launched 2 interactive data visualizations. Both are creative and innovative ways to present information.

  • Out of School Children – Explore how gender, income, and location affect a child’s education
  • Left Behind – View how and why African girls struggle to obtain an education

For more on the topic, see my entire guest post on the DataKind blog, Data Visualization for Good – Education in Africa.

Sense.io Launches Data Science Platform to the Public

Sense has launched to the public today, March 18, 2015. Sense is an online data science platform that lets you perform your entire data science workflow in a browser. There is no need to provision new servers or install software; just click “New Project” and start your analysis. The Sense platform includes the following features:

  • Languages: R, Python, Julia, SQL
  • Simple collaboration for teams
  • Easily scale up or down with just the click of a mouse
  • Scheduled Tasks
  • Notifications for completion of long running tasks
  • An on-premise Enterprise version

I have been one of the early beta testers for Sense, and I have previously written about using the platform. I really like it. I find it easy, intuitive, and clean. Plus, I love being able to run all my analyses with just my Chromebook. So, go sign up, and please feel free to follow me at sense.io/ryanswanstrom, as I am sure to be adding some new analyses.

Below is a video with an expanded introduction to the Sense Platform.

Open Data is Important For Cities

Ben Wellington gives an excellent TED Talk on open data. He argues that cities need to make more of an effort to release data in a standardized and machine-readable format. This could help cities be safer and more fiscally responsible. He hopes New York City will set the standard for open data for cities. As a bonus, he is a wonderful storyteller.

Free Big Data Analytics Handbook

Brian Liou from Leada was kind enough to provide a guest post about their latest handbook, The Data Analytics Handbook: Big Data Edition.

Data Analytics Handbook
Have you ever wondered what the deal was behind all the hype of “big data”? Well, so did we. In 2014, data science hit peak popularity, and as graduates with degrees in statistics, business, and computer science from UC Berkeley, we found ourselves with a unique skill set that was in high demand. We recognized that as recent graduates, our foundational knowledge was purely theoretical and we lacked industry experience; we also realized that we were not alone in this predicament. And so, we sought out those who could supplement our knowledge, interviewing leaders, experts, and professionals – the giants in our industry. What began as a quest for the reality behind the buzzwords of “big data” and “data science,” The Data Analytics Handbook quickly turned into the first educational product of our startup, Leada (see www.teamleada.com). Thirty-plus interviews and four editions later, the handbook has been downloaded over 30,000 times by readers from all over the world. In these editions, you’ll discover whether “big data” is overblown, what skills your portfolio companies should look for when hiring a data scientist, how leading “big data” and analytics companies interview, and which industries will be most impacted by the disruptive power of data science. We hope you enjoy reading these interviews as much as we enjoyed creating them!
Download all 4 handbooks at www.teamleada.com/handbook

Open Data Day 2015

Today, February 21, 2015, is Open Data Day.

What is it?

Around the globe, cities are hosting hackathons centered around open data. The rules are fairly open-ended as long as the event is open and uses open data.

Who is it for?

  • Designers
  • Developers
  • Statisticians
  • Librarians
  • Citizens

If you want to get involved, check the list of cities hosting Open Data Day events.

If you are looking for some good datasets to use, try Data Sources for Cool Data Science Projects: Part 1 and Part 2.
