Big Data Education

I recently read "Big Data Education: 3 Steps Universities Must Take."

Here are the 3 steps listed:

  1. Data Science cannot be an undergraduate degree
  2. A graduate degree should contain math, stats and computer science
  3. Research

Step 2 seems obvious. Math, stats, and computer science are some of the key areas for data science. I would add communication and presentation skills to the list, because people with only math, stats, and CS skills are not known for being naturally good communicators. I agree with step 3. More research needs to be done, but most of it will need to be interdisciplinary. Universities need to put more effort into interdisciplinary research.

Step 1 confused me a bit. The argument was that data science requires too many skills, plus an applied focus area, to fit into an undergraduate degree. Of course a person cannot learn everything about data science in an undergraduate degree. Earning a computer science degree does not mean you know everything about computer science. It just means you know the fundamentals of algorithms, architecture, and operating systems. You know enough about computer science to understand the field and to keep learning as you go. I think 4 years should be enough time to do the same for data science.

What are your thoughts?

Data Scientist Google Trends

The search term "data scientist" has exploded in popularity in the last 18 months. Most of the interest appears to be coming from the U.S., India, and the U.K. Anyhow, I just thought the page was fun to look at.

About the Human Face of Big Data

To start, here is a nice quote from the video, originally from Eric Schmidt of Google.

From the dawn of civilization until 2003, humankind generated 5 exabytes of data.
Now we produce 5 exabytes of data every two days.
…and the pace is accelerating
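The quote is easier to appreciate after a quick unit conversion. The snippet below is a back-of-the-envelope sketch that assumes decimal units (1 EB = 10^18 bytes, 1 TB = 10^12 bytes). It only restates the quoted rate per day and per second. It says nothing about how the original figure was estimated.

```python
# Back-of-the-envelope conversion of the quoted rate.
# Assumption: decimal SI units, so 1 exabyte = 10**18 bytes and 1 terabyte = 10**12 bytes.
EXABYTE = 10**18
TERABYTE = 10**12
SECONDS_PER_DAY = 24 * 60 * 60

# Quoted rate: 5 exabytes every two days.
bytes_per_period = 5 * EXABYTE
period_seconds = 2 * SECONDS_PER_DAY

exabytes_per_day = bytes_per_period / EXABYTE / 2
bytes_per_second = bytes_per_period / period_seconds
terabytes_per_second = bytes_per_second / TERABYTE

print(f"{exabytes_per_day} EB per day")             # 2.5 EB per day
print(f"{terabytes_per_second:.1f} TB per second")  # 28.9 TB per second
```

At that rate, the world produces roughly 29 terabytes of data every second.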

Rick Smolan gives a good talk. He is the person behind The Human Face of Big Data project. I don't have a copy of the book, but it looks really intriguing. The talk briefly explains what the book and project are all about.

Interactive Data Visualization for the Web – Free Online Textbook

Interactive Data Visualization for the Web (O'Reilly Media, OFPS) is an open textbook on using D3, a JavaScript library, to combine the following practices:

  1. Data Visualization
  2. Interactive Design
  3. Web Development

The book is in early release and all the sections are available. You are also welcome to comment on any part of the book to help make it better.

Cleaning out an old project

While working on Data Science 201, I was cleaning out some old projects. I found this one interesting, especially since the Christmas shopping season has started.

I named it 34 and More and put it up on Heroku. All it does is query the Google Shopping API and return the results. It works best on very specific queries. Here is an example result for “samsung galaxy player”.

The code is available on GitHub under the MIT License, so you are free to do whatever you want with it.

Sorry, this post has little, if anything, to do with data science.

3 Key Missions For Data Science

Tim Estes, CEO of Digital Reasoning, gives a challenging talk at Strata + Hadoop World 2012. Tim challenges data scientists to stop focusing on displaying advertisements and start focusing on the following 3 missions.

  1. Security and Freedom of the World
  2. Financial Risk
  3. Health

Take 15 minutes and watch the video. Then be inspired to go change the world.

Hadoop World/Strata NYC 2012 Videos

The videos from the Hadoop World/Strata 2012 conference in New York City are posted on YouTube. Enjoy watching them. I may post some of my favorites in the coming days.

Data Science: It's Not All About Business

Recently, a great comment was posted on the list of Colleges With Data Science Degrees. Currently, many data-science-related college programs are being built in business departments. While it is great that colleges are starting to build data science programs, data science is much bigger than business. The comment was a nice reminder that data science is used in many fields.

I thought the comment by Bernice Rogowitz said it very well. Here is a copy of the comment:

It’s not all about business!
Some of the posts align data science and analytics with business applications. It’s important to keep in mind that scientists and researchers of all stripes are using statistical approaches, optimization, clustering, outlier detection and data cleansing methods, etc., not just analysts in the business world. In fact, some of the most sophisticated models come from outside of business. And, don’t forget the importance of analytics for finding features in non-structured data, such as images, text, 3-D models and simulations, etc. The large scope for data science and analytics was recently explored in a workshop: http://www.radcliffe.harvard.edu/exploratory-seminars/new-multidisciplinary-approach-data-understanding

Four data themes to watch from Strata + Hadoop World 2012

via Four data themes to watch from Strata + Hadoop World 2012 – Strata.

  1. In-memory data storage
  2. SQL and SQL-like tools matter
  3. The 80% rule for data preparation
  4. Asking the right question

Data Science Down The Toilet

Quick Backstory

A few weeks back, one of my children flushed an unknown item (a toy or something) down one of the toilets. It caused some problems, and the toilet stopped working properly. So I did what any reasonable person in 2012 would do: I googled for plumbers in my area and called one of them. I happened to call Mr. Rooter Plumbing Service. They came the next day and fixed the toilet.

Great, Ryan. What does this have to do with data science?

Well, the following week, I started seeing this ad.
Mr. Rooter advertisement
I no longer needed a plumber! The process that determines which ads I see is definitely a data product. The problem is that the solution is too slow. By the time the ad company figured out what I needed, I no longer needed it.

This reminded me why data science needs to focus strongly on real-time or near-real-time solutions. Our world moves quickly, and data products need to respond accordingly. Are you aware of any good work being done on real-time data science? I know it is out there.
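One way to see why the delay hurts is a toy model in which the shopper's need fades after the triggering search. The sketch below assumes exponential decay with a hypothetical 12-hour half-life. Neither the model nor the number comes from any real ad system. They only illustrate how quickly a slow pipeline loses relevance.

```python
# Toy model (hypothetical): a shopper's intent to buy (e.g. "needs a plumber")
# decays exponentially after the triggering search.
HALF_LIFE_HOURS = 12.0  # illustrative assumption, not a measured value


def intent_remaining(hours_since_search: float, half_life: float = HALF_LIFE_HOURS) -> float:
    """Fraction of the original intent left after a delay (1.0 means full intent)."""
    if hours_since_search < 0:
        raise ValueError("delay must be non-negative")
    # Each half-life that passes cuts the remaining intent in half.
    return 0.5 ** (hours_since_search / half_life)


# Compare an ad served within minutes against ones served a day or a week later.
for label, delay_hours in [("near real-time (5 min)", 5 / 60), ("next day", 24.0), ("one week", 24.0 * 7)]:
    print(f"{label:>24}: {intent_remaining(delay_hours):.6f}")
```

Under these assumptions, an ad served a day later carries a quarter of the original intent. One served a week later carries almost none, which matches the plumber experience.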

In the end, I would recommend Mr. Rooter, but I would wait for the coupon first.