Python Number Crunching Workshop

If you are in the Washington, D.C. area, Data Science DC is hosting a workshop on number crunching in Python. The workshop is November 11 from 10 a.m. to 2 p.m. on the GWU campus.

Video: Big Data Summit – Training data scientists – 11 Jul 2012 – Computing News

This is a nice video with some good ideas for training data scientists. Note that it takes an academic perspective.

DataGotham Live Stream

DataGotham, the New York City data science conference, is live streaming its presentations today. Things are already underway, so you should be able to go to the site and start watching.

Peter Skomoroch Discusses Data Science

Peter Skomoroch, Principal Data Scientist at LinkedIn, gives a very nice interview. Peter answers some great questions about:

  • What is a data scientist?
  • How do you transition from a techy person to a data scientist?
  • How do you transition from a math person to a data scientist?
  • What skills does a data scientist need?

Sorry, but I could not embed the video. Here is a link to the video.

Here is a good overview of the first week of the Columbia Data Science course.

Columbia University Data Science Course

I found out just yesterday that Columbia University is offering a Data Science course. Dr. Rachel Schutt of the Department of Statistics is teaching the course, and she is also blogging some of the course material. Sorry, I could not find any video lectures. However, Cathy O’Neil is sitting in on the course and will be blogging some of the material as well. You can see more at Cathy’s popular blog, mathbabe.

Big Data in the (Heated or Cooled) Air Around You

This is a nice article about using machine learning and big data with the thermostat in your home. The company, Nest Labs, is doing some cool things.

Legos and Big Data

Recently, the family and I visited a LEGO store. We were given a pamphlet that contained some interesting numbers.


  • More than 400,000,000 (400 million) people will play with LEGO bricks this year
  • There are an average of 62 LEGO bricks per person on Earth
  • 5,000,000,000 (yeah, that's 5 billion) hours per year are spent playing with LEGO bricks
  • It would take 40,000,000,000 stacked LEGO bricks to reach the moon
  • 19,000,000,000 LEGO elements are made each year – that is 36,000 per minute
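
As a quick sanity check on the last number, the per-minute rate can be worked out from the yearly total. This is only a sketch: it assumes production runs around the clock for a 365-day year, which the pamphlet does not state, and the function name is made up for illustration.

```python
# Assumption: elements are produced continuously over a 365-day year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def elements_per_minute(elements_per_year):
    """Convert a yearly production count into an average per-minute rate."""
    return elements_per_year / MINUTES_PER_YEAR


rate = elements_per_minute(19_000_000_000)
print(f"{rate:,.0f} elements per minute")
```

Under that assumption the result lands a little above 36,000, so the pamphlet's figure looks like a rounded version of the same calculation.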

Now, I would not really call this big data, because it is LEGO bricks, not data. What LEGO is missing is the ability to track how the pieces are used. Imagine if every LEGO brick had a tiny sensor that let LEGO know when and how two bricks were connected. That would be big data. It would be fun to know which pieces are most commonly connected and which ones are never connected. It would also be fun to know how the bricks are connected. Are they commonly stacked straight or staggered? Privacy issues aside, that would be some seriously fun big data!
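
Just for fun, here is a rough Python sketch of what answering those questions might look like. Everything in it is hypothetical: no such sensors exist, and the event format, the brick names, the tiny catalog, and the `summarize` function are all invented for illustration.

```python
from collections import Counter

# Hypothetical sensor events: (brick_a, brick_b, how_they_were_stacked).
events = [
    ("2x4", "2x4", "staggered"),
    ("2x4", "2x2", "straight"),
    ("2x2", "2x4", "staggered"),
    ("1x2", "2x4", "straight"),
    ("2x4", "2x4", "staggered"),
]

# Hypothetical catalog of every brick type that could report an event.
catalog = {"2x4", "2x2", "1x2", "1x1"}


def summarize(events, catalog):
    """Tally connected pairs, stacking styles, and bricks never connected."""
    # Sort each pair so ("2x4", "2x2") and ("2x2", "2x4") count as the same pair.
    pair_counts = Counter(tuple(sorted((a, b))) for a, b, _ in events)
    style_counts = Counter(style for _, _, style in events)
    used = {brick for a, b, _ in events for brick in (a, b)}
    never_connected = sorted(catalog - used)
    return pair_counts, style_counts, never_connected


pair_counts, style_counts, never_connected = summarize(events, catalog)
print("Pair counts:", dict(pair_counts))
print("Stacking styles:", dict(style_counts))
print("Never connected:", never_connected)
```

With real sensor data the event list would be far too large to hold in a Python list, which is exactly what would make it a big data problem, but the questions being asked would stay the same.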

This post contains some nice tips if you are planning some machine learning in the cloud.

The next post in the BigML Machine Learning Throwdown.