Here is a good overview of the first week of the Columbia Data Science course.


I’m attending Rachel Schutt’s Columbia University Data Science course on Wednesdays this semester and I’m planning to blog the class. Here’s what happened yesterday at the first meeting.


Rachel started by going through the syllabus. Here were her main points:

  • The prerequisites for this class are: linear algebra, basic statistics, and some programming.
  • The goals of this class are: to learn what data scientists do, and to learn to do some of those things.
  • Rachel will teach for a couple weeks, then we will have guest lectures.
  • The profiles of those speakers vary considerably, as do their backgrounds. Yet they are all data scientists.
  • We will be resourceful with readings: part of being a data scientist is realizing lots of stuff isn’t written down yet.
  • There will be 6-10 homework assignments, due every two weeks or so.
  • The final project will be an internal Kaggle competition. This will be a…



Columbia University Data Science Course

I found out just yesterday that Columbia University is offering a Data Science course, taught by Dr. Rachel Schutt of the Department of Statistics. She is also blogging some of the course material. Sorry, I could not find any video lectures. However, Cathy O’Neil is sitting in on the course and will be blogging some of the material as well. You can see more at Cathy’s popular blog, mathbabe.

Legos and Big Data

Recently, the family and I visited a LEGO store. We were given a pamphlet that contained some interesting numbers.


  • More than 400 million people will play with LEGO bricks this year
  • There are an average of 62 LEGO bricks per person on Earth
  • 5,000,000,000 (yeah, that’s 5 billion) hours per year are spent playing with LEGO bricks
  • It would take 40,000,000,000 stacked LEGO bricks to reach the moon
  • 19,000,000,000 LEGO elements are made each year – that is 36,000 per minute
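The per-minute figure in the last bullet is easy to sanity-check. Here is a quick sketch, assuming a 365-day year and round-the-clock production (the pamphlet does not say how LEGO did the math, so both are my assumptions):

```python
# Sanity check of "19,000,000,000 elements per year is 36,000 per minute".
# Assumption: a 365-day year with production running around the clock.
elements_per_year = 19_000_000_000
minutes_per_year = 365 * 24 * 60

per_minute = elements_per_year / minutes_per_year
print(f"{per_minute:,.0f} elements per minute")
```

Under those assumptions the result lands a little above 36,000, so the pamphlet's number looks like a rounded-down version of the same calculation.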

Now, I would not really call this big data, because it is about LEGO bricks, not data. What LEGO is missing is the ability to track how the pieces are used. Imagine if all the LEGO bricks had tiny sensors that would let LEGO know when and how two bricks were connected. That would be big data. It would be fun to know which pieces are most commonly connected and which ones are never connected. It would also be fun to know how the bricks are connected. Are they commonly stacked straight or staggered? Privacy issues aside, that would be some seriously fun big data!
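Just for fun, here is a rough sketch of what tallying that kind of sensor data might look like. Everything in it is made up for illustration: the event format, the brick names, and the alignment labels are all hypothetical, since no such sensors exist.

```python
from collections import Counter

# Hypothetical sensor events: (brick_a, brick_b, alignment).
# The brick names and alignment labels are invented for this example.
events = [
    ("2x4", "2x4", "staggered"),
    ("2x4", "2x2", "straight"),
    ("2x4", "2x4", "staggered"),
    ("1x2", "2x2", "straight"),
    ("2x4", "2x4", "straight"),
]

# Sort each pair so ("2x4", "2x2") and ("2x2", "2x4") count as the same connection.
pair_counts = Counter(tuple(sorted((a, b))) for a, b, _ in events)

# Count how often bricks are stacked straight versus staggered.
alignment_counts = Counter(alignment for _, _, alignment in events)

most_common_pair, pair_total = pair_counts.most_common(1)[0]
print("Most commonly connected:", most_common_pair, pair_total)
print("Alignments:", dict(alignment_counts))
```

Finding the pieces that are never connected would also need a catalog of every brick type to compare against, which this toy example leaves out.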

This post contains some nice tips if you are planning some machine learning in the cloud.

The Official Blog of

In the fourth post of the series, I compared prediction functionality and performance between each of the services. We saw that while some services may make more accurate predictions than others, the runners-up often follow closely behind. That’s good news for you because it means you are free to choose among the services without being too concerned that you picked a dud when it comes to making accurate predictions. This post will cover some other miscellaneous topics that may help you choose which service will best meet your needs.


I previously hinted that I ran into stability issues with more than one service. Which ones gave me grief? Well, actually, all of the cloud-based services had problems. I was unable to rely on any of them to take my data, create a model, and then make predictions without occasionally failing. To collect the results, I often had to run experiments multiple times (without any…


The next post in the BigML Machine Learning Throwdown.

In the third post of the series, we looked at the types of models supported by each service. While some are useful for understanding your data, the primary goal of many machine learning models is to make accurate predictions from unseen data. Say you want to sell your house but you don’t know how much it is worth. You have a dataset of home sales in your city for the past year. Using this data, you train a model to predict the sales price of a house based on its size and the year it was built. Will this model be useful for predicting how much your own house will sell for? In this post, I will discuss how a model’s prediction abilities are evaluated, the results of comparing models from each service, and some general observations about making predictions with each service.
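To make the house example concrete, here is a minimal sketch of the general idea: fit a model on some sales, then check it on sales it has never seen. This is not how any of the services in the throwdown work. It is plain least squares written with the standard library only, the data is synthetic (generated from a pricing formula I made up), and the names `fit_linear`, `predict`, and `mean_abs_error` are my own.

```python
import random

def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    X = [[1.0] + list(r) for r in rows]  # prepend 1.0 for the intercept term
    n = len(X[0])
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)] for i in range(n)]
    b = [sum(x[i] * y for x, y in zip(X, targets)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w

def predict(weights, row):
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))

def mean_abs_error(weights, rows, targets):
    return sum(abs(predict(weights, r) - t) for r, t in zip(rows, targets)) / len(rows)

# Synthetic "home sales": the pricing formula and the noise level are invented.
rng = random.Random(0)
houses, prices = [], []
for _ in range(200):
    size_k = rng.uniform(1.0, 3.0)       # size in thousands of square feet
    years_after_1950 = rng.uniform(0, 60)  # year built, measured from 1950
    price = 50_000 + 100_000 * size_k + 1_000 * years_after_1950 + rng.gauss(0, 10_000)
    houses.append((size_k, years_after_1950))
    prices.append(price)

# Hold out the last 50 sales; the model never sees them during fitting.
train_x, test_x = houses[:150], houses[150:]
train_y, test_y = prices[:150], prices[150:]

weights = fit_linear(train_x, train_y)
model_error = mean_abs_error(weights, test_x, test_y)

# Baseline for comparison: always guess the average training price.
mean_price = sum(train_y) / len(train_y)
baseline_error = sum(abs(mean_price - t) for t in test_y) / len(test_y)

print(f"Model error on unseen sales:    {model_error:,.0f}")
print(f"Baseline error on unseen sales: {baseline_error:,.0f}")
```

The held-out error is the number to watch, because it estimates how far off the model would be on a house that was not in the training data, such as your own.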

As we saw in the previous post, some of…


I am enjoying the honest review. The post seems unbiased, which is good.

Pop quiz: among the kids in the picture, which ones are able to jump high enough to reach the hoop to dunk a basketball?

You can’t know for sure, but you can make an educated guess that the tall kid wearing the #21 jersey can dunk while the others cannot. Your brain has a model of how the world works that allows you to look at a person and make a prediction about their ability to dunk. You may not always be correct, but your mental model allows you to make much more accurate predictions than you could by flipping a coin.

Machine learning algorithms work by detecting patterns in data and creating a model that summarizes important properties. Given a list of people, their physical characteristics, and whether or not they can dunk, these algorithms can create a model to predict whether a new person it encounters can dunk based on…
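As a toy illustration of "detecting patterns in data and creating a model", here is a sketch that learns a single height cutoff from labeled examples. It is a deliberately simple stand-in (a one-rule model, sometimes called a decision stump), not the algorithm the original post goes on to describe. The training data is invented, and height is the only physical characteristic I use.

```python
def learn_height_threshold(people):
    """Pick the height cutoff that misclassifies the fewest training examples."""
    heights = sorted({height for height, _ in people})
    # Candidate cutoffs: midpoints between consecutive distinct heights.
    candidates = [(a + b) / 2 for a, b in zip(heights, heights[1:])]

    def errors(cutoff):
        return sum((height >= cutoff) != can_dunk for height, can_dunk in people)

    return min(candidates, key=errors)

def predict_can_dunk(height_cm, cutoff):
    return height_cm >= cutoff

# Invented training data: (height in cm, whether the person can dunk).
training = [
    (150, False), (158, False), (165, False), (172, False),
    (185, True), (193, True), (201, True),
]

cutoff = learn_height_threshold(training)
print("Learned cutoff (cm):", cutoff)
print("Can a 198 cm person dunk?", predict_can_dunk(198, cutoff))
print("Can a 160 cm person dunk?", predict_can_dunk(160, cutoff))
```

The learned cutoff plays the role of the mental model in the pop quiz: it will not always be right, but it should beat flipping a coin.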
