Category Archives: Learn Data Science

This is a category for all things related to learning data science.

I think the information is interesting, and the charts do a good job of telling the story.

The Bayesian Observer

The average age of first time mothers in the developed countries of the world has been rising for the last ~40 years.

Here is another plot that shows the rate of occurrence of Down Syndrome, a chromosomal defect, as a function of the mother's age at the time of childbirth.

The curve really starts to shoot up at 30. In the UK, the average age of a first-time mother is 30 years. It is well known that fertility in women decreases after the age of 30 and drops rapidly after 35. Older mothers are likely to find it harder to have a baby, and if they do, they run a higher risk of chromosomal defects. Given all these possible negative consequences, the increase in the average age is a bit disturbing. It seems like there is a hidden cost to more women…



Explain Data Science to Anyone

When telling friends and family that I blog about data science, I am frequently asked to explain more. I usually respond with an answer similar to this:

You know the world is generating huge amounts of data every day from financial transactions, medical records, social networks, and other internet use. Data Science aims to help people make better decisions based on that data. Here are some example questions. What type of people buy TVs in October? Which patients will get better with this new drug? Who are some other people that you probably already know?

Data Science is all about answering these types of questions with real data instead of assumptions.

I think this explanation could use some refinement. What am I leaving out? What should I remove? How do you explain data science to other people (preferably non-technical or non-data people)?

Data Science Links from Recent Days

Data and Innovation Video

Jeff Hammerbacher, co-founder and Chief Scientist of Cloudera, gives a nice talk about data science. He explains what he has done in the past and what he plans to do in the future.

This is the second video I have posted recently that emphasizes the importance of data science for more than just advertising. Jeff is getting involved with a medical school to see how data can help.

Note: The video is about 45 minutes, but it contains some really good information.

Big Data Education

I recently read "Big Data Education: 3 Steps Universities must take."

Here are the 3 steps listed:

  1. Data Science cannot be an undergraduate degree
  2. A graduate degree should contain math, stats and computer science
  3. Research

Step 2 seems obvious. Math, stats, and computer science are some of the key areas for data science. I would add communication and presentation skills to the list, because people with only math, stats, and CS skills are not known for being naturally good communicators. I agree with step 3. More research needs to be done, and most of it will need to be interdisciplinary. Universities need to put more effort into interdisciplinary research.

Step 1 confused me a bit. The argument was that data science requires too many skills, plus an applied focus area, to fit into an undergraduate degree. Of course a person cannot learn everything about data science in an undergraduate degree, but earning a computer science degree does not mean you know everything about computer science either. It just means you know the fundamentals of algorithms, architecture, and operating systems. You know enough about computer science to understand the field and learn more as you go. I think 4 years should be enough time to do the same for data science.

What are your thoughts?