This is a great list of traits for the next generation of data scientists.

Big Data E-Learning Programs

The International School of Engineering has launched 2 new online programs. Both are certificate programs that last 7 or 8 weeks, and both started this week.

  1. Engineering Big Data with R and Hadoop Ecosystem. This program will cover Hadoop, R, MapReduce, and NoSQL.
  2. Essential Predictive Analytic Techniques: What Every Aspiring Data Scientist Must Know. This program aims to teach a student how to “analyze, forecast and predict using data”.

I am not familiar with the International School of Engineering, so I would love some comments about the quality of the programs they offer. Thanks.

StrataRx Free Online Conference

Tomorrow, October 5, 2012, at 10am Pacific Time, O’Reilly will be hosting StrataRx. I have attended other Strata online conferences and they are good. So, if data science and personalized medicine are of interest to you, then you should sign up.

How Employers will use Data Science in the Future

I recently read the article, Facebook’s Gen Y Nightmare. It is worth reading (or at least skimming). Here are the highlights of the article.

In 8 to 10 years, a fictitious company named Narrative Data will be able to perform character and personality analysis. Narrative Data will analyze people’s social media accounts and other online activity to determine things such as productivity, effectiveness, personality type, and various other traits. The article then describes Narrative Data identifying that a particular person suffers from acute migraine headaches, which occur a couple of times a month. As a result of this finding, that person is not considered for a job opening.

Here are my thoughts. First, is something like this even possible? I would think yes. More importantly, how useful is this analysis? My assumption is that there are no perfect people. If you look long enough, you can find flaws in anyone. So, in order to avoid the above situation, a person would have to drastically censor his/her online activity or just withdraw completely from online activities. Then Narrative Data would not be able to find any problems. That would lead back to the same problem employers have today (not enough information), whereas the situation in the article is an example of too much information. Given the choice between hiring a slightly flawed person with lots of available information and hiring a person whose flaws have not yet been revealed because of a lack of information, I would choose the slightly flawed person.

If a company like Narrative Data ever does exist, it is almost certain that the opposite type of company will appear too. By that, I mean companies created with the intent of helping people hide character traits. That would mess up all the analysis.

What are your thoughts? Will something like this exist? Would it really be beneficial?

A very nice visualization post from the Columbia Data Science class. The Venn Diagram at the bottom is good (by “good” I mean funny because it makes no sense).