Data Mining MOOC

The University of Waikato in New Zealand will be offering a free online course titled Data Mining with Weka.

Weka, a widely used toolkit for data mining and machine learning, was developed at the University of Waikato.

Don’t wait too long to sign up; the course starts September 9, 2013.

Here is a video in which the course instructor gives a brief overview.

7 Important Data Science Papers

It is back-to-school time, and here are some papers to keep you busy this school year. All the papers are free. This list is far from exhaustive, but these are some important papers in data science and big data.

Google Search

  • PageRank – This is the paper that explains the algorithm behind Google search.
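The core idea of PageRank is that a page is important if important pages link to it. The sketch below illustrates this with power iteration. It assumes the commonly cited damping factor of 0.85 and spreads the rank of pages with no outgoing links evenly across all pages; that dangling-page handling is our choice and not necessarily the paper’s exact treatment. The function name `pagerank` and the dictionary-of-lists graph format are illustrative assumptions.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Compute PageRank scores by power iteration (illustrative sketch).

    links maps each page to the list of pages it links to.
    Returns a dict of page -> score; scores sum to 1.
    """
    # Collect every page, including pages that only appear as link targets.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(max_iter):
        # Assumption: rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Every page gets the "random jump" share plus its share of dangling rank.
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page splits its damped rank equally among the pages it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:  # stop once the scores have stopped changing
            break
    return rank


if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    # In this tiny graph, C ranks highest, then A, then B.
    for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

Page C collects links from both A and B, so it ends up with the highest score even though only A links to B.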


  • MapReduce – This paper explains a programming model for processing large datasets. It is the programming model that Hadoop implements.
  • Google File System – HDFS, Hadoop’s distributed file system, is an open-source version of the distributed file system described in this paper.
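The MapReduce paper uses counting word occurrences as its running example. The single-process sketch below imitates the three stages (map, shuffle/group-by-key, reduce) in plain Python so the data flow is easy to follow. The function names are hypothetical, and this is not Hadoop’s API; a real framework runs the same steps in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a final result."""
    return key, sum(values)


def word_count(documents):
    """Run map over every document, group by word, then reduce each group."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["the cat sat", "the dog sat"]))
    # {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```

Because each map call sees only one document and each reduce call sees only one key, both stages can be distributed independently. That independence is what lets the model scale.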


These are two of the papers that started and drove the NoSQL debate. Each paper describes a different type of storage system designed to be massively scalable.

Machine Learning

Bonus Paper

  • Random Forests – Describes one of the most popular machine learning techniques, which is heavily used in Kaggle competitions, including by the winners.
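A random forest combines many decision trees. Each tree is grown on a bootstrap sample of the training data and picks each split from a random subset of features. The forest then predicts by majority vote. The toy sketch below follows that recipe using small Gini-based trees. The depth cap, the default parameters, and all function names are assumptions made for brevity; it is not the paper’s reference implementation, and a library implementation would be far faster.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y, features):
    """Return the (feature, threshold) among `features` with the lowest weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in features:
        for threshold in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= threshold]
            right = [label for row, label in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best


def build_tree(X, y, max_features, max_depth, rng):
    """Grow a tree whose splits each consider a random subset of `max_features` features.

    Leaves are class labels (assumed not to be tuples); internal nodes are
    (feature, threshold, left, right) tuples. The depth cap is an assumption
    added for brevity; the original method grows trees fully.
    """
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return majority
    features = rng.sample(range(len(X[0])), max_features)
    split = best_split(X, y, features)
    if split is None:  # no split improves purity
        return majority
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], max_features, max_depth - 1, rng),
            build_tree([X[i] for i in right], [y[i] for i in right], max_features, max_depth - 1, rng))


def predict_tree(node, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, max_features=None, max_depth=5, seed=0):
    """Train `n_trees` trees, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    if max_features is None:
        max_features = max(1, math.isqrt(len(X[0])))  # common sqrt(#features) default
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 max_features, max_depth, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over the individual trees' predictions."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


if __name__ == "__main__":
    X = [[1, 2], [2, 1], [2, 3], [3, 2], [8, 9], [9, 8], [9, 10], [10, 9]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    forest = fit_forest(X, y)
    print(predict_forest(forest, [1, 1]), predict_forest(forest, [10, 10]))  # 0 1
```

The bootstrap sampling and the per-split feature sampling are what make the trees differ from one another. Averaging their votes is what makes the forest more robust than any single tree.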

Are there any other papers you feel should be on the list?

Wireless Communication Without a Battery

The University of Washington is developing wireless devices that can operate without a battery. The devices communicate by reflecting radio waves that are already in the air. I can think of many uses for these devices, but the article points out one in particular.

For example, sensors placed in a bridge could monitor the health of the concrete and steel, then send an alert if one of the sensors picks up a hairline crack.

After reading that, I was struck by the amount of data that could be collected. Just think of all the bridges in your city, state, or country. All this data is going to need analysis. Which alerts require immediate action? This sounds like a big data problem to me.

How do you imagine these devices being used?

International School of Engineering Programs Beginning Soon

I recently received the following information.

International School of Engineering is announcing its third batch of live e-learning certificate programs, starting 4-Sep-2013, in “Engineering Big Data with R and Hadoop Ecosystem” and “Essentials of Applied Predictive Analytics”.

These programs have helped engineers and managers transform into Hadoop developers and data scientists, earn industry certifications, revolutionize their workplaces, and establish exciting careers.


  • Taught by Carnegie Mellon, Johns Hopkins, and Stanford University alumni with Fortune 50 experience
  • Applied and interactive classes
  • Classes ranked among the top 1% and 5% of all classes in the world on Piazza
  • One-third the cost of other similar programs
  • 95% success with Cloudera and EMC2

For details visit

For any queries mail us at or call us at +91 9502334561/2/3

Undergraduate Programs in Data Science

While most of the degrees on the list of Colleges with Data Science Degrees are master’s degrees, a few schools offer data science as an undergraduate program.

3 Top Data Scientists Change Jobs

Three top data scientists have recently changed jobs.

Name | Former Company | New Company | Announcement
Hilary Mason | | Data Scientist in Residence @ Accel Partners | TechCrunch
DJ Patil | Greylock Partners | VP of Product @ RelateIQ | TechCrunch
Monica Rogati | LinkedIn | VP of Data @ Jawbone | TechCrunch

More Recommender Systems Resources

Last week I posted about Coursera’s Introduction to Recommender Systems course. I believe it is the first MOOC on the topic, but other material is available online.

If you know of any other great recommender systems resources online, please feel free to share them.

You Don't Need a PhD to do Data Science

Many of the top data scientists you will read about or hear speak have PhDs, so many people assume a PhD is required to become a data scientist. That is simply not true. Plenty of work in the data science field does not require a PhD; in fact, relatively little data science work does.

What is a PhD, and why would a person get one? A PhD is a research degree that usually takes two to five years of study beyond a master’s degree. Most of the program focuses on researching and expanding upon a very specific topic, and a PhD student’s work pushes the boundary of human knowledge.

In their daily work, most data scientists do not go that far and do not need a PhD. Most of the necessary skills can be acquired at the bachelor’s or master’s level. Combine that education with the excellent tools now available and some hands-on experience, and becoming a data scientist is entirely achievable.

Many data scientists have PhDs because of their curiosity and love of learning, traits that are essential to both data scientists and PhD students. However, you can be curious and love learning without staying in school long enough to earn a PhD.

None of this is to say that earning a PhD is bad. If you really love learning, thrive in an academic environment, and have the desire, then definitely go for the PhD. However, do not let the lack of a PhD stop you from doing data science.