2 Recently Released Open Source Graph-Related Projects

  1. GraphBuilder

    Intel Labs built a tool for constructing mathematical graphs out of large datasets. It is Java-based and works with Hadoop and MapReduce. Intel has released a whitepaper explaining more about GraphBuilder. The code is available on GitHub. A big thanks to Mark Nickel for pointing out this project.

  2. ArangoDB

    ArangoDB is a flexible NoSQL database. It is a document database that also lets you add edges between documents, so it can serve as a graph database. I had a fun time playing around with the online tutorial and demo. ArangoDB also claims to work as a key/value store. The code is available on GitHub.
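
To make the "documents plus edges" idea concrete, here is a minimal Python sketch. It does not talk to ArangoDB or use its API. The ids, names, and helper functions are made up for illustration, and the `_from`/`_to` field names are borrowed from ArangoDB's edge convention as an assumption about how such data might look.

```python
from collections import defaultdict

# Hypothetical documents, keyed by an id of the form "collection/key".
people = {
    "people/alice": {"name": "Alice", "city": "Omaha"},
    "people/bob": {"name": "Bob", "city": "Austin"},
    "people/carol": {"name": "Carol", "city": "Boston"},
}

# Edges are documents too: each one points from one document to another
# and can carry its own attributes (here, "since").
knows = [
    {"_from": "people/alice", "_to": "people/bob", "since": 2010},
    {"_from": "people/bob", "_to": "people/carol", "since": 2012},
]


def build_adjacency(edges):
    """Map each source id to the list of ids its edges point to."""
    adjacency = defaultdict(list)
    for edge in edges:
        adjacency[edge["_from"]].append(edge["_to"])
    return adjacency


def reachable(start, adjacency):
    """Return every document id reachable from start by following edges."""
    seen = []
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.append(neighbor)
                stack.append(neighbor)
    return seen


adjacency = build_adjacency(knows)
names = [people[doc_id]["name"] for doc_id in reachable("people/alice", adjacency)]
print(names)
```

Once the edges exist, the same documents can be read one at a time as plain records or walked as a graph, which is the sense in which a document store "becomes" a graph database.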

What Makes a Good Data Scientist?

This is a very quick and informative video about data science. What is data science? What makes a good data scientist?

DJ Patil does an excellent job answering both of those questions.

Here are his answers for what makes a good data scientist:

  1. Storytelling
  2. Curiosity

I think the information is interesting, but I also think the charts do a good job of telling the story.

Explain Data Science to Anyone

When telling friends and family that I blog about data science, I am frequently asked to explain more. I usually respond with an answer similar to this:

You know the world is generating huge amounts of data every day due to financial transactions, medical records, social networks, and other internet uses. Data Science aims to make better decisions based upon that data. Here are some possibilities. What type of people buy TVs in October? Which patients will get better with this new drug? Who are some other people that you probably already know?

Data Science is all about answering these types of questions with real data instead of assumptions.

I think this explanation could use some refinement. What am I leaving out? What should I remove? How do you explain data science to other people (preferably non-technical or non-data people)?

This is a nice graphic showing where data science is being taught. It appears that data science is being taught all over the country.

Data Science Links from Recent Days

Data and Innovation Video

Jeff Hammerbacher, founder and Chief Scientist of Cloudera, gives a nice talk about data science. He explains what he has done in the past, and what he plans to do in the future.

It is the second video I have posted recently that emphasizes the importance of data science for more than just advertising. Jeff is getting involved with a medical school to see how data can help.

Note: The video is about 45 minutes, but it contains some really good information.

Learn R for Free at Code School

Code School is offering a course titled Try R. The course is completely free and can be completed online with the interactive tutorial. You will learn by doing. If you have been looking to learn R or need a quick refresher, this is probably a very good option.