Tag Archives: data science

Big Data Revolution (An Infographic)

This infographic by Deep Blue Analytics does a nice job of explaining why there is so much excitement around big data.

  • The world is generating a lot of data
  • There are not enough people to analyze that data

New Data Science Journal

Springer has just released a new data science journal named EPJ Data Science. The journal is open access, which means that articles are freely available online. The catch is that people who submit articles must pay a fee for publication. Sometimes the fee will be covered by the author’s university or company. Anyhow, if you are interested in data science research, this journal is probably worth following.

Are you interested in academic journals?
Does this excite you?

Data Science And Doctor Visits

Electronic Doctor Visit

I recently received a message from one of the local hospitals. It stated that I can now have an electronic visit with my doctor. Here is how I understand it works. I fill out a brief questionnaire explaining some of my symptoms and submit it online. Within one day, my doctor will review my submission and respond. Obviously, this electronic visit should only be used for minor medical issues such as a common cold or a prescription update.


Being the type of person I am, I initially questioned why the hospital was really doing this. Sure, the hospital will be able to help more patients and make more money, but is there something more?

The Data

Think of the data that is collected in this process: a patient-entered description of the symptoms and the doctor’s diagnosis. It appears the hospital is building a training set that pairs descriptions of symptoms with diagnoses. It is a very short step to apply a machine learning algorithm or two and totally automate the process. Maybe this is already done and my doctor just signs off on the result.

Here is how I envision the system working:

  1. Use some natural language processing to identify the symptoms
  2. Match the symptoms to some known illness via machine learning
  3. Report the diagnosis and treatment
  4. Prescribe medicine if necessary
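
To make the steps above concrete, here is a minimal toy sketch in Python. Everything in it is an assumption for illustration: the symptom vocabulary, the three "past visits", and the function names are all made up, simple phrase matching stands in for real natural language processing, and a 1-nearest-neighbor lookup with Jaccard similarity stands in for whatever machine learning a hospital might actually use. It stops at suggesting a diagnosis and leaves treatment and prescriptions to the doctor.

```python
import re

# Hypothetical vocabulary: phrase in the questionnaire -> canonical symptom.
SYMPTOM_TERMS = {
    "runny nose": "runny_nose",
    "sneezing": "sneezing",
    "sore throat": "sore_throat",
    "fever": "fever",
    "body aches": "body_aches",
    "itchy eyes": "itchy_eyes",
}

# Hypothetical training set: (symptoms from a past visit, doctor's diagnosis).
TRAINING_VISITS = [
    ({"runny_nose", "sneezing", "sore_throat"}, "common cold"),
    ({"fever", "body_aches", "sore_throat"}, "flu"),
    ({"sneezing", "itchy_eyes", "runny_nose"}, "seasonal allergies"),
]


def extract_symptoms(text):
    """Step 1: crude stand-in for NLP that looks for known phrases."""
    normalized = re.sub(r"[^a-z ]", " ", text.lower())
    normalized = re.sub(r"\s+", " ", normalized)
    return {name for phrase, name in SYMPTOM_TERMS.items() if phrase in normalized}


def jaccard(a, b):
    """Overlap between two symptom sets, from 0.0 (none) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_diagnosis(symptoms):
    """Step 2: pick the diagnosis of the most similar past visit (1-NN)."""
    best_symptoms, best_diagnosis = max(
        TRAINING_VISITS, key=lambda visit: jaccard(symptoms, visit[0])
    )
    score = jaccard(symptoms, best_symptoms)
    if score == 0.0:
        return None, 0.0  # nothing recognizable, so suggest nothing
    return best_diagnosis, score


def electronic_visit(questionnaire_text):
    """Step 3: report a suggestion; step 4 (prescribing) stays with the doctor."""
    symptoms = extract_symptoms(questionnaire_text)
    diagnosis, score = suggest_diagnosis(symptoms)
    return {
        "symptoms": sorted(symptoms),
        "suggested_diagnosis": diagnosis,
        "similarity": score,
        "needs_doctor_signoff": True,
    }


if __name__ == "__main__":
    print(electronic_visit("I have a runny nose, lots of sneezing and a sore throat."))
```

A real system would need far more data, a proper language model for the free-text descriptions, and a lot of validation, but the overall shape (text in, symptoms out, nearest known case, doctor signs off) would probably look similar.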

What Do You Think?

How do you feel about this process? I am sure there are some companies working on just this problem. Who are those companies?

Note: Yes, I know this data is currently collected by hospitals, but a human (nurse or doctor) interprets what another human is saying before entering the data. The electronic visit just made me realize how easy it would be to automate a doctor’s job for common problems.

New Data Science eBook – Free and Open-Source

Jeffrey M. Stanton, a member of Syracuse University’s iSchool, just released an open-source ebook about data science. Obviously, this book is intended to be used in the curriculum for the new Data Science Certificate Program. In particular, it will be used for two courses on analytics and visualization.

The book is available in the iTunes store or as a PDF. See the book website to get your copy.

Data From eReaders

"Reading Your E-book is Reading You," an article in the Wall Street Journal, is an excellent example of data science. Book publishers now know how much of a book readers will finish, how long they read, what book they read next, and lots of other stuff. Read the article to find out more. This kind of data collection also opens the door to some privacy issues.

A big thanks to Mark Nickel for sharing the article with me.

The Data Scientific Method

DJ Patil and Josh Elman, both of Greylock Partners, gave an insightful talk at LeWeb London 2012. The most important part was the introduction of the Data Scientific Method.

Data Scientific Method

  1. Start with a Question
  2. Leverage your current data
  3. Create features and run tests
  4. Analyze the results and draw insights
  5. Let the data frame a conversation

How To Learn Data Science? Part 2

Yesterday, I posted about some traditional strategies to acquire data science skills. Today, I will post a nontraditional strategy.

Internet Based

There are hordes of data science resources available on the internet for free. With enough personal motivation, a person could learn all the necessary skills for free (or cheap) online. Coursera is probably a great place to start. There are also other good sites such as Udacity and the Kaggle Wiki, plus other blogs and websites.

The problem with this approach is knowing exactly what to learn. A course in machine learning is great, but data science is more than just machine learning. How do you know what to learn? It would be really nice to have a collection of data science topics and the associated online training materials.

Would this strategy work for you?

How To Learn Data Science?

Based upon the popularity of a previous post about a certificate program from the University of Washington, it appears that many people are interested in learning the skills necessary to become a data scientist. Thus, I decided to compile a list of some of the possible learning strategies.

Traditional College Education

The most obvious path would be to study at a traditional college or university. Colleges and universities are starting to notice the demand for data science skills, and many colleges are currently offering programs to prepare someone as a data scientist. This path is safe and predictable. Do the homework, complete the courses, and get the degree or certificate. Most people are familiar with the process, and it offers few surprises. The problems here are the costs, lack of flexibility, and time involved.

Corporate Training

Companies are now starting to offer training programs for data science. EMC is leading the way in this category with their data science training program. Cloudera also offers lots of training related to Hadoop and big data. Wolfram offers data science training with Mathematica. One of the problems with this category is the cost. Another problem is that companies tend to teach and promote their own products. This may leave the student with numerous gaps in the full data science spectrum.

Your Thoughts?

What are your thoughts about the above approaches? What are the positives and negatives? Also, later this week I will be posting some less-traditional approaches to learning data science.

Profile Of A Data Scientist – Interview

Watch this excellent video interview with David Dietrich, creator of EMC’s data science curriculum. He talks about his experience helping people transition to becoming data scientists.
David lays out a list of 5 traits of a data scientist:

  • Quantitative
  • Technical
  • Skeptical
  • Communicative and Collaborative
  • Creative and Curious

For a diagram of these 5 traits, see this brief writeup about the profile of a data scientist. Also, see the slides of his latest talk at EMC World 2012.

Note: I removed the embedded video because it was set to play automatically.

Kaggle Launches New Products

If you follow the blog, you probably know I am a big fan of Kaggle. Just last week, they announced the launch of 2 new products.

  1. Kaggle Recruit: In this competition, the participants are not competing for a cash prize but rather for a job interview with a specific company. Currently, Facebook is hosting the first such competition.
  2. Kaggle Prospect: In this competition, the participants are trying to come up with the best question to ask. Participants are presented with various related datasets, and the goal is to find which data science question should be asked of the data. The winner gets a small cash prize, and the winning question becomes a regular Kaggle competition.

What do you think? Are you excited to try out these new competitions?