I felt the article, The 7 steps in Big Data delivery, was worth sharing. Here is my favorite quote from the article:
Most staff members charged with researching and acquiring Big Data solutions focus on the Collect and Store steps at the expense of the others.
Big Data and Data Science are more than just collecting and storing data. Read the article to find out more about the 7 steps outlined.
- Data Governance
This infographic by Deep Blue Analytics does a nice job of explaining why there is so much excitement around big data.
- The world is generating a lot of data
- There are not enough people to analyze that data
School of Data is a new site aimed at teaching people how to collect, process, analyze, and visualize data. The goal is to produce data courses and award certifications. Most of the content will be focused around the online handbook, Data Wrangling Handbook. The handbook is open source and hosted on GitHub. The course materials are still under development, so others are welcome to get involved. I believe the plan is to start offering courses this fall (2012).
It will be exciting to see how this site develops. Do you plan to get involved? Would you be interested in the courses?
Update: fixed a bad link
In this 2010 TED Talk, David McCandless provides a great example of transforming data into a story. Here are some of the data topics covered.
- Billion dollar-o-gram: What is money spent on?
- Global media panic (SARS, Swine flu, asteroids, killer wasps, …), pay attention to the gap
- What country has the most soldiers (per population)?
- Nutritional Supplement data
- And More…
This video is worth the 18 minutes. Enjoy!
My favorite part of the infographic is the demographics portion. Notice the gender, age, income, and education of the users.
I previously posted about Data Science and Doctor Visits. Here is what I would like to know:
Who is working on data science problems for medicine?
I would love to hear some answers to this. They can be individual researchers, startups, or established companies. If I get some responses, I will be sure to post a list on the blog.
Springer has just released a new data science journal named EPJ Data Science. The journal is open access, which means that articles are freely available online. The catch is that people who submit articles must pay a fee for publication. Sometimes the fee will be covered by the author’s university or company. Anyhow, if you are interested in data science research, this journal is probably worth following.
Are you interested in academic journals?
Does this excite you?
Electronic Doctor Visit
I recently received a message from one of the local hospitals. It stated that I can now have an electronic visit with my doctor. Here is how I understand it works. I fill out a brief questionnaire explaining some of my symptoms and submit it online. Within one day, my doctor will review my submission and respond. Obviously, this electronic visit should only be used for minor medical issues such as a common cold or a prescription update.
Being the type of person I am, I initially questioned why the hospital was really doing this. Sure, the hospital will be able to help more patients and make more money, but is there something more?
Think of the data that is collected in this process: a patient-entered description of the symptoms and the doctor’s diagnosis. It appears the hospital is building a training set of data that pairs descriptions of symptoms with diagnoses. It is a very short step to apply a machine learning algorithm or two and totally automate the process. Maybe this is already done and my doctor just signs off on the result.
Here is how I envision the system working:
- Use some natural language processing to identify the symptoms
- Match the symptoms to some known illness via machine learning
- Report the diagnosis and treatment
- Prescribe medicine if necessary
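To make the idea concrete, here is a toy Python sketch of the first three steps. It is only an illustration under loud assumptions: simple keyword matching stands in for real natural language processing, a set-overlap (Jaccard) score stands in for a trained machine learning model, and the symptom vocabulary and illness profiles are made up for this example. None of it reflects how the hospital's system actually works. The prescription step is deliberately left out.

```python
import re

# Hypothetical symptom vocabulary: phrase -> canonical symptom name.
SYMPTOM_PHRASES = {
    "runny nose": "runny_nose",
    "sore throat": "sore_throat",
    "cough": "cough",
    "coughing": "cough",
    "fever": "fever",
    "sneezing": "sneezing",
    "body aches": "body_aches",
    "itchy eyes": "itchy_eyes",
}

# Hypothetical illness profiles, invented for illustration only.
ILLNESS_PROFILES = {
    "common cold": {"runny_nose", "sore_throat", "cough", "sneezing"},
    "flu": {"fever", "body_aches", "cough", "sore_throat"},
    "seasonal allergies": {"sneezing", "itchy_eyes", "runny_nose"},
}


def extract_symptoms(text):
    """Step 1: find known symptom phrases in free text (keyword matching)."""
    normalized = re.sub(r"[^a-z\s]", " ", text.lower())
    normalized = re.sub(r"\s+", " ", normalized)
    found = set()
    for phrase, symptom in SYMPTOM_PHRASES.items():
        if re.search(r"\b" + re.escape(phrase) + r"\b", normalized):
            found.add(symptom)
    return found


def match_illness(symptoms):
    """Step 2: pick the illness whose profile overlaps most (Jaccard score)."""
    best, best_score = None, 0.0
    for illness, profile in ILLNESS_PROFILES.items():
        union = symptoms | profile
        score = len(symptoms & profile) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = illness, score
    return best, best_score


def electronic_visit(text):
    """Step 3: report the extracted symptoms and the best-matching illness."""
    symptoms = extract_symptoms(text)
    illness, score = match_illness(symptoms)
    return {
        "symptoms": sorted(symptoms),
        "diagnosis": illness,  # None when nothing matches
        "score": round(score, 2),
    }


if __name__ == "__main__":
    message = ("I have had a runny nose and a sore throat for two days, "
               "and I keep sneezing.")
    print(electronic_visit(message))
```

A real system would presumably learn the mapping from many recorded patient descriptions and doctor diagnoses rather than from a hand-written table, which is exactly the training set the electronic visits seem to be building.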
What Do You Think?
How do you feel about this process? I am sure there are some companies working on just this problem. Who are those companies?
Note: Yes, I know this data is currently collected by hospitals, but a human (nurse or doctor) interprets what another human is saying before entering the data. The electronic visit just made me realize how easy it would be to automate a doctor’s job for common problems.
Nice discussion of Hadoop and Storm and where each one is most useful.
Jeffrey M. Stanton, a member of Syracuse University’s iSchool, just released an open-source ebook about data science. Obviously, this book is intended to be used in the curriculum for the new Data Science Certificate Program. In particular, it will be used for two courses on analytics and visualization.
The book is available in the iTunes store or as a PDF. See the book website to get your copy.