The Cognitive Computation Group at the University of Illinois has a number of Natural Language Processing (NLP) demos. They are fun to browse, and all of them do interesting things with plain text.
Beyond the title, no more explanation is needed.
Data analysis is performed in many different fields on many different types of data, and most fields call it by a different name. The following list comes straight from Jeff Leek’s Data Analysis Coursera class.
Name of Data Analysis by Data Type
- Biostatistics for medical data
- Data Science for data from web analytics
- Machine learning for data in computer science/computer vision
- Natural language processing for data from texts
- Signal processing for data from electrical signals
- Business analytics for data on customers
- Econometrics for economic data
The type of analysis is very similar across all of these fields. What separates data science and machine learning from the others is the 3 V’s of big data: they deal with a greater Volume of data, a greater Variety of data, and a higher Velocity of data (the speed at which new data appears). Because storing massive amounts of data is cheaper and easier than ever before, I think the other fields are beginning to realize the potential of big data. Signal processing in particular is becoming a big data field, because electrical sensors are everywhere.
What are your thoughts? Do you see any real differences in the data analysis performed for the data types above?
Electronic Doctor Visit
I recently received a message from one of the local hospitals stating that I can now have an electronic visit with my doctor. Here is how I understand it to work. I fill out a brief online questionnaire describing my symptoms and submit it. Within one day, my doctor reviews my submission and responds. Obviously, this electronic visit should only be used for minor medical issues such as a common cold or a prescription update.
Being the type of person I am, I initially questioned why the hospital was really doing this. Sure, the hospital will be able to help more patients and make more money, but is there something more?
Think of the data that is collected in this process: a patient-entered description of the symptoms and the doctor’s diagnosis. It appears the hospital is building a training set that pairs descriptions of symptoms with diagnoses. From there, it is a very short step to apply a machine learning algorithm or two and fully automate the process. Maybe this is already done and my doctor just signs off on the result.
Here is how I envision the system working:
- Use some natural language processing to identify the symptoms
- Match the symptoms to some known illness via machine learning
- Report the diagnosis and treatment
- Prescribe medicine if necessary
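The first three steps above could be sketched roughly as follows. This is a minimal sketch under loud assumptions: the training pairs are a tiny hypothetical data set I invented, the "NLP" step is just crude keyword extraction, and the matching step uses a multinomial naive Bayes classifier, which is my own choice of technique and not anything the hospital has said it uses. The names `extract_symptoms`, `NaiveBayesDiagnoser`, and `electronic_visit` are hypothetical. The prescribing step is deliberately left out; the result is only flagged for a doctor to sign off on.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy training set of (patient-entered description, doctor's diagnosis).
# These examples are invented for illustration only; they are not real medical data.
TRAINING_DATA = [
    ("runny nose, sneezing and a mild sore throat", "common cold"),
    ("stuffy nose and sneezing for two days", "common cold"),
    ("sore throat, cough and a runny nose", "common cold"),
    ("itchy eyes and sneezing every spring", "seasonal allergies"),
    ("sneezing and itchy watery eyes outdoors", "seasonal allergies"),
    ("need a refill of my blood pressure prescription", "prescription refill"),
    ("running out of my prescription, please renew", "prescription refill"),
]

# Hypothetical stopword list: common words that carry no symptom information.
STOPWORDS = {"a", "an", "and", "the", "of", "my", "for", "i",
             "please", "every", "two", "days", "need"}


def extract_symptoms(text):
    """Step 1 (crude NLP): lowercase, split into words, drop stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


class NaiveBayesDiagnoser:
    """Step 2: multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self, examples):
        self.label_counts = Counter()           # number of examples per diagnosis
        self.word_counts = defaultdict(Counter)  # word frequencies per diagnosis
        self.vocab = set()
        for text, label in examples:
            tokens = extract_symptoms(text)
            self.label_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total = sum(self.label_counts.values())

    def log_scores(self, text):
        """Log prior plus summed log likelihoods of each known word, per diagnosis."""
        tokens = extract_symptoms(text)
        scores = {}
        for label, count in self.label_counts.items():
            score = math.log(count / self.total)
            n_words = sum(self.word_counts[label].values())
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # ignore words never seen during training
                likelihood = (self.word_counts[label][tok] + 1) / (n_words + len(self.vocab))
                score += math.log(likelihood)
            scores[label] = score
        return scores

    def diagnose(self, text):
        """Return the diagnosis with the highest score."""
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


def electronic_visit(model, description):
    """Step 3: report a suggested diagnosis; a doctor still has to sign off."""
    return {
        "symptoms": extract_symptoms(description),
        "suggested_diagnosis": model.diagnose(description),
        "needs_doctor_signoff": True,
    }


if __name__ == "__main__":
    model = NaiveBayesDiagnoser(TRAINING_DATA)
    result = electronic_visit(model, "I have a runny nose and a sore throat")
    print(result["suggested_diagnosis"])  # → common cold
```

Naive Bayes is used here only because it is about the simplest text classifier that fits in a few lines; a real system would need far more data and a much better way of extracting symptoms from free text.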
What Do You Think?
How do you feel about this process? I am sure there are some companies working on just this problem. Who are those companies?
Note: Yes, I know this data is currently collected by hospitals, but a human (nurse or doctor) interprets what another human is saying before entering the data. The electronic visit just made me realize how easy it would be to automate a doctor’s job for common problems.
The Coursera Natural Language Processing course officially starts today. Sign up and start learning.
Also this spring, Stanford will be offering two more courses that might benefit a person learning data science.
- Probabilistic Graphical Models – combining ideas from statistics, probability, and computer science to solve really hard problems. Sounds neat.
- Natural Language Processing (NLP) – applying algorithms to human language and text strings
If you feel these two classes might be a bit too advanced at this point, here are a couple of more fundamental computer science classes. If you are new to computer science and programming, CS 101 would be a good choice. If you are not as new to computer science, or might be a bit rusty on your core algorithms knowledge, then Design and Analysis of Algorithms 1 might be appropriate.
Actually, the courses are no longer being offered by just Stanford. A few other schools have been added, and the courses are now offered through Coursera. Plus, all the courses are free.