This is a nice article about using Machine Learning and Big Data with the thermostat in your home. The company, Nest Labs, is doing some cool things.
The University of California at Berkeley is hosting AMP Camp, the Big Data Bootcamp, starting today. The conference is sold out for in-person attendance, but registration is free and live streaming is available. The agenda looks good (including machine learning, parallel programming, Mesos, and hands-on exercises), so this might be a good opportunity for some learning.
Thanks to Mark Nickel for the link.
During Spring 2012, Alex Smola taught a course at Berkeley on Scalable Machine Learning. Alex is an Adjunct Professor at the University of California at Berkeley and a Visiting Scientist at Google.
Alex was kind enough to put all the course materials on the internet. That includes papers, slides, links, and video lectures. As the title suggests, the course appears to focus on large-scale machine learning. Below is one of the lectures from the Statistics portion of the course.
Electronic Doctor Visit
I recently received a message from one of the local hospitals. It stated that I can now have an electronic visit with my doctor. Here is how I understand it works. I fill out a brief questionnaire explaining some of my symptoms and submit it online. Within one day, my doctor will review my submission and respond. Obviously, this electronic visit should only be used for minor medical issues such as a common cold or a prescription update.
Being the type of person I am, I initially questioned why the hospital was really doing this. Sure, the hospital will be able to help more patients and make more money, but is there something more?
Think of the data that is collected in this process: a patient-entered description of the symptoms and the doctor's diagnosis. It appears the hospital is building a training set that pairs each description of symptoms with a diagnosis. It is a very short step to apply a machine learning algorithm or two and totally automate the process. Maybe this is already done and my doctor just signs off on the result.
Here is how I envision the system working:
- Use some natural language processing to identify the symptoms
- Match the symptoms to some known illness via machine learning
- Report the diagnosis and treatment
- Prescribe medicine if necessary
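The steps above can be sketched in a few lines of Python. This is only a rough illustration under heavy assumptions, not how any hospital actually does it: the training pairs are invented toy data, "natural language processing" is reduced to splitting text into lowercase words, and the matching step is a small naive Bayes classifier with add-one smoothing. The names `TRAINING`, `tokenize`, `train`, `predict`, and `electronic_visit` are all made up for this example, and the result is always flagged for a doctor to sign off.

```python
import math
import re
from collections import Counter, defaultdict

# Invented toy pairs of (patient description, doctor's diagnosis).
# This is NOT real medical data; it only illustrates the idea.
TRAINING = [
    ("runny nose sneezing and a sore throat", "common cold"),
    ("stuffy nose mild cough sneezing", "common cold"),
    ("high fever body aches chills and fatigue", "flu"),
    ("fever chills aching muscles tired", "flu"),
    ("itchy eyes sneezing in spring pollen", "allergy"),
    ("itchy watery eyes and runny nose around pollen", "allergy"),
]


def tokenize(text):
    """Step 1 (very crude NLP): lowercase the text and split it into words."""
    return re.findall(r"[a-z]+", text.lower())


def train(pairs):
    """Count how often each word appears under each diagnosis."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in pairs:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Step 2: pick the diagnosis with the highest naive Bayes log score."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)


MODEL = train(TRAINING)


def electronic_visit(description):
    """Steps 3 and 4, reduced to a report that a doctor must still review."""
    return {"diagnosis": predict(MODEL, description), "needs_doctor_signoff": True}


if __name__ == "__main__":
    print(electronic_visit("I have a fever, chills, and aching muscles"))
```

A real system would need far more data, a proper language model, and plenty of safeguards, but the basic shape (text in, predicted label out, human sign-off) is the same.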
What Do You Think?
How do you feel about this process? I am sure there are some companies working on just this problem. Who are those companies?
Note: Yes, I know this data is currently collected by hospitals, but a human (nurse or doctor) interprets what another human is saying before entering the data. The electronic visit just made me realize how easy it would be to automate a doctor’s job for common problems.
Hilary Mason gives another great talk, titled Machine Learning for Hackers. The video is worth watching. Enjoy!
Since recently announcing $16M in funding, Coursera has been making quite a bit of noise. Last fall, Stanford University decided to freely offer a couple of computer science classes online. The response was huge, and that led to the creation of Coursera.
The courses are no longer limited to computer science, and Stanford is no longer the only school involved. Here is a list of academic areas being offered and another list with the schools involved.
- Healthcare, Medicine, and Biology
- Economics, Finance, and Business
- Humanities and Social Sciences
- Mathematics and Statistics
- Computer Science
- Society, Networks, and Information
Although not all of the courses will be directly related to data science, many of them are very close. Naturally, the Math, Statistics, and Computer Science areas have direct relations to data science. However, some of the other areas, such as Networks, Biology, and Economics, are among the most popular application areas for data science. This is very exciting. My only concern is that the courses are a bit too much like traditional university courses, with specific start/end dates and homework due dates. It will be interesting to see if the course structures change over time.
Anyhow, the following courses are starting today. Sign up and start learning.
- Machine Learning – A major focus area of data science
- Computer Science 101 – probably a good starting point if you don’t know how to program
- Compilers – good for understanding how programming languages work
- Automata – hard to explain in 1 line, but it contains some fundamental principles in computer science
- Intro to Logic – learn to reason systematically
- Computer Vision – not sure of the relation to data science, but I am sure there is one; if you know, please leave a comment
Are you going to enroll in any of these courses?
The above article provides a nice brief overview of 5 clustering algorithms.
- Hierarchical Clustering
- Fuzzy C-Means
- Multi-Gaussian with Expectation-Maximization
- Density-based Cluster
This goes well with a previous post about 6 Machine Learning Algorithms.
This is not intended to be mapped to a set of college courses. It is intended to be a listing of necessary skills for a data scientist. For a definition of data scientist, see this previous post.
- Calculus – not directly important to data science, but the knowledge is important to understand the statistics and machine learning
- Matrix Operations
- Regression – Linear and Logistic
- Bayesian Statistics
- R – stats
- Octave – machine learning
- Basic Programming – Java, C/C++, and Python seem to be good language choices
- Machine Learning
- Database Knowledge – not limited to just relational databases
- Data Visualization – how to make data look good: maps, graphs, etc
- Presentation – story telling, be comfortable explaining data to others
Do you have anything to add/remove from the list?
This post provides a nice quick overview of 6 machine learning algorithms.
- Decision Trees
- Linear Regression
- Neural Networks
- Bayesian Networks
- Support Vector Machines (SVMs)
- Nearest Neighbor
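To give a feel for how simple some of these can be, here is a minimal sketch of the last one on the list, Nearest Neighbor, in its plainest form (1-nearest-neighbor with Euclidean distance). The points and labels in `POINTS` are invented for illustration, and `nearest_neighbor` is just a name chosen for this example, not something from the linked post.

```python
import math

# Invented toy data: (feature vector, label).
POINTS = [
    ((1.0, 1.0), "a"),
    ((1.5, 2.0), "a"),
    ((5.0, 5.0), "b"),
    ((6.0, 5.5), "b"),
]


def nearest_neighbor(points, query):
    """Return the label of the training point closest to the query."""
    # math.dist computes the Euclidean distance (Python 3.8+).
    _, label = min(points, key=lambda point: math.dist(point[0], query))
    return label


if __name__ == "__main__":
    print(nearest_neighbor(POINTS, (1.2, 1.1)))
    print(nearest_neighbor(POINTS, (5.5, 5.0)))
```

There is no training step at all: the "model" is the stored data, and all of the work happens at prediction time.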