Recently, I was invited to speak about data science to the research department of a regional hospital system. I thought I would share my slides.
A clarification on one of my quotes from the presentation:
“Data Science doesn’t need big data”
I am not saying big data is unimportant. I am saying that a lot of excellent data science can be done on data that is not big. So don't wait until you have big data to start doing data science.
Releasing today (Sept. 22, 2015) is the fantastic new book, The Master Algorithm, by machine learning expert and University of Washington Computer Science Professor Pedro Domingos. I recently had the opportunity to talk with Dr. Domingos about his new book and about machine learning in general. See below for his fears about machine learning, his thoughts on education, and his tips for learning data science.
Stay tuned later this week for a complete review of the book!
Why This Book and Why Now?
Dr. Domingos gave two primary reasons why the topic and the timing are right for the book.
Real Need – Machine learning is currently a topic of broad public interest, discussed in the news and in politics. The problem is that most people don't really understand it. He faults machine learning experts for not making the topic understandable to a broader audience. Many machine learning concepts can be explained without complex mathematical formulas, and the book aims to do exactly that while introducing machine learning to a wider audience.
Unity – Dr. Domingos describes the five camps of machine learning: symbolists, connectionists, evolutionaries, Bayesians, and analogizers. He thinks now is the time to start working toward combining the camps into a single general-purpose learner. The book covers those camps in more detail.
What are the limits of the Master Algorithm?
Not many! Dr. Domingos does not think the algorithm will perform magic, but he did say:
“It should truly be able to learn anything given the requisite data.”
The trick will be compiling the “requisite data”.
What are the biggest fears about the Master Algorithm?
As the book emphasizes several times, Dr. Domingos does not envision The Master Algorithm creating bots that eventually take over the world. Instead, the real problem is one that already exists with machine learning today.
Computers are making decisions for humans every day, and sometimes those decisions are wrong.
He also thinks machine learning will discover and expose things we do not like about ourselves. And he anticipates challenges over who owns that data and the results the algorithms produce.
How soon will we see the Master Algorithm?
Dr. Domingos is not sure whether the algorithm will be discovered tomorrow, many years from now, or ever. He does expect the next five years to bring some combining of the best parts of the five camps.
What are some problems in the application of machine learning?
He currently sees a problem in how machine learning is applied in practice. Companies take the latest research, which is a good thing, and turn it into a large engineering project. Eventually, those projects hit a wall because they become too complex. That is why he thinks companies will start combining and refining their machine learning projects to make them simpler and more maintainable.
What advice would you give to high school students or undergrads about pursuing machine learning/data science?
Dr. Domingos believes high school and undergraduate students are the primary audience for the book. He also expanded on his answer and provided a helpful to-do list for people getting into data science and machine learning.
Luckily for us learners, Dr. Domingos does plan to offer the course in a live format. He always intended the course to run that way, but some unexpected things came up, and the class never ran live. It doesn't have a scheduled date yet, but the details will be posted on this blog when it does. In the meantime, all the lectures are available on the Coursera class page.
Finally, do you have a unique use of machine learning in your own life?
Dr. Domingos and a few other professors at the University of Washington are in the early stages of a project named eProf, short for electronic Professor. The goal is to automate some of a professor's responsibilities. The project is still in the discussion stage, but he thinks it would make a useful open source project. Hopefully, there will be more to come on eProf in the future!
Remember, check back later this week for a complete review of the book!
The National Football League begins its regular season tonight. One feature you might not hear about is the addition of two RFID sensors on every player. Each stadium is equipped with receivers (not wide receivers) to capture the data emitted by the RFID tags. The collected data will make it possible to track each player's position, movement, speed, and acceleration. A company called Zebra Technologies is implementing the system.
It is too early to know exactly what NFL teams will do with the data, but I think the NFL should make it public. It could be analyzed for fantasy football, data scientists could come up with creative data visualizations, and I think it has great potential for academic research.
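As a rough illustration of how speed and acceleration can be derived from position tracking, here is a minimal Python sketch. It assumes the data arrives as time-stamped (t, x, y) samples in seconds and yards; that format and the function name `speeds_and_accels` are hypothetical and are not the actual data feed from Zebra Technologies or the NFL. Speed comes from finite differences of position, and "acceleration" here means the change in speed over time, not the full vector acceleration.

```python
import math

def speeds_and_accels(samples):
    """Estimate speed and acceleration from position samples.

    Hypothetical input format: a list of (t_seconds, x_yards, y_yards)
    tuples sorted by time. Returns two lists of (time, value) pairs,
    each value stamped at the midpoint of the interval it describes.
    """
    speeds = []  # (midpoint time, speed in yards/second)
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("samples must be strictly increasing in time")
        distance = math.hypot(x1 - x0, y1 - y0)  # straight-line distance
        speeds.append(((t0 + t1) / 2, distance / dt))

    accels = []  # (midpoint time, change in speed in yards/second^2)
    for (t0, v0), (t1, v1) in zip(speeds, speeds[1:]):
        accels.append(((t0 + t1) / 2, (v1 - v0) / (t1 - t0)))
    return speeds, accels

# A player moving along the x axis, speeding up between samples.
speeds, accels = speeds_and_accels([(0.0, 0.0, 0.0), (0.5, 2.0, 0.0), (1.0, 6.0, 0.0)])
print(speeds)  # [(0.25, 4.0), (0.75, 8.0)]
print(accels)  # [(0.5, 8.0)]
```

Real tracking data is noisy, so in practice you would probably smooth positions before differencing; raw finite differences amplify measurement jitter.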
The Data Science 101 blog has just moved to a self-hosted WordPress site. The URL and name have not changed, though. If you followed the blog through WordPress.com, you will no longer get emails about new posts (you should still see the posts in your timeline). If you would like emails, just sign up with the email form in the sidebar of the Data Science 101 blog.
You can also follow along at these locations:
Once again, I was honored to write a guest post for DataKind. This time it was about the spread of open source software created by data do-gooders. A couple of years ago, DataKind hosted a DataDive in Washington, D.C., where some of the participants created a mapping software project titled DataTools 2.0. Since then, it has been replicated by a number of groups around the globe. Read the full post on the DataKind blog to find out more.