Why Data Science? – Presentation

Recently, I was invited to speak about data science to the research department of a regional hospital system. I thought I would share my slides.

A note of clarification on one of my quotes from the presentation:

“Data Science doesn’t need big data”

I am not trying to say big data is not important. I am just saying that lots of excellent data science can be performed on data that is not big data. So, don’t wait until you have big data before you start doing some data science.

For some reason, not all the links in the presentation are working. If you want to follow the links, go to SpeakerDeck and click “Download PDF”.

One Algorithm To Learn Anything [An Interview with Pedro Domingos, Author of The Master Algorithm]

Releasing today (Sept. 22, 2015) is the fantastic new book, The Master Algorithm, by machine learning expert and University of Washington Computer Science Professor, Pedro Domingos. Recently, I got the opportunity to visit with Dr. Domingos about his new book and machine learning in general. See below for his fears of machine learning, thoughts on education, and tips for learning data science.

Stay tuned later this week for a complete review of the book!

The Master Algorithm book
Why This Book and Why Now?

Dr. Domingos explains the 2 primary reasons why the topic and timing are just perfect for the book.

  1. Real Need – Currently, machine learning is a topic of interest in society. Machine learning and data science are being discussed in news and politics. The one downside: most people don’t really understand the topic. He does fault machine learning experts for not making the topic understandable to a broader audience. Many of the concepts from machine learning can be explained without complex mathematical formulas. The new book aims to do just that while exposing the topic of machine learning to others.
  2. Unity – Dr. Domingos explains the five camps of machine learning: symbolists, connectionists, evolutionaries, Bayesians, and analogizers. He thinks right now is the time to start thinking and working toward combining the camps to form a single general-purpose learner. More on those camps can be discovered in the book.

What are the limits of the Master Algorithm?

Not many! Dr. Domingos does not think the algorithm will perform magic, but he did state,

“It should truly be able to learn anything given the requisite data.”

The trick will be compiling the “requisite data”.

What are the biggest fears of the Master Algorithm?

As is emphasized numerous times in the book, Dr. Domingos does not envision The Master Algorithm creating bots that will eventually take over the world. No, the real problem is already a concern with machine learning.

Computers are making decisions for humans every day, and sometimes those decisions are wrong.

Also, he thinks machine learning will discover and expose things we do not like about ourselves. Then he envisions some challenges with ownership of that data and the algorithmic results.

How soon will we see the Master Algorithm?

Dr. Domingos is not sure whether the algorithm will be discovered tomorrow, many years from now, or never. He does think the next five years will see some combining of the best parts of the five camps.

What are some problems in the application of machine learning?

He is currently seeing a problem in the practice of applying machine learning. He sees companies take the latest research, which is a good thing, and turn it into a large engineering project. Eventually, those projects hit a wall of being too complex. That is why he thinks companies are going to start combining and refining machine learning projects to make them less complex and more maintainable.

What advice would you give to high school students or undergrads about pursuing machine learning/data science?

Dr. Domingos believes they (high school and undergraduate students) are the primary audience for the book. He did expand on the answer and provide a nice to-do list for people getting into the field of data science and machine learning.

  1. Read The Master Algorithm
  2. Explore further readings – the end of the book contains details on further readings for each chapter
  3. Take an online course (MOOC) – many good choices
  4. Start implementing some algorithms – either on your own projects or in a competition such as Kaggle. This will help you identify some of the common pitfalls.

How do you see machine learning affecting education?

He sees two clear ways in which machine learning will have an impact on education.

  1. Machine learning is something people in every field will need to know. It is becoming the new toolkit.
  2. Machine learning is going to personalize education. MOOCs are already starting to do this, but the future shows much more promise in this specific area.

Do you ever have plans to offer the Coursera Machine Learning course in a live format?

Luckily for us learners, Dr. Domingos does plan to offer the course in a live format. He always intended the course to happen that way, but some unexpected things arose, and the class never ran live. It doesn’t have a scheduled date yet, but the details will be posted on this blog when it does happen. In the meantime, all the lectures are available on the Coursera class page.

Finally, do you have a unique use of machine learning in your own life?

Dr. Domingos and a few other professors at the University of Washington are in the initial steps of a project named eProf, for electronic Professor. The goal is to automate some of the responsibilities of a professor. The project is still in the discussion stages, but he thinks it would make a useful open source project. Hopefully, more to come on eProf in the future!

Remember, check back later this week for a complete review of the book!

Silk.co’s 123 Most Influential Twitter Accounts for Data Science

Silk.co teamed up with LittleBird to create a list of the most influential “data science” accounts on Twitter. LittleBird uses an algorithm to rate an account’s influence based upon retweets from other influencers and network activity.

I was humbled to be included on the list with some highly influential people. If you are on Twitter, the list includes some great people and companies to follow. See the entire analysis at The Data Science Influencers: The Tops in Terms of “Insider Score” According to Influence Mapping Tool LittleBird.

Data Science, Robots, and Disaster Recovery

Robin Murphy has one of the coolest job titles I have ever read, Disaster Roboticist. At Texas A&M, she works on developing advanced robots for disaster recovery.

In this TED talk, she outlines some of the capabilities of the robots and how the robots work. One quote at the end really caught my attention.

So really, “disaster robotics” is a misnomer. It’s not about the robots. It’s about the data.

The robots are collecting data and that data needs analysis!

The NFL Should Share this Data

The National Football League begins its regular season tonight. One feature you might not hear about is the addition of 2 RFID sensors on every player. Each stadium is equipped with receivers (not wide receivers) to capture the data emitted from the RFID tags. Once collected, the data can be used to track each player’s position, movement, speed, and acceleration. A company called Zebra Technologies is implementing the system.
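
To give a rough idea of how speed and acceleration could be derived from position readings like these, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not how Zebra Technologies or the NFL actually process the data: the function name, the (time, x, y) sample format, and the units (seconds and yards) are all hypothetical.

```python
import math


def speeds_and_accelerations(samples):
    """Estimate speed and acceleration from timestamped positions.

    samples: list of (t_seconds, x_yards, y_yards) tuples sorted by time.
    This format is an assumption for illustration, not a real data feed.
    """
    # Speed over each interval: distance travelled divided by elapsed time.
    speeds = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        speeds.append(math.hypot(x1 - x0, y1 - y0) / (t1 - t0))

    # Acceleration: change in speed between neighbouring intervals,
    # divided by the time between the midpoints of those intervals.
    accelerations = []
    for i in range(1, len(speeds)):
        dt = (samples[i + 1][0] - samples[i - 1][0]) / 2
        accelerations.append((speeds[i] - speeds[i - 1]) / dt)

    return speeds, accelerations


# Three made-up readings for one player, one second apart.
track = [(0.0, 0.0, 0.0), (1.0, 3.0, 4.0), (2.0, 9.0, 12.0)]
speeds, accels = speeds_and_accelerations(track)
print(speeds)  # yards per second for each interval
print(accels)  # yards per second squared
```

Real sensor data would be much noisier and sampled far more often, so some smoothing would likely be needed before differencing.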

It is a bit early to know exactly what the NFL teams will do with the data, but I think the NFL should open up the data. Analysis could be done for fantasy football. Data scientists could come up with some creative data visualizations. Plus, I think it contains great academic research potential.

As a side note, I am sure someone would start building some apps for the Microsoft Surface tablets.

See more at The IoT comes to the NFL

Follow Data Science 101

The Data Science 101 blog has just moved to a self-hosted WordPress. The URL and name have not changed though. Thus, if you were a WordPress.com follower, you will no longer get future emails about new posts (you should continue to see the posts in your timeline though). If you would like emails, just sign up with the email form in the sidebar of the Data Science 101 blog.

Plus, you can always follow along at the following locations:

Mapping youth well-being worldwide with open data – From DataKind

Once again, I was honored to write a guest post for DataKind. This time it was on the spread of open source software by data-do-gooders. A couple of years ago, DataKind hosted a DataDive in Washington, D.C., and some of the participants created a mapping software project titled DataTools 2.0. Since then, it has been replicated by a number of groups around the globe. Read the full post on the DataKind blog to find out more.

Data Science College Programs Across the Globe [Interactive Map]

Continuing this week’s theme on data science colleges, the nice folks at Silk.co created an interactive map of the Data Science University Programs across the globe. Click the map to view the interactive visualization.

Where are the Programs?

Global Map of Data Science Colleges
Map indicating the location of all the data science college programs

Based upon the visualization, it is easy to see most of the programs are in the United States and Western Europe.

What About Degree Types?

Data Science Breakdown by Degree Type

The data can also easily be broken down by degree type and how the degree is delivered (online/on-campus).

What is Silk.co?

To state it simply, Silk.co is a place to very easily store and visualize data. It looks pretty awesome.

Learning To Be A Data Scientist