Category Archives: Learn Data Science

This is a category for all things related to learning data science.

Deep Learning Coursera Specialization

Andrew Ng, co-founder of Coursera and a deep learning expert, is launching a new specialization on Coursera. Details can be found at DeepLearning.ai or on the Deep Learning Specialization page. The specialization consists of 5 courses. Auditing the courses and watching the videos is free; there is a fee for graded assignments and a certificate of completion. The first course just started this week, so it is a great time to start learning deep learning.

Good Luck!

A Data Science Career with Kirk Borne, Free Webinar

Once again, The Data Incubator is hosting a Data Science in 30 Minutes webinar. This one features the career of Kirk Borne.

Renowned data scientist Kirk Borne will take viewers on a journey through his career in science and technology, explaining how the industry, and he himself, have evolved over the last four decades. From skipping lunches in high school to a systematic Twitter obsession, Kirk will shed light on his road to success in the data science industry.

Kirk is universally considered one of the most (if not the most) influential voices in data science. If you are interested in a career in data science, this is a webinar you will not want to miss.

The webinar is 5:30 Eastern Time on August 29, 2017, and registrations are currently being accepted. It is free.

The 3 Stages of Data Science

Businesses everywhere are racing to extract meaningful insight from their data. Many organizations are spinning up data science teams and attacking problems, some more successfully than others. However, one challenge is determining the organization's current stage of data science; another is determining the stage it wants to reach.

Below are the three stages that a truly mature data science organization has worked through.

1. Dashboards

The beginning stage of data science is dashboards. It is all about answering “How much?” and “What happened?” by looking at reports of historical data. If done well, it might even help an organization answer “Why did it happen?” Many organizations refer to this phase as Business Intelligence.

The dashboard stage can be very expensive for an organization, in terms of people-hours and money. It usually involves investments in:

  1. Data Warehouse or some other storage environment, for storing the data in a single location for easy reporting
  2. ETL (Extract Transform Load) Tools for manipulating, combining, and moving data to the data warehouse
  3. Reporting Tools for displaying the results and allowing users to “explore” the data

Here are some common questions that can be answered via traditional dashboards:

  • How many customers live in each region?
  • What were the sales on Black Friday?
  • How many patients visited the hospital last month?
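To make the dashboard stage concrete, here is a minimal Python sketch of the kind of aggregation behind two of the questions above. The records, field names, and the Black Friday date are hypothetical; in a real organization the data would come from the data warehouse, and the queries would usually run in SQL or a reporting tool rather than in application code.

```python
from collections import Counter
from datetime import date

# Hypothetical records standing in for tables loaded into the data warehouse.
customers = [
    {"id": 1, "region": "North"},
    {"id": 2, "region": "South"},
    {"id": 3, "region": "North"},
    {"id": 4, "region": "West"},
]
sales = [
    {"day": date(2016, 11, 25), "amount": 120.00},  # hypothetical Black Friday sale
    {"day": date(2016, 11, 25), "amount": 80.50},
    {"day": date(2016, 11, 26), "amount": 45.00},
]


def customers_per_region(records):
    """How many customers live in each region? Count records per region."""
    return Counter(record["region"] for record in records)


def total_sales_on(day, records):
    """What were the sales on a given day? Sum the amounts for that day."""
    return sum(record["amount"] for record in records if record["day"] == day)


for region, count in sorted(customers_per_region(customers).items()):
    print(f"{region}: {count}")  # North: 2, then South: 1, then West: 1
print(f"{total_sales_on(date(2016, 11, 25), sales):.2f}")  # 200.50
```

Nothing here predicts anything; it only summarizes what already happened, which is what separates this stage from the next one.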

As you can see, a great deal of value can be gained from this phase alone. It is critical for a business to clearly understand its past performance. Unfortunately, this phase is where many businesses stop.

2. Machine Learning

The real “science” of data science does not begin until the second stage, machine learning. This stage focuses on estimating quantities that cannot be directly observed, such as which movies a customer will like, the price of a company’s stock tomorrow, or the causal effect of a particular advertising campaign. Machine learning uses the data from the first phase and applies statistical or other methods to gain additional insights.

Think of machine learning as answering the following:

  • When a customer moves, will he/she spend money at a hardware store?
  • When a credit card purchase is made, what is the probability the charge was fraudulent?
  • What is the expected lifetime value of a new customer?
  • If a hurricane is coming, what will people buy? (Pop-Tarts? It is true.)

Notice the connection between an event and some outcome. The value of machine learning comes from estimating the likely outcomes of potential events. This phase is filled with terms such as machine learning, data mining, and statistical modeling. The machine learning stage is all about looking into the future!
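As a deliberately simple illustration of estimating something that cannot be observed directly, the Python sketch below estimates the probability that a charge is fraudulent from labeled historical charges, grouped by merchant category. The data, the merchant category feature, and the smoothing constant are hypothetical, and the estimator is a smoothed conditional frequency rather than a full machine learning model; a real fraud system would use many features and a trained classifier.

```python
from collections import defaultdict

# Hypothetical labeled history: (merchant_category, was_fraud).
history = [
    ("electronics", True),
    ("electronics", False),
    ("electronics", True),
    ("grocery", False),
    ("grocery", False),
    ("grocery", False),
    ("grocery", True),
]


def fit_fraud_rates(rows, alpha=1.0):
    """Estimate P(fraud | category) with additive (Laplace) smoothing.

    The smoothing constant alpha pulls categories with few observations
    toward 0.5 instead of letting them sit at exactly 0 or 1.
    """
    fraud_counts = defaultdict(int)
    totals = defaultdict(int)
    for category, was_fraud in rows:
        totals[category] += 1
        fraud_counts[category] += int(was_fraud)
    return {
        category: (fraud_counts[category] + alpha) / (totals[category] + 2 * alpha)
        for category in totals
    }


def fraud_probability(rates, category, default=0.5):
    """Look up the estimated probability; unseen categories fall back to default."""
    return rates.get(category, default)


rates = fit_fraud_rates(history)
print(f"{fraud_probability(rates, 'electronics'):.2f}")  # 0.60 = (2 + 1) / (3 + 2)
print(f"{fraud_probability(rates, 'grocery'):.2f}")      # 0.33 = (1 + 1) / (4 + 2)
print(f"{fraud_probability(rates, 'travel'):.2f}")       # 0.50, no history
```

The output is a probability about a future event, not a count of past ones, which is the shift from the first stage to the second.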

3. Actions

Determining which actions to take is the third and final phase. It capitalizes on the results of machine learning in order to take appropriate actions. The following actions might be suitable for the events identified in the machine learning section above.

  • When a customer moves, send a “welcome to the neighborhood” packet with coupons to nearby hardware stores.
  • Decline the fraudulent charge or deactivate the credit card.
  • If the new customer has very high expected lifetime value, provide some special treatment or offers to ensure the customer becomes a customer for life.
  • When a hurricane is approaching, place Pop-Tarts near the front of the store.

As you can see, good machine learning from the second phase can lead to clear actions.
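The step from a prediction to an action is often just a business rule applied to the model's output. The Python sketch below turns a fraud probability and an expected lifetime value, the kinds of estimates produced in the machine learning stage, into actions like the ones listed above. The thresholds and action names are hypothetical placeholders; in practice they would be chosen by weighing the cost of a false alarm against the cost of a missed fraud or a lost customer.

```python
# Hypothetical thresholds; real values depend on the business's costs.
DEACTIVATE_THRESHOLD = 0.95
DECLINE_THRESHOLD = 0.70
HIGH_VALUE_CLV = 5000.0


def action_for_charge(p_fraud):
    """Map a predicted fraud probability to an action on the charge."""
    if not 0.0 <= p_fraud <= 1.0:
        raise ValueError("p_fraud must be between 0 and 1")
    if p_fraud >= DEACTIVATE_THRESHOLD:
        return "deactivate_card"
    if p_fraud >= DECLINE_THRESHOLD:
        return "decline_charge"
    return "approve"


def action_for_new_customer(expected_clv):
    """Give special treatment to customers with very high expected lifetime value."""
    return "special_offer" if expected_clv >= HIGH_VALUE_CLV else "standard_welcome"


for p in (0.10, 0.80, 0.97):
    print(p, action_for_charge(p))
# 0.1 approve
# 0.8 decline_charge
# 0.97 deactivate_card
print(action_for_new_customer(12000.0))  # special_offer
```

Keeping the decision rule separate from the model makes it easy to adjust the thresholds without retraining anything.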

Conclusion

Claiming success in data science is all about conquering all three stages. Each stage builds upon the previous one. If you have put in the effort to complete the first stage, why not continue to the second and third stages?

NBA Basketball Analytics Hackathon

The National Basketball Association is hosting an Analytics Hackathon. Applications are accepted until August 6, 2017, and the competition itself takes place on September 23-24, 2017. The competition has 2 focus areas:

  • Basketball Analytics
  • Business Analytics

The prizes for winning consist of:

  • Lunch with the NBA Commissioner
  • A trip to the All-Star Game
  • Tickets to a game of your choosing

To be eligible, you must be:

  • At least 18 years old
  • An undergraduate or graduate student

Good Luck!

Guidelines for Telling a Great Data Science Story

People love stories. People can connect with stories. People remember great stories. Make your data tell a story. If you can make stories come alive with data, people will pay attention.

There is no magic formula for a great story, data or otherwise. Here are some guidelines for telling a great data science story.

  • Clearly state the problem
  • Explain the data
  • Share the struggles of doing the analysis
  • Do not focus on the algorithms
  • Show how the analysis progressed, take your listeners on a journey
  • Finish with something remarkable

The late Hans Rosling could tell a story with data as well as anyone. Do a quick internet search for his name, and you can easily find his TED talks and other videos. He provides an excellent model for telling a story with data, and his videos are well worth your time.

The entire goal of telling a story with data is to get people engaged in the problem.

Leave a comment if you have other tips for telling an effective data science story.

Papers for Teaching Undergraduate Data Science

If you work at a university and are considering starting an undergraduate program in data science, then today’s post is for you.

If you know of any other papers, please leave a comment below.

Deep Learning Research Paper Lists for Summer 2017

The last links are not official academic papers, but they are quite good resources on deep learning.

Seeing Theory – A Visual Intro to Stats

Daniel Kunin from Brown University created Seeing Theory, a stunning, interactive site that provides a visual introduction to many concepts in statistics and probability. It is definitely worth checking out and sharing with others.

Tip: it does not work well on mobile.

5 Data Science Research Papers to read in Summer 2017

In the past, the blog has included 7 Important Data Science Papers and 5 More Data Science Papers. Here is another list if you are looking for something to read over the summer.

Site For Undergraduate Data Science Programs

Karl Schmitt, Director of Data Sciences at Valparaiso University, has started a blog to share his experiences with building an undergraduate data science program. The blog is titled, From the Director’s Desk. Karl is regularly posting about textbooks, curriculum, visualizations and learning objectives from the perspective of an educator. Tons of great resources!