All posts by Ryan Swanstrom

Valparaiso University is Turning Homework into Social Change

Recently, I had the honor of speaking with Dr. Karl Schmitt, director of the undergraduate Data Science program at Valparaiso University. We had a very nice discussion, and I thought I would pass along my summary.

What are the Details of the Valparaiso Undergraduate Program?

The program is housed in the Mathematics Department and is designed to be fairly interdisciplinary. It consists of four parts:

  1. Math
  2. Statistics
  3. Computer Science
  4. A Separate Focus Area

The separate focus area can be from nearly any other department and is targeted at building some domain expertise. Although not required, a double major is encouraged.

One of the most unique and exciting aspects of the program begins during the first year. Students take Introduction to Data Science, which has few prerequisites and serves as motivation for the remainder of the program. Valparaiso partners with non-profits and government agencies to give first-year students hands-on experience solving problems for social good. Examples include Meals on Wheels, mapping with the United States Geological Survey, and a child welfare non-profit. Later, juniors and seniors complete a capstone project, which can continue their first-year project, take on another social good project, or serve in a consulting capacity for other departments on campus.

What Skills Do You Expect Valparaiso Data Science Graduates to Have?

There are a few basic skills that make sense for data science: coding, database skills, statistics, and general math. In addition, Valparaiso grads should know how to talk, write, and create videos about mathematical concepts. Finally, ethics is an essential part of the program. According to Dr. Karl Schmitt,

I want my students to graduate with ethics related to data science.

To reinforce that statement, ethics case studies are required of all students, ethics is a key learning objective of the projects, and ethics is integrated into all the classes so students understand its importance. Students need to be able to do the hard data science, communicate the results, and care about the consequences.

Why Choose Data Science as an Undergraduate?

It is a utility degree that is in strong demand in nearly every field. As companies continue to learn how to use data, data skills will become increasingly crucial. Data scientists are (and will continue to be) in demand in human resources, supply chain, sales, technology, and many other areas.

Why Valparaiso for Data Science?

There are a number of reasons:

  • Good University Size – It is easy to double major and engage with things outside the major, and the disciplines are closely connected, which allows for collaboration.
  • Writing/Communication is Integrated Throughout – Many people can crunch numbers, but Valparaiso graduates can express discoveries. The students get that from the very beginning.
  • Projects – All students will have experience and examples of projects to demonstrate.
  • Finally, students have an opportunity to turn their homework into something that matters!


Thank you to Dr. Karl Schmitt for the interview and to Valparaiso University for sponsoring Data Science 101.

It is Open Data Day!

March 4, 2017 is Open Data Day.

Open Data Day is an annual celebration across the globe. Over 300 groups around the world schedule activities that use open data for their communities. See if there is a gathering in your area. This year's focus areas are:

  • Open research data
  • Tracking public money flows
  • Open data for environment
  • Open data for human rights

Good Luck!

Netflix Data Scientist on Machine Learning: Free Webinar

The Data Incubator, a data science fellowship program, is currently running a Data Science in 30 Minutes webinar series. Next week features a free webinar with Dr. Becky Tucker of Netflix. Dr. Tucker is a Senior Data Scientist at Netflix, where she specializes in predictive modeling for content demand (think: what do people want to watch?). The full abstract of the webinar is below. The webinar is free; all you need to do is register.

Predicting Content Demand with Machine Learning

Date/Time: March 9, 2017 @ 5:30 PM ET
Location: Online
Register: Click Here

Abstract: Netflix is well-known for its data-driven recommendations that seek to customize the user experience for every subscriber. But data science at Netflix extends far beyond that – from optimizing streaming and content caching to informing decisions about the TV shows and films available on the service. The talk will cover work done by Becky and the Content Data Science team at Netflix, which uses machine learning and predictive models to evaluate where Netflix should spend its next content dollar.

Update – Below is the Recorded Webinar

Building Data Science Skills as an Undergraduate

While there are a growing number of universities that offer undergraduate data science degrees, for one reason or another those programs may not be perfect for everyone interested in data science. So, what do you do if you attend a school that does not offer a data science degree? This is a question frequently asked of me, so I thought I would elaborate on my typical response.

You Cannot Know It All

First off, you will never know all there is to know about data science. The field is vast and contains many sub-fields. Thus, as an undergraduate, a good plan is to learn the fundamentals. Then expand your knowledge/expertise as your education and career continue. Data Science is evolving rapidly and it requires continual learning. Hopefully, this is one of the reasons you are interested in the field.

My Recommended Approach

A good plan is to major in computer science or statistics and minor in the other. If your school doesn't offer either of those majors, take as many classes in those subjects as you can. Next, choose a domain-specific area such as business, chemistry, or psychology, and gear your elective classes toward that domain. This approach will give you a solid understanding of the statistical and computational underpinnings of data science. You should also be well-prepared to find a job or continue your studies in graduate school.

On a related note, taking an art class or two might not be a bad idea. Visualization is very important to data science. Understanding color palettes and the use of space on a canvas are concepts that will serve you well. Plus, many people who are strong in computer science and statistical algorithms lack artistic skills.

Some Enhancements to Your Education

If your location allows, consider attending local meetups. Also, get involved with whatever projects you can (Kaggle, internships, open source, …).

Do you have any advice for undergraduates looking to study data science? If so, please leave a comment.

Are you an undergraduate with questions? Please ask in the comments below.

Quora Answers by Monica Rogati

Monica Rogati, a legend in the data science space, recently provided some answers on Quora that are sheer internet gold.

She answers questions involving:

  • What is a data science advisor?
  • What are the challenges of building a data science team?
  • What are the characteristics of a good data scientist?
  • and more

They are filled with great advice.

Get me Sum ‘dat Big Data

I teach data science courses throughout the US. I enjoy asking attendees why they are in class. I get many good answers, but occasionally I get some funny ones. Here is a story with one of the more humorous answers.

While chatting with an attendee before class, I asked why he chose to attend this class. Here was his answer.

Well, my boss attended a conference and heard a talk on Big Data. Then, he came back to the office and bought Hadoop for some of our systems. Next, he heard about this training and told me to attend. When I was preparing to leave, the boss said, "Get me sum 'dat big data".

After a slight chuckle from both of us, I mentioned we would talk more about that in class.

While this story is somewhat humorous, it is not all that uncommon. Companies want to start using data science; they often just do not know where to start. If you are looking for a starting point, check out this post, You Want Data Science, Now What?.

Do you have a funny “data science” or “big data” story? If so, please share in the comments.

Best Practices for Machine Learning Engineering

Martin Zinkevich, Research Scientist at Google, just compiled a large list (43 to be exact) of best practices for building machine learning systems.

Rules of Machine Learning:
Best Practices for ML Engineering

If you do data engineering or are involved with building data science systems, this document is worth a look.

You Want Data Science, Now What?

I am often approached by people or organizations who have heard about data science but don't know where to start. It is a valid concern. Data science is a broad topic with different meanings to different people.

Here are the common questions I hear: Should I hire a data scientist? Should I hire some consultants? Should I build a data science team? There is no perfect answer to those questions because it depends on your organization and situation. I would like to suggest a different approach. At first, don't worry about titles and organizational structure. Worry about the problems you want to solve, and start with two questions.

1. What is the goal (be specific)?

This question might seem obvious, but it is often overlooked. Don’t start with data science just because you have heard about others using it. A bad goal for data science is: be data-driven to increase profits. While that might be a high-level strategy, it is much too broad. Better goals are:

  • identify which customers are likely to leave
  • identify which products a customer might buy next
  • determine what cities would be best for expansion
  • find the most profitable type of marketing for your organization
  • predict if a person will get cancer in the next year

These are examples of specific goals that data science can help to address. Work hard to narrow your goals to something specific. If you achieve enough specific goals, then you may well increase profits.

2. What action can be taken?

This is very important. All the predictions and fancy data science do you no good if your organization cannot take any action. For example, take one of the goals above and suppose you can predict whether a person will get cancer in the next year. What do you do with that information? Do you send the person an email? What if you are wrong? Do people really want to know that? That is a tricky situation to handle, and any action you take has an ethics component.

Other situations have simpler actions, such as identifying the products a customer might buy next. Common actions might be sending a coupon, displaying an ad, or suggesting that the item be added to the cart.
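To make the "products a customer might buy next" goal concrete, here is a minimal sketch of one simple technique, co-occurrence counting ("customers who bought X also bought Y"). It is an illustration under stated assumptions, not a production recommender: the order data and the function names build_co_occurrence and suggest_next are hypothetical.

```python
from collections import Counter
from itertools import permutations


def build_co_occurrence(orders):
    """For each product, count how often every other product appears in the same order.

    orders: list of orders, each a list of product names (hypothetical data format).
    """
    counts = {}
    for order in orders:
        # set() ignores duplicate items within one order
        for a, b in permutations(set(order), 2):
            counts.setdefault(a, Counter())[b] += 1
    return counts


def suggest_next(co_counts, cart, k=2):
    """Return up to k products that most often co-occur with the items already in the cart."""
    scores = Counter()
    for item in cart:
        scores.update(co_counts.get(item, Counter()))  # add co-occurrence counts
    for item in cart:
        scores.pop(item, None)  # never suggest something already in the cart
    return [product for product, _ in scores.most_common(k)]


# Hypothetical purchase history
orders = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["butter", "jam"],
]
co = build_co_occurrence(orders)
print(suggest_next(co, ["bread", "butter"]))  # ['jam', 'milk']
```

The suggestion itself is only half the work; the output still has to feed an action such as a coupon or an "add to cart" prompt.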

Another factor to consider with the action is cost. How much will it cost to perform the action? In certain businesses, it might be more profitable to attract new customers than to retain existing ones. In that case, there is little advantage to identifying which customers are likely to leave.
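The cost question can be sketched as simple arithmetic. The model below is an assumption for illustration only: a retention action (say, a discount) is sent to a customer with an estimated probability of leaving, it succeeds with some probability, and it has a fixed cost. Dividing the action cost by the chance of actually saving a customer gives the spend per customer saved, which can then be compared against a hypothetical cost of acquiring a new customer. All names and numbers are hypothetical.

```python
def expected_retention_value(p_leave, p_success, customer_value, action_cost):
    """Expected net gain from sending one retention action to one customer.

    p_leave: estimated probability the customer leaves if we do nothing
    p_success: probability the action keeps a customer who would otherwise leave
    customer_value: value of keeping that customer
    action_cost: cost of performing the action for this customer
    """
    return p_leave * p_success * customer_value - action_cost


def cost_per_saved_customer(p_leave, p_success, action_cost):
    """Average spend needed to actually keep one customer who would have left."""
    return action_cost / (p_leave * p_success)


# Hypothetical numbers: 30% churn risk, 20% success rate, $500 customer value
print(expected_retention_value(0.3, 0.2, 500, 20))  # positive: worth doing
print(expected_retention_value(0.3, 0.2, 500, 40))  # negative: not worth doing

acquisition_cost = 150  # hypothetical cost to win one new customer
saved_cost = cost_per_saved_customer(0.3, 0.2, 20)
print(saved_cost > acquisition_cost)  # True here: acquiring is cheaper than retaining
```

With these made-up numbers, the action has a small positive expected value, yet each saved customer costs more than acquiring a new one, which is exactly the situation where predicting who will leave adds little.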

Conclusion

Data science is very exciting, and it has many positives. However, when approached with incorrect expectations, it leads nowhere but to headaches. Thus, before you start building a team or hiring consultants, make sure you are clear on your goals and actions.

TensorKart and Neural Networks for MarioKart

Kevin Hughes used TensorFlow to train a neural network to play MarioKart. He calls it TensorKart. See his post for more details. It is a nice blog post and sounds like a fun project.
Sorry, the video has no sound.

Julia for Data Science Book

Today brings us a very welcome guest post by Zacharias Voulgaris, author of Julia for Data Science. This is an excellent new book about the Julia language. By reading it you will learn about:

  • IDEs for using Julia
  • Basics of the Julia language
  • Accessing and exploring data
  • Machine learning
  • Advanced data science techniques with Julia (cross-validation, clustering, PCA, and more)

The book has a nice flow for someone starting out with Julia and the topics are well explained. Enjoy the post, and hopefully you get a chance to check out the book.

Introducing Julia for Data Science (Technics Publications), a Great Resource for Anyone Interested in Data Science.

Over the past couple of years, there have been several books on the Julia language, a relatively new and versatile tool for computationally heavy applications. Julia has been adopted extensively by the scientific community, as it provides a great alternative to MATLAB and R, while its high-level programming style makes it easy to use for people who are not adept programmers. Lately, it has also attracted the attention of computer science professionals (including Python programmers) as well as data scientists. These people, who were already very effective coders, decided to learn this language as well, since it provides undeniable benefits in terms of performance and rapid prototyping, especially for numeric applications. In addition, the fact that Julia was, and still is, being developed by a few top MIT graduates goes to show that it is not a novelty doomed to fade away soon, but a serious effort that is bound to linger for many years to come.

However, this post is not about Julia per se, since many other people have made its merits known to the world since the language was first released in 2012. Instead, we aim to talk about a lesser-known aspect of the language, namely its abundant applications in the fascinating field of data science. Although there are already some reliable resources out there showing that Julia is undoubtedly ready for data science, this book is the first and most complete resource on this topic. Without assuming any prior knowledge of the language, it guides you step by step to mastery of the Julia essentials, helping you get comfortable enough to use it for a variety of data science applications. It may not make you an expert in the language, but data scientists rarely care about the esoteric aspects of the programming tools they use, since that level of know-how is not required to get things done. Still, the reader is given enough information to investigate those aspects on their own.

The Julia for Data Science book has been in development for about a year and is heavily focused on applications, with lots of code snippets, examples, and even questions and exercises in every chapter. It also makes use of a couple of datasets that closely resemble the real-world ones data scientists encounter in their everyday work. On top of that, it provides some theory on the data science process (a whole chapter is dedicated to it, whereas other books usually devote only a couple of pages). Although the book is not a complete guide to data science, it provides enough information to give you a sense of perspective and an understanding of how everything fits together. It is by no means a recipe book, though you can use it as a reference once you have finished reading it.

The Julia for Data Science book is available at the publisher’s website, as well as on Amazon, in both paperback and eBook formats. We encourage you to give it a read and experience first-hand how Julia can enrich your data science toolbox!