Papers for Teaching Undergraduate Data Science

If you work at a university and are considering starting an undergraduate program in data science, then today’s post is for you.

If you know of any other papers, please leave a comment below.

Site For Undergraduate Data Science Programs

Karl Schmitt, Director of Data Sciences at Valparaiso University, has started a blog to share his experiences with building an undergraduate data science program. The blog is titled From the Director’s Desk. Karl regularly posts about textbooks, curriculum, visualizations, and learning objectives from the perspective of an educator. Tons of great resources!

Valparaiso University is Turning Homework into Social Change

Recently, I had the honor of speaking with Dr. Karl Schmitt, director of the Data Science undergraduate program at Valparaiso University. We had a very nice discussion, and I thought I would pass along my summary.

What are the Details of the Valparaiso Undergraduate Program?

The program is housed in the Mathematics department and it is designed to be fairly interdisciplinary. It consists of four parts.

  1. Math
  2. Statistics
  3. Computer Science
  4. A Separate Focus Area

The separate focus area can be from nearly any other department and is targeted at building some domain expertise. Although not required, a double major is encouraged.

One of the most unique and exciting aspects of the program begins during the first year. Students take Introduction to Data Science, which has few prerequisites and serves as motivation for the remainder of the program. Valparaiso partners with non-profits and government agencies to provide first-year students with hands-on experience solving problems for social good. Examples include Meals on Wheels, mapping with the United States Geological Survey, and a child welfare non-profit. Then, junior and senior students complete a capstone project, which can be a continuation of the first-year project, another social good project, or consulting work for other departments on campus.

What Skills Do You Expect Valparaiso Data Science Graduates to Have?

There are a few basic skills that make sense for data science: coding, database skills, statistics, and general math. In addition, Valparaiso grads should know how to talk, write, and create videos about mathematical concepts. Finally, ethics is an essential portion of the program. According to Dr. Karl Schmitt,

I want my students to graduate with ethics related to data science.

To reinforce that statement, ethics case studies are required of all students, ethics is a key learning objective of the projects, and ethics is integrated into all the classes so students understand its importance. Students need to be able to do the hard data science, communicate the results, and care about the consequences.

Why Choose Data Science as an Undergraduate?

It is a utility degree that is in strong demand in nearly every field. As companies continue to understand the usage of data, having data skills is going to become increasingly crucial. Data scientists are, and will continue to be, in demand for human resources, supply, sales, technology, and many other awesome jobs.

Why Valparaiso for Data Science?

There are a number of reasons:

  • Good University Size – It is easy to double major and engage with things outside the major, and the disciplines are closely connected, which allows for collaboration.
  • Writing/Communication is Integrated Throughout – Many people can crunch numbers, but Valparaiso graduates can express discoveries. The students get that from the very beginning.
  • Projects – All students will have experience and examples of projects to demonstrate.
  • Finally, students have an opportunity to turn their homework into something that matters!

Thank you to Dr. Karl Schmitt for the interview and to Valparaiso University for sponsoring Data Science 101.

Building Data Science Skills as an Undergraduate

While there are a growing number of universities that offer undergraduate data science degrees, for one reason or another those programs may not be perfect for everyone interested in data science. So, what do you do if you attend a school that does not offer a data science degree? This is a question frequently asked of me, so I thought I would elaborate on my typical response.

You Cannot Know It All

First off, you will never know all there is to know about data science. The field is vast and contains many sub-fields. Thus, as an undergraduate, a good plan is to learn the fundamentals. Then expand your knowledge/expertise as your education and career continue. Data Science is evolving rapidly and it requires continual learning. Hopefully, this is one of the reasons you are interested in the field.

My Recommended Approach

A good plan is to major in computer science or statistics and minor in the other. If your school doesn’t have either of those majors, then take as many of those classes as you can. Next, choose a domain-specific area such as business, chemistry, or psychology, and gear your elective classes toward that domain area. This approach will give you a solid base understanding of the statistical and computational underpinnings of data science. You should also be well-prepared to find a job or continue your studies in graduate school.

Also, somewhat related, taking an art class or two might not be a bad idea. Visualization is very important to data science. Understanding color palettes and usage of space on a canvas are concepts that will serve you well. Plus, many people strong in computer science and statistical algorithms are lacking in artistic skills.

Some Enhancements to Your Education

If your location allows, consider attending local meetups. Finally, get involved with whatever projects you can (Kaggle, internships, open source, …).

Do you have any advice for undergraduates looking to study data science? If so, please leave a comment.

Are you an undergraduate with questions? Please ask in the comments below.

Berkeley Undergrad Data Science Course and Textbook

The University of California at Berkeley has started The Berkeley Data Science Education Program. The goal is to build a data science education program over the next several years by engaging faculty and students from across the campus. The introductory data science course targets freshmen, and it is taught in a very applied and interactive environment. The course videos, slides, labs, and notes are freely available for others to use. The course heavily uses Jupyter. Also, the course textbook is online at Computational and Inferential Thinking: The Foundations of Data Science.

Tips for Future Data Scientists

While preparing for a recent talk I gave to an undergraduate audience, I started compiling some tips for future data scientists. The tips are intended for students (undergraduate and graduate) or anyone else planning to enter the field of data science.

I asked a few of my data science friends and posted a question on Quora: As a data scientist, what tips would you have for a younger version of yourself?

What follows is a summary of the many tips.

Tips for Data Science

  • Be flexible and adaptable – There is no single tool or technique that always works best.
  • Cleaning data is most of the work – Knowing where to find the right data, how to access the data, and how to properly format/standardize the data is a huge task. It usually takes more time than the actual analysis.
  • It is not all building models – Like the previous tip, you must have skills beyond just model building.
  • Know the fundamentals of structuring data – Gain an understanding of relational databases. Also learn how to collect and store good data. Not all data is useful.
  • Document what you do – This is important for others and your future self. Here is a subtip: learn version control.
  • Know the business – Every business has different goals. It is not enough to do analysis just because you love data and numbers. Know how your analysis can make more money, positively impact more customers, or save more lives. This is very important when getting others to support your work.
  • Practice explaining your work – Presentation is essential for data scientists. Even if you think you are an excellent presenter, it always helps to practice. You don’t have to be comfortable in front of an audience, but you must be capable in front of an audience. Take every opportunity you can get to be in front of a crowd. Plus, it helps to build your reputation as an expert.
  • Spreadsheets are useful – Although they lack some of the computational power of other tools, spreadsheets are still widely used and understood by the business world. Don’t be afraid to use a spreadsheet if it can get the job done.
  • Don’t assume the audience understands – Many (non-data science) audiences will not have a solid understanding of math. Most will have lost their basic college and high school mathematics skills. Explain concepts such as correlation and avoid equations. Audiences understand visuals, so use them to explain concepts.
  • Be ready to continually learn – I do not know a single data scientist who has stopped learning. The field is large and expanding daily.
  • Learn the basics – Once you have a firm understanding of the basics in mathematics, statistics, and computer programming, it will be much simpler to continue learning new data science techniques.
  • Be a polymath – It helps to be a person with a wide range of knowledge.

Thanks to Chad, Chad, Lee, Buck, and Justin for providing some of the tips.

Midwest Undergraduate Data Analytics Competition

The 2016 Midwest Undergraduate Data Analytics Competition (MUDAC) will be held at Winona State University in Winona, Minnesota on April 2 and 3.

  • What is MUDAC?
    MUDAC is an intense 2-day analytics competition aimed at undergraduate students. Teams compete to solve a problem posed by an external organization.
  • Who can compete?
    Teams of 3 to 4 undergraduate students attending a school in Minnesota, Wisconsin, Iowa, Illinois, North Dakota, or South Dakota.
  • Why attend MUDAC?
    • A fun learning experience
    • Friendly competition
    • Teamwork
    • Meet others with similar interests
    • Learn about data science/analytic careers
    • Practice preparing and giving a presentation
    • Cash prizes for winning
    • Door prizes

The competition also includes a panel discussion with some local data professionals. I am honored to be one of those panelists.

If you attend or teach at a university in the upper Midwest and you are interested in data science, you should strongly consider bringing a team to MUDAC. I hope to see you there.

Undergraduate Programs in Data Science

While most of the degrees on the list of Colleges with Data Science Degrees are master’s degrees, there are a few schools offering data science as an undergraduate program.