Berkeley Undergrad Data Science Course and Textbook

The University of California at Berkeley has started the Berkeley Data Science Education Program. The goal is to build a data science education program over the next several years by engaging faculty and students from across the campus. The introductory data science course targets freshmen and is taught in a hands-on, interactive environment. The course videos, slides, labs, and notes are freely available for others to use. The course relies heavily on Jupyter. The course textbook, Computational and Inferential Thinking: The Foundations of Data Science, is also available online.

What is Big Data? from datascience@berkeley

Jenna Dutcher, community relations manager for the datascience@berkeley online master’s program, interviewed more than 40 thought leaders to answer one simple question: What is big data? (Full disclosure: I was honored to be asked to provide a definition for the list.)

The answers are quite diverse and definitely worth reading.

I thought Hal Varian, Chief Economist at Google, provided one of the simplest and best definitions.

Big data means data that cannot fit easily into a standard relational database.

See the full list of What is Big Data?

Which definition is your favorite? How would you define big data?

Data Size Matters Infographic

Here is a great infographic from Data Science @ Berkeley. Just how big is a gigabyte (GB)? Be sure to look all the way to the bottom, where it explains a few of the latest innovations in hard drives, such as helium, SMR, and HAMR, and spells out what those acronyms mean.

Brought to you by datascience@berkeley: Master of Information and Data Science

New Berkeley Online Data Science Degree

The University of California at Berkeley just announced a new master's degree in Information and Data Science (MIDS). The program is designed to be completed entirely online, with the exception of a one-week visit to the campus. The program has an approximate cost of $60,000 for the 27 required credits. The curriculum looks good. It includes machine learning, data analysis, visualization, big data processing, and privacy/ethics. The initial class of students will start in January of 2014.

UC Berkeley Course Lectures: Analyzing Big Data with Twitter

The link includes videos and lecture notes from the course.

Free Open-Source Statistics Cookbook

Matthias Vallentin, a computer science PhD student at UC Berkeley, has published a Probability and Statistics Cookbook. The book can be freely downloaded in PDF format via the website. The LaTeX source is also available on GitHub. Matthias states that others are free to fork the source and make changes.

The book is not a textbook. It is more of a cheat sheet. It contains many of the common probability and statistics techniques and their associated formulas. I would consider this book to be an excellent resource to have around.

UC Berkeley Big Data Bootcamp

The University of California at Berkeley is hosting AMP Camp, the Big Data Bootcamp, starting today. The conference is sold out for in-person attendance, but registration is free and live streaming is available. The agenda looks good (including machine learning, parallel programming, Mesos, and hands-on exercises), so this might be a good opportunity for some learning.

Thanks to Mark Nickel for the link.