Somewhat lost in the hype of Google’s Cloud Machine Learning announcement (which is itself neat) was the release of Google’s Public Data Sets.
I think Google was already hosting some public data, but now there is an official location for these public data sets stored in BigQuery. You can:
- Access and use the data in your applications
- Request Google to host your own public data set
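As a rough illustration of the first point, here is a minimal Python sketch of querying a public table from an application. It assumes the `google-cloud-bigquery` client library is installed and that Google Cloud credentials and a billing project are already configured. The table `bigquery-public-data.samples.shakespeare` and the helper names `build_top_words_query` and `top_words` are chosen here for illustration and are not taken from the announcement.

```python
# Sketch only: assumes the google-cloud-bigquery package is installed and that
# Google Cloud credentials (with a billing project) are set up in the environment.

# Example public table, picked for illustration; swap in any public data set.
PUBLIC_TABLE = "bigquery-public-data.samples.shakespeare"


def build_top_words_query(limit: int = 10) -> str:
    """Return SQL for the most frequent words in one corpus of the sample table.

    The corpus name is left as the named query parameter @corpus so that it is
    passed to BigQuery separately instead of being pasted into the SQL text.
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT word, word_count "
        f"FROM `{PUBLIC_TABLE}` "
        "WHERE corpus = @corpus "
        "ORDER BY word_count DESC "
        f"LIMIT {int(limit)}"
    )


def top_words(corpus: str, limit: int = 10) -> list:
    """Run the query against BigQuery and return (word, count) pairs."""
    # Imported lazily so the query builder above works without the library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("corpus", "STRING", corpus),
        ]
    )
    rows = client.query(build_top_words_query(limit), job_config=job_config).result()
    return [(row["word"], row["word_count"]) for row in rows]
```

With credentials in place, a call such as `top_words("hamlet", 5)` would send the query to BigQuery and return the rows as a list of tuples. Queries against public data sets are still billed to your own project, so it is worth keeping the `LIMIT` small while experimenting.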
It will be fun to watch this site expand with more public datasets. Happy Exploration!
Understanding Machine Learning: From Theory to Algorithms is written by Shai Shalev-Shwartz, Associate Professor at the School of Computer Science and Engineering at The Hebrew University, Israel, and Shai Ben-David, Professor in the School of Computer Science at the University of Waterloo, Canada. The book looks very thorough. Below is just a sampling of the topics covered.
- Bias-Complexity Tradeoff
- Model Selection
- Support Vector Machines
- Decision Trees
- Neural Networks
- Dimensionality Reduction
- Feature Selection and Generation
- Advanced Theory
- And LOTS LOTS more….
O’Reilly just published a free ebook profiling 15 influential women in data science, Women in Data. The book is written by Cornelia Levy-Bencheton.
The following women are profiled in the book:
- Michele Chambers, COO of RapidMiner
- Camille Fournier, CTO of Rent the Runway
- Carla Gentry, CEO of Analytical Solution
- Kelly Hoey, Speaker and Early-stage Investor
- Cindi Howson, VP of Research at Gartner
- Neha Narkhede, Co-founder of Confluent
- Claudia Perlich, Chief Scientist at Dstillery
- Kira Radinsky, Co-founder of SalesPredict
- Gwen Shapira, Software Engineer at Cloudera
- Laurie Skelly, Data Scientist at Datascope
- Kathleen Ting, Technical Account Manager at Cloudera
- Renetta Garrison Tull, Associate Vice Provost of UMBC
- Hanna Wallach, Researcher at Microsoft
- Alice Zheng, Director of Data Science at Dato
- Margit Zwemer, Founder of LiquidLandscape
DataQuest is a recently launched online data science learning platform for Python. The site consists of a gamified series of missions that increase in difficulty as your skills progress. Here are a few other features of the site:
- Sample Code
- Live, Interactive Browser-based Coding Environment
- Step by Step Instructions
- Instant Feedback
- Helpful Forums for Q&A
The site is still under development and the founder, Vik Paruchuri, is looking for help developing more content and missions for the site. If that is something of interest to you, get in touch with Vik via the DataQuest website.
The creators of the Insight Data Science Fellows Program have done it again. This time they have created the Insight Data Engineering Program. The program aims to train highly specialized software engineers who can build big data systems and big data pipelines. Unlike the data science program, the data engineering program does not target people with PhDs. Please visit the Insight Data Engineering website for a white paper with all the details on the program.
Here is an official announcement:
The Insight Data Engineering Fellows Program is a professional training fellowship designed to help engineers from various backgrounds, as well as mathematicians, and computer scientists, transition to careers in data engineering.
- Tuition free, 6 week, full-time, data engineering training fellowship in Silicon Valley this summer.
- Alumni network of 70 Insight Fellows who are now data scientists and data engineers at Facebook, LinkedIn, Microsoft, Twitter, Square, Netflix, Airbnb, Palantir, Jawbone and many others.
- Interview at top technology companies hiring data engineers at the end of the fellowship.
For more information please visit:
Markets for Good, an organization focused on performing data science for the social sector, recently released an ebook highlighting their 17 most influential blog posts. The ebook is titled Markets for Good Selected Readings: Making Sense of Data and Information in the Social Sector.
Here is just a small sampling of the topics you can read about:
- 3 Reasons Why Open Data Will Change the World
- Let Our Data Define Us
- Put Your Data Where Your Mouth Is
If you are interested in how data can be used to help the world, this ebook is a good place to start.
Hal Daumé III, Assistant Professor of Computer Science at the University of Maryland, has placed the contents of his book online. The book is titled A Course in Machine Learning.
Here is a small sampling of the chapters from A Course in Machine Learning:
- Decision Trees
- Linear Models
- Neural Networks
- Ensemble Methods
- Bayesian Learning
Mohammed J. Zaki, Computer Science Professor at RPI, and Wagner Meira Jr., Computer Science Professor at Universidade Federal de Minas Gerais, have written the textbook Data Mining and Analysis: Fundamental Concepts and Algorithms. The book is currently available as a PDF download.
Based upon the chapters, the book looks very good. It contains large sections on data analysis, clustering, and classification. The final book will be published sometime in 2014.
Alteryx is offering the book, Big Data Analytics For Dummies, for free. If you are new to the term big data, this book provides a brief (about 40 pages) overview of the topic and what big data should be able to do for your company.
You have to register, but it is worth it for the free book.