Julia for Data Science Book

Today brings us a very welcome guest post by Zacharias Voulgaris, author of Julia for Data Science. This is an excellent new book about the Julia language. By reading it you will learn about:

  • IDEs for using Julia
  • Basics of the Julia language
  • Accessing and exploring data
  • Machine learning
  • Advanced data science techniques with Julia (cross-validation, clustering, PCA, and more)

The book has a nice flow for someone starting out with Julia, and the topics are well explained. Enjoy the post, and hopefully you get a chance to check out the book.

Introducing Julia for Data Science (Technics Publications), a Great Resource for Anyone Interested in Data Science.

Over the past couple of years, several books have been published on the Julia language, a relatively new and versatile tool for computationally heavy applications. Julia has been adopted extensively by the scientific community because it provides a great alternative to MATLAB and R, while its high-level programming style makes it accessible to people who are not adept programmers. Lately, it has also attracted the attention of computer science professionals (including Python programmers) as well as data scientists. These people, who were already very effective coders, decided to learn the language as well, since it offers undeniable benefits in performance and rapid prototyping, especially for numeric applications. In addition, the fact that Julia was, and still is, being developed by a few top MIT graduates goes to show that it is not a novelty doomed to fade away soon, but a serious effort that is bound to linger for many years to come.

However, this post is not about Julia per se, as many others have already made its merits known since the language was first released in 2012. Instead, we aim to talk about a lesser-known aspect of the language, namely its abundant applications in the fascinating field of data science. Although some reliable resources already point out that Julia is ready for data science, this book is the first and most complete resource on the topic. Without assuming any prior knowledge of the language, it guides you step by step to mastery of the Julia essentials, helping you get comfortable enough to use it for a variety of data science applications. It may not make you an expert in the language, but data scientists rarely care about the esoteric aspects of the programming tools they use, since that level of know-how is not required for getting things done. Still, the reader is given enough information to investigate those aspects on their own.

The Julia for Data Science book has been in development for about a year and is heavily focused on applications, with lots of code snippets, examples, and even questions and exercises in every chapter. It also makes use of a couple of datasets that closely resemble the real-world ones data scientists encounter in their everyday work. On top of that, it provides some theory on the data science process (a whole chapter is dedicated to it, whereas other books usually devote only a couple of pages). Although the book is not a complete guide to data science, it provides enough information to give you a sense of perspective and an understanding of how everything fits together. It is by no means a recipe book, though you can use it as a reference once you have finished reading it.

The Julia for Data Science book is available at the publisher’s website, as well as on Amazon, in both paperback and eBook formats. We encourage you to give it a read and experience first-hand how Julia can enrich your data science toolbox!

Recent Free Online Books for Data Science

This is just a short list of a few books that I have recently discovered online.

  • Model-Based Machine Learning – Chapters of this book become available as they are being written. It introduces machine learning via case studies instead of just focusing on the algorithms.
  • Foundations of Data Science – This is a much more academically focused book that could be used at the undergraduate or graduate level. It covers many of the topics one would expect: machine learning, streaming, clustering, and more.
  • Deep Learning Book – This book was previously available only in HTML form and not complete. Now, it is free and downloadable.

Machine Learning Yearning Book

Andrew Ng [Co-Founder of Coursera, Stanford Professor, Chief Scientist at Baidu, and All-Around Machine Learning Expert] is writing a book during the summer of 2016. The book is titled Machine Learning Yearning. If you visit the site and sign up quickly, you can get draft copies of the chapters as they become available.

Andrew is an excellent teacher. His MOOCs are wildly successful, and I expect his book to be excellent as well.

Free Stats Book for Computer Scientists

Professor Norm Matloff from the University of California, Davis, has published From Algorithms to Z-Scores: Probabilistic and Statistical Modeling in Computer Science, an open textbook. It approaches statistics from a computer science perspective. Dr. Matloff has been a professor of both statistics and computer science, so he is well suited to write such a textbook. It would be a good choice for a statistics course aimed primarily at computer scientists. The book uses the R programming language and builds the foundations of probability before moving on to statistics.

Understanding Machine Learning: From Theory to Algorithms (Free Book Download)

Understanding Machine Learning: From Theory to Algorithms is written by Shai Shalev-Shwartz, Associate Professor at the School of Computer Science and Engineering at The Hebrew University, Israel, and Shai Ben-David, Professor in the School of Computer Science at the University of Waterloo, Canada. The book looks very thorough. Below is just a sampling of the topics covered.

  • Bias-Complexity Tradeoff
  • Model Selection
  • Support Vector Machines
  • Decision Trees
  • Neural Networks
  • Clustering
  • Dimensionality Reduction
  • Feature Selection and Generation
  • Advanced Theory
  • And lots more…

Happy Learning!

Free Data Science Book for Ordinary People

A great read for people without an extensive math, statistics, or computer science background, and still an interesting read for those who have one.

The book includes tons of non-technical descriptions of data science terms.

You can download a copy of the book on SlideShare, or you can purchase a paperback copy via Lulu.

Model-based Machine Learning, Free Early Access Book

Free Book, Mining Massive Datasets, 2nd Edition

A new edition of Mining Massive Datasets is now available. It is used for a number of data mining courses at colleges across the US (and around the globe). Here are just a few of the topics from the book.

  • Map-reduce
  • Clustering
  • Recommendation Systems
  • Dimensionality Reduction
  • Social Network Analysis

Free Deep Learning Book

Yoshua Bengio, Ian Goodfellow and Aaron Courville are writing a deep learning book for MIT Press. The book is not yet complete, but the drafts of the chapters are all available online. The authors are also collecting comments about the chapters before the book goes to press.

The book is broken into 3 sections:

  1. Math and Machine Learning Fundamentals
  2. Modern Deep Neural Networks
  3. Current Research in Deep Learning

The book is very technical and probably best suited to a graduate-level course. However, if you have the time and interest, resources such as this are highly valuable.

Women In Data – Free ebook

O’Reilly just published a free ebook profiling 15 influential women in data science, Women in Data. The book is written by Cornelia Levy-Bencheton.

The following women are profiled in the book:

  • Michele Chambers, COO of RapidMiner
  • Camille Fournier, CTO of Rent the Runway
  • Carla Gentry, CEO of Analytical Solution
  • Kelly Hoey, Speaker and Early-stage Investor
  • Cindi Howson, VP of Research at Gartner
  • Neha Narkhede, Co-founder of Confluent
  • Claudia Perlich, Chief Scientist at Dstillery
  • Kira Radinsky, Co-founder of SalesPredict
  • Gwen Shapira, Software Engineer at Cloudera
  • Laurie Skelly, Data Scientist at Datascope
  • Kathleen Ting, Technical Account Manager at Cloudera
  • Renetta Garrison Tull, Associate Vice Provost of UMBC
  • Hanna Wallach, Researcher at Microsoft
  • Alice Zheng, Director of Data Science at Dato
  • Margit Zwemer, Founder of LiquidLandscape