Deep learning is one of the hottest topics in data science right now. Adam Gibson, cofounder of Blix.io, has created an open source deep learning library for Java named DeepLearning4j. For those curious, the DeepLearning4j source code is available on GitHub.
Below is a video of Adam introducing deep learning and DeepLearning4j. Also, if you are interested in learning more about deep learning, here are a couple more very helpful links.
The CMU Machine Learning Summer School was a 2-week intensive course focused on machine learning for big data. Some of the top academics in machine learning gave presentations. Most of the videos are fairly long (around 1 hour each), but they cover a lot of material.
All of the CMU Machine Learning Summer School videos are on YouTube.
Here is one lecture by Alex Smola on Scalable Machine Learning.
Mode Analytics, a recently launched site for collaborative data science in the cloud, has published an excellent tutorial for learning SQL.
The tutorial is named SQL School.
This is one of the best SQL tutorials I have seen. Plus, it has the huge added advantage of not requiring you to set up your own database first (the data is already available). Setting up your own database can be a bit overwhelming when you are first learning. So, if you are looking to learn SQL, now is a great time to start.
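If you later want to practice the same kind of queries on your own machine without installing a database server, one low-effort option is SQLite, which ships with Python's standard library. Here is a minimal sketch; the `orders` table and its rows are made up for illustration and are not part of SQL School.

```python
import sqlite3

# An in-memory database: nothing to install, nothing written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical practice table (not from the tutorial).
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.5), ("alice", 12.5)],
)

# A typical beginner query: total spend per customer, largest first.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()

print(rows)
conn.close()
```

The database disappears when the connection closes, so it is safe to experiment freely.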
These slides are targeted at Kaggle competitions, but the R packages can be helpful to anyone using R for data analysis. The slides were created by Xavier Conort, a winner of multiple competitions.
Stanford University has just released a collection of large network datasets. When I say network data, I am referring to networks in the mathematical sense (think of a collection of nodes and edges). Here are just a few of the available categories.
- Citation Networks
- Road Networks
- Web graphs
- Social Networks such as Twitter
- and many more
If you are looking to study network data, or just want some practice analyzing big data, this just might be a good place to start.
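To make the nodes-and-edges idea concrete, here is a small sketch in Python that reads an edge list and counts how many neighbors each node has. It assumes a whitespace-separated edge-list layout with `#` comment lines; the sample data and the function names are made up for illustration, so check the format of whichever dataset you actually download.

```python
from collections import defaultdict

# Hypothetical sample data; the layout (one "source target" pair per line,
# '#' for comments) is an assumption, not something stated in the post.
SAMPLE_EDGES = """\
# FromNodeId ToNodeId
0 1
0 2
1 2
2 3
"""


def build_adjacency(text):
    """Build an undirected adjacency map from edge-list text."""
    adjacency = defaultdict(set)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        source, target = line.split()[:2]
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def degrees(adjacency):
    """Return the number of neighbors for each node."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}


graph = build_adjacency(SAMPLE_EDGES)
node_degrees = degrees(graph)
print(sorted(node_degrees.items()))
```

For a real dataset you would read the text from the downloaded file instead of a string, and for very large graphs a dedicated graph library will likely be a better fit than plain dictionaries.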
Health Data Consortium is an advocacy group focused on helping the healthcare industry respond to the availability of health data. They are currently focused on innovation and the uses of open health data.
Healthcare is currently undergoing some radical changes, and data science is going to play a key role in its future. It is great to see the medical field building an official group to define the practice. I hope other industries will follow the lead of the medical field and begin forming their own groups around open data. I am eager to see how the Health Data Consortium progresses over the coming months and years.
The team that brought you the Analytics Handbook has freely published the third and final book, titled THE DATA ANALYTICS HANDBOOK RESEARCHERS + ACADEMICS. This book focuses on data science in the research and academic communities. Like the previous 2 books in the series, it includes interviews with top experts in the field. Here are just a few of the people with interviews in this book.
The authors are now working on a new data science training project called Leada. Check it out for more details.