Looking for datasets for your next project? You are in luck because Google just launched Dataset Search. The name is self-explanatory. Go try it out.
Google has recently released a Jupyter Notebook platform called Google Colaboratory. You can run Python code in a browser, share results, and save your code for later. It currently does not support R code.
Martin Zinkevich, Research Scientist at Google, just compiled a large list (43 to be exact) of best practices for building machine learning systems.
If you do data engineering or are involved with building data science systems, this document is worth a look.
Recently, a number of resources for publicly available datasets have been announced.
- Kaggle becomes the place for Open Data – I think this is big news! Kaggle just announced Kaggle Datasets, which aims to be a repository for publicly available datasets. This is great for organizations that want to release data but do not want the overhead of running an open data portal. Hopefully it will gain traction and become an exceptional resource for open data.
- NASA Opens Research – NASA just announced that all research papers funded by NASA will be publicly available. It appears the research articles will all be available at PubMed Central, and the data will be available at NASA’s Data Portal.
- Google Robotics Data – Google continues to do interesting things, and this is definitely one of them. It is a dataset about how robots grasp objects (Google Brain Robot Data). I am not overly familiar with this topic, so if you want to know more, see their blog post, Deep Learning for Robots.
For more open data options, see Data Sources for Cool Data Science Projects Part 1 and Part 2.
Are you aware of any other resources that have been recently announced? If so, please leave a comment.
Microsoft has recently announced a machine learning competition platform. As part of the launch, one of the first competitions is the prediction of brain signals. It has $5000 in prizes, and submissions are accepted through June 30, 2016.
Google and Tableau have teamed up to offer a big data visualization contest. The rules are fairly simple: create an awesome visualization using at least the GDELT data set. Finalists will receive prizes worth over $5000, and some will even get tours of Tableau and Google facilities. The contest runs through May 16, 2016.
I think this has been happening for a while, but now Google has an official location for these public data sets stored in BigQuery. You can:
- Access and use the data in your applications
- Request that Google host your own public data set
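To give a rough idea of what using the data in an application might look like, here is a minimal Python sketch. It assumes the `google-cloud-bigquery` client library is installed and that you have a Google Cloud project with credentials configured. The table `bigquery-public-data.usa_names.usa_1910_2013` and the helper names `build_top_names_query` and `fetch_top_names` are my own choices for illustration, not something from the announcement.

```python
def build_top_names_query(limit: int = 5) -> str:
    """Build a standard-SQL query against an example BigQuery public table."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    # Assumption: usa_names is used purely as an example public table.
    return (
        "SELECT name, SUM(number) AS total "
        "FROM `bigquery-public-data.usa_names.usa_1910_2013` "
        "GROUP BY name "
        "ORDER BY total DESC "
        f"LIMIT {int(limit)}"
    )


def fetch_top_names(limit: int = 5):
    """Run the query and return (name, total) pairs.

    Requires the google-cloud-bigquery package and valid credentials.
    """
    # Imported here so the query builder works without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client()  # picks up your default project and credentials
    rows = client.query(build_top_names_query(limit)).result()
    return [(row["name"], row["total"]) for row in rows]
```

Calling `fetch_top_names(5)` should return the five most common names in that table, but it only works once your credentials are set up, and queries count against your project's BigQuery usage.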
It will be fun to watch this site expand with more public datasets. Happy Exploration!
Google recently announced the launch of their own Massive Open Online Course (MOOC). The course is titled Making Sense of Data, and it begins tomorrow, March 18, 2014.
The prerequisites are quite simple. All that is needed is a Google account, a web browser, and a basic knowledge of spreadsheets.
The content of the course will focus on Fusion Tables, which is a new experimental product from Google. Fusion Tables is a web application for visualizing, gathering, and sharing data. I am not familiar with Fusion Tables, but the description sounds very useful.