Tag Archives: microsoft

Cloud Data Science News Beta #2

Here are this week's major announcements and news for doing data science in the cloud.

Microsoft Azure

Amazon AWS

Google Cloud

If you would like to get the Cloud Data Science News as an email, you can sign up for the Cloud Data Science Newsletter.

Storing Data On a Piece of Glass – Microsoft’s Project Silica

Last week at Microsoft Ignite 2019, Microsoft Research announced the release of Project Silica. This is an amazing new technology to store data on a piece of glass. I was lucky enough to get the opportunity to sit down with Antony Rowstron, Deputy Lab Director at Microsoft Research, and ask him all my questions.

An interview with Project Silica Researcher, Ant Rowstron

What is Project Silica?

Project Silica is a research endeavor to store digital data on a piece of quartz glass. The technology currently exists and can be used to store files the size of the original Superman movie or the Windows 10 operating system.

The technology is still very new, so it will be years before it is ready for production use.

How is Data Stored?

A femtosecond laser is fired inside the glass. It uses multiple pulses to form a voxel. A voxel can be thought of as a tiny iceberg within the glass. The voxel is shaped like a teardrop and has a size of approximately 1 micron (1 micrometer). Each voxel can store multiple bits, and the stored value depends upon the size and orientation of the voxel.
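To make the idea of "multiple bits per voxel" concrete, here is a toy sketch in Python. It is not how Project Silica actually encodes data. The number of distinguishable orientations and sizes is an assumption (the interview did not give those figures), and the `encode` and `decode` names are made up for illustration.

```python
# Hypothetical parameters: the interview did not say how many distinct
# orientations or sizes a voxel can take.
ORIENTATIONS = 4  # assumed number of distinguishable orientations
SIZES = 2         # assumed number of distinguishable sizes
# 4 * 2 = 8 possible states per voxel, which is enough for 3 bits.
BITS_PER_VOXEL = (ORIENTATIONS * SIZES).bit_length() - 1


def encode(data: bytes) -> list[tuple[int, int]]:
    """Turn bytes into a list of (orientation, size) voxel states."""
    bits = "".join(f"{byte:08b}" for byte in data)
    # Pad with zeros so the bit string splits evenly into voxels.
    bits += "0" * (-len(bits) % BITS_PER_VOXEL)
    voxels = []
    for i in range(0, len(bits), BITS_PER_VOXEL):
        symbol = int(bits[i:i + BITS_PER_VOXEL], 2)
        voxels.append(divmod(symbol, SIZES))  # (orientation, size)
    return voxels


def decode(voxels: list[tuple[int, int]], length: int) -> bytes:
    """Rebuild `length` bytes from (orientation, size) voxel states."""
    bits = "".join(f"{o * SIZES + s:0{BITS_PER_VOXEL}b}" for o, s in voxels)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, length * 8, 8))


voxels = encode(b"Hi")
print(voxels)
print(decode(voxels, 2))
```

With these assumed numbers, two bytes (16 bits) need six voxels. More distinguishable states per voxel would mean fewer voxels for the same data.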

Many layers of voxels are stored on a piece of glass. The sample I got to see had 20 layers. The sample from the keynote had 74 layers. Current capabilities allow for hundreds of layers.

Why Quartz Glass?

The glass needs to be transparent because, to read the data, a microscope-like device has to focus on different layers. Thus, it needs to be able to see through the upper layers.

Quartz glass is purer than window pane glass. Plus, it is a readily available substance that the world already produces.

Most importantly,

“The properties of the voxel formations is a function of glass.”

–Ant Rowstron, Microsoft Research

How is Data Read?

A separate device is used to read the data. It is a computer-controlled microscope. To begin with, it focuses on the layer of interest and a set of polarization images is taken. These images are then processed to determine the orientation and size of the voxels. The process is then repeated for other layers.

The images are fused using machine learning and a convolutional neural network. In addition,

“There are about 8 tracks which have well-known data written in them. … If in 100 years time, it doesn’t read, we can retrain the ML from the tracks we have. It is a self-describing media.”

–Ant Rowstron on how data is read from Project Silica
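The "self-describing" idea can be sketched in a few lines of Python. This is only an illustration of the principle: it swaps the convolutional neural network for a much simpler nearest-centroid classifier, and the readings, the `calibrate` and `classify` functions, and the two-symbol alphabet are all made up.

```python
from statistics import mean


def calibrate(reference):
    """Learn one centroid per symbol from a track whose contents are known.

    `reference` is a list of ((orientation, size), known_symbol) pairs,
    with readings in arbitrary units.
    """
    grouped = {}
    for reading, symbol in reference:
        grouped.setdefault(symbol, []).append(reading)
    return {
        symbol: (mean(r[0] for r in readings), mean(r[1] for r in readings))
        for symbol, readings in grouped.items()
    }


def classify(reading, centroids):
    """Return the symbol whose centroid is closest to the reading."""
    def squared_distance(symbol):
        c = centroids[symbol]
        return (reading[0] - c[0]) ** 2 + (reading[1] - c[1]) ** 2
    return min(centroids, key=squared_distance)


# Made-up readings from a reference track with known contents.
reference_track = [
    ((0.10, 1.0), 0), ((0.14, 1.1), 0),
    ((0.80, 1.0), 1), ((0.84, 0.9), 1),
]
centroids = calibrate(reference_track)

# Readings from the data area can now be decoded with the fresh calibration.
print([classify(r, centroids) for r in [(0.2, 1.0), (0.7, 1.0)]])
```

Because the reference tracks travel with the glass, a future reader could rebuild its decoder from the media itself, whatever model it uses.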

What are the Use Cases?

Project Silica is being created as a long-term archival storage device. Previous technologies such as tape and hard disks were designed before the cloud existed. They have limitations around temperature, humidity, air-quality, and life-span. Project Silica avoids those limitations.

“This is a technology designed just for the [cloud] datacenter.”

–Ant Rowstron, Microsoft Research

How Durable is it?

The quartz glass will not deteriorate, which is one of its better qualities for this application. The glass can retain its stored data after being submerged in boiling water, put in a flame, scratched with steel wool, shaken, or microwaved.

The voxels will still be there after thousands of years.

It can, however, be smashed with a hammer or shattered like any other piece of glass.

What Do You Envision as the Future of Project Silica?

A wall covered in plates of glass with an arm coming out to fetch a piece of glass. Then the glass will be taken over to a reader. It will be read and then returned to the wall.

What is Next?

There are three things which are really important to any storage technology.

  1. Density – how much data can be fit in a certain amount of space
  2. Write throughput – how fast can the data be written
  3. Read throughput – how quickly can the data be read

Microsoft is going to be pushing really hard on all three. Currently, a piece of glass the size of an optical disc can store more data than an optical disc. The read/write process has also become 1000 times faster than it was at the beginning of the project. Theoretically, the piece of glass I am holding in the image below should be able to hold hundreds of terabytes.

Holding Project Silica with Ant Rowstron
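The three metrics are tied together by simple arithmetic: the denser the media, the more throughput matters. Here is a small Python sketch of that relationship. The capacity and throughput numbers are hypothetical, since no actual throughput figures were shared in the interview, and `transfer_hours` is just a helper name for this example.

```python
def transfer_hours(capacity_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to write (or read) a full piece of media."""
    megabytes = capacity_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return megabytes / throughput_mb_s / 3600


# Hypothetical numbers, for illustration only.
capacity_tb = 100        # "hundreds of terabytes" is the theoretical goal
baseline_mb_s = 10       # assumed starting throughput
improved_mb_s = baseline_mb_s * 1000  # a 1000x speedup

print(f"baseline: {transfer_hours(capacity_tb, baseline_mb_s):,.1f} hours")
print(f"improved: {transfer_hours(capacity_tb, improved_mb_s):,.1f} hours")
```

Whatever the real numbers are, a 1000 times faster process cuts the time to fill or read a plate by the same factor, which is why all three metrics have to improve together.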

More Information

If you are looking for more information, you can reference one of the research papers, Glass: A New Media for a New Era, or you can listen to the Microsoft Research Podcast episode, Optics for the Cloud.

Microsoft has also published a short video demonstrating some of the capabilities.

Data Science News from Microsoft Ignite 2019

Microsoft just held one of its largest conferences of the year, and a few major announcements were made which pertain to the cloud data science world. Here they are, in my order of importance.

Azure Synapse

I think this announcement will have a very large and immediate impact. Azure Synapse Analytics can be seen as a merger of Azure SQL Data Warehouse and Azure Data Lake. Synapse allows one to use SQL to query petabytes of data, both relational and non-relational, with amazing speed.

Azure Arc

Azure Arc allows deployment and management of Azure services to any environment which can run Kubernetes. This allows Azure to manage a completely hybrid infrastructure of Azure, on-premises, IoT, and other cloud environments. It is now possible to deploy an Azure SQL Database to a virtual machine running on Amazon Web Services (AWS) and manage it from Azure. It’s true, I saw it happen this week.

R Support for Azure Machine Learning

Azure Machine Learning now has a new web interface and it just got support for the R programming language. Python support has been available for a while. Azure Machine Learning is an environment to help with all the aspects of data science from data cleaning to model training to deployment.

Others

There were a few other interesting announcements which are not completely specific to data scientists, but are worth mentioning.

Visual Studio Online

This is exactly what it sounds like: an integrated development environment (IDE) in your browser. I have not gotten a chance to try it out yet, so I am not sure of its use case for data science. Years ago, there were many companies attempting to do this, but most are no longer around. Hopefully, now is a better time for an in-browser IDE and Visual Studio Online will succeed.

Project Silica

Without question, this was the coolest thing to be unveiled at Ignite. Microsoft Research has come up with a technique to store data in a piece of glass. They call it Project Silica. I was fortunate to be able to speak with one of the lead researchers on the project, so I will be sharing more from that interview later. It is fascinating, but probably years from implementation.

Azure Quantum

I have been ignoring Quantum for a while now, but it is time for that ignorance to stop. Microsoft is a (qu)bit late to the game but they are making some impressive progress. The Quantum Development Kit (QDK) has been released and Azure Quantum is in private preview, so you can sign up to be an early adopter.

Those are the big data science announcements of the week.

Open Source Data Science Projects 2019

A number of new impactful open source projects have been released lately.

Open Source Data Science Projects

  • Pythia – from Facebook for deep learning with vision and language, “such as answering questions related to visual data and automatically generating image captions”
  • InterpretML – from Microsoft, “package for training interpretable models and explaining blackbox systems”
  • ML framework for Julia – from Alan Turing Institute, MLJ is a machine learning toolbox for Julia
  • Plato – a conversational AI platform from Uber

Is the list missing a project released in 2019? If so, please leave a comment.

Course Launch – Intro to Azure ML Studio with Regression

Online courses are a great way to share knowledge with others; that is why I have decided to launch a few courses. The first course is Intro to Azure ML Studio – Regression. This is a smaller course and should take about 2 hours to complete.

Azure ML Studio is a drag-and-drop interface for doing machine learning.

Topics are all based upon Azure ML Studio, and they include:

  • Linear Regression
  • Linear Correlation
  • Feature Selection
  • Splitting Data
  • Evaluating a model
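For readers who want a feel for these topics before opening the drag-and-drop interface, here is a minimal sketch of the same steps in plain Python. It is not what Azure ML Studio runs internally. The data is made up, and the helper names (`pearson`, `fit_line`, `rmse`) are my own.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Linear (Pearson) correlation between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    slope = cov / var_x
    return slope, my - slope * mx


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Made-up data: y is roughly 2x + 1 with some noise.
rng = random.Random(0)
rows = [(x, 2 * x + 1 + rng.gauss(0, 0.5)) for x in range(50)]

# Splitting data: shuffle, then hold out 20% for evaluation.
rng.shuffle(rows)
train, test = rows[:40], rows[40:]
train_x, train_y = zip(*train)
test_x, test_y = zip(*test)

# Linear correlation: a quick check that the feature is worth keeping.
r = pearson(train_x, train_y)

# Linear regression on the training set, evaluated on the held-out set.
slope, intercept = fit_line(train_x, train_y)
error = rmse(test_y, [slope * x + intercept for x in test_x])
print(f"r={r:.3f} slope={slope:.2f} intercept={intercept:.2f} test RMSE={error:.2f}")
```

In Azure ML Studio each of these steps is a module you drag onto the canvas, so no code is required for the course.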

Use the code BLOGREADER to save 30%.

My Experience Taking Microsoft DP-100: Designing and Implementing a Data Science Solution on Azure

I took and passed DP-100 during the beta period. I recorded a live video talking about my experience. Below is that section of the live video. Also, here are the main topics:

  • Azure ML Studio
  • Machine Learning
  • Python
  • High-level knowledge of Azure Products

Also, if you want a checklist to prepare for the exam, I have created a free one.

Data Science News for May 2019

Here is the latest data science news for May 2019.

From Data Science 101

General Data Science