Last week at Microsoft Ignite 2019, Microsoft Research announced the release of Project Silica. This is an amazing new technology to store data on a piece of glass. I was lucky enough to get the opportunity to sit down with Antony Rowstron, Deputy Lab Director at Microsoft Research, and ask him all my questions.
What is Project Silica?
Project Silica is a research endeavor to store digital data on a piece of quartz glass. The technology already exists and can store files the size of the original Superman movie or the Windows 10 operating system.
The technology is still very new, so it will be years before it is production-ready.
How is Data Stored?
A femtosecond laser is fired inside the glass. It uses multiple pulses to form a voxel. A voxel can be thought of as a tiny iceberg within the glass. The voxel is shaped like a teardrop and has a size of approximately 1 micron (1 micrometer). Each voxel can store multiple bits; the data it encodes depends on its size and orientation.
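To make the "multiple bits per voxel" idea concrete, here is a minimal Python sketch. It is not the Project Silica encoding, which was not described to me in that detail. It simply assumes four hypothetical orientation levels and two hypothetical size levels, which gives eight distinct voxel states, or 3 bits per voxel. All names and level values are made up for illustration.

```python
# Illustrative only: NOT the actual Project Silica encoding.
# Assumption: 4 orientations x 2 sizes = 8 voxel states = 3 bits per voxel.
ORIENTATIONS_DEG = (0, 45, 90, 135)  # hypothetical orientation levels
SIZES = ("small", "large")           # hypothetical size levels
BITS_PER_VOXEL = 3


def encode_voxel(value):
    """Map a 3-bit value (0-7) to a hypothetical (orientation, size) pair."""
    if not 0 <= value < 8:
        raise ValueError("value must fit in 3 bits")
    return ORIENTATIONS_DEG[value % 4], SIZES[value // 4]


def decode_voxel(orientation, size):
    """Invert encode_voxel: recover the 3-bit value from a voxel state."""
    return SIZES.index(size) * 4 + ORIENTATIONS_DEG.index(orientation)


def encode_bytes(data):
    """Split bytes into 3-bit groups and encode each group as one voxel."""
    bits = "".join(f"{byte:08b}" for byte in data)
    bits += "0" * (-len(bits) % BITS_PER_VOXEL)  # zero-pad to a multiple of 3
    return [
        encode_voxel(int(bits[i:i + BITS_PER_VOXEL], 2))
        for i in range(0, len(bits), BITS_PER_VOXEL)
    ]


def decode_bytes(voxels, length):
    """Rebuild `length` bytes from voxel states, ignoring the padding bits."""
    bits = "".join(f"{decode_voxel(o, s):03b}" for o, s in voxels)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, length * 8, 8))


voxels = encode_bytes(b"Hi")
print(len(voxels), decode_bytes(voxels, 2))
```

The point of the sketch is only that more distinguishable states per voxel means more bits per voxel, so fewer voxels are needed for the same data.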
Many layers of voxels are stored on a piece of glass. The sample I got to see had 20 layers. The sample from the keynote had 74 layers. Current capabilities allow for hundreds of layers.
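As a rough way to see why the layer count matters, the sketch below estimates capacity as voxels per layer times layers times bits per voxel. The plate dimensions, voxel spacing, and bits per voxel are placeholder assumptions of mine, not figures from Microsoft. Only the layer counts (20 and 74) come from the samples mentioned above.

```python
def estimated_capacity_bytes(width_mm, height_mm, voxel_pitch_um, layers, bits_per_voxel):
    """Back-of-the-envelope capacity for a rectangular grid of voxels.

    voxel_pitch_um is the assumed center-to-center spacing of voxels
    within a layer (a hypothetical parameter, not a published figure).
    """
    voxels_per_row = int(width_mm * 1000 // voxel_pitch_um)
    voxels_per_col = int(height_mm * 1000 // voxel_pitch_um)
    total_bits = voxels_per_row * voxels_per_col * layers * bits_per_voxel
    return total_bits // 8


# Hypothetical plate: 10 mm x 10 mm, 2 micron pitch, 3 bits per voxel.
for layers in (20, 74):
    size = estimated_capacity_bytes(10, 10, 2, layers, 3)
    print(f"{layers} layers: {size:,} bytes")
```

With everything else held fixed, capacity grows linearly with the number of layers, which is why moving from tens to hundreds of layers is significant.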
Why Quartz Glass?
The glass needs to be transparent because, to read the data, a microscope-like device must focus on different layers. To reach the lower layers, it has to see through the upper ones.
Quartz glass is purer than window pane glass. Plus, it is a readily available substance that the world already produces.
“The properties of the voxel formations is a function of glass.”
–Ant Rowstron, Microsoft Research
How is Data Read?
A separate device is used to read the data. It is a computer-controlled microscope. To begin with, it focuses on the layer of interest and a set of polarization images is taken. These images are then processed to determine the orientation and size of the voxels. The process is then repeated for other layers.
The images are fused using machine learning and a convolutional neural network. In addition,
“There are about 8 tracks which have well-known data written in them. … If in 100 years time, it doesn’t read, we can retrain the ML from the tracks we have. It is a self-describing media.”
–Ant Rowstron on how data is read from Project Silica
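The "self-describing" idea can be sketched in a few lines of Python. This is my own simplified stand-in, not the Project Silica reader: instead of a convolutional neural network it uses a nearest-centroid classifier, and the two-number "features" per voxel are hypothetical measurements. The principle is the same, though. Tracks with known content supply labeled examples, so a decoder can be rebuilt from the glass itself.

```python
from collections import defaultdict
from math import dist


def fit_centroids(calibration):
    """Learn one average feature vector per symbol.

    calibration: list of (features, known_symbol) pairs, as if read from
    tracks whose content is known in advance (hypothetical data format).
    """
    groups = defaultdict(list)
    for features, symbol in calibration:
        groups[symbol].append(features)
    return {
        symbol: tuple(sum(axis) / len(points) for axis in zip(*points))
        for symbol, points in groups.items()
    }


def decode(features, centroids):
    """Return the symbol whose centroid is closest to the measurement."""
    return min(centroids, key=lambda symbol: dist(features, centroids[symbol]))


# Made-up measurements from the known tracks: (feature_a, feature_b) -> symbol.
calibration = [
    ((0.1, 0.9), 0), ((0.2, 1.0), 0),
    ((0.9, 0.1), 1), ((1.0, 0.2), 1),
]
centroids = fit_centroids(calibration)

# Decode measurements from a data track using only what the glass taught us.
print([decode(m, centroids) for m in [(0.3, 0.8), (0.8, 0.3)]])
```

If the reader hardware changes over the decades, the measurements will look different, but refitting on the known tracks recalibrates the decoder without any outside reference.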
What are the Use Cases?
Project Silica is being created as a long-term archival storage device. Previous technologies such as tape and hard disks were designed before the cloud existed. They have limitations around temperature, humidity, air-quality, and life-span. Project Silica avoids those limitations.
“This is a technology designed just for the [cloud] datacenter.”
–Ant Rowstron, Microsoft Research
How Durable is it?
The quartz glass will not deteriorate, which is one of its better qualities for this application. The glass can retain its stored data after being submerged in boiling water, put in a flame, scratched with steel wool, shaken, or microwaved.
The voxels will still be there after thousands of years.
It can, however, be smashed with a hammer or shattered like any other piece of glass.
What Do You Envision as the Future of Project Silica?
A wall covered in plates of glass with an arm coming out to fetch a piece of glass. Then the glass will be taken over to a reader. It will be read and then returned to the wall.
What is Next?
There are three things which are really important to any storage technology.
Density – how much data can be fit in a certain amount of space
Write throughput – how fast can the data be written
Read throughput – how quickly can the data be read
Microsoft is going to be pushing really hard on all three. Currently, a piece of glass the size of an optical disc can store more data than an optical disc. And the read/write process has gotten 1000 times faster than it was at the beginning of the project. Theoretically, the piece of glass I am holding in the image below should be able to hold hundreds of terabytes.
Microsoft just held one of its largest conferences of the year, and a few major announcements were made that pertain to the cloud data science world. Here they are, in my own order of importance.
I think the Azure Synapse Analytics announcement will have a very large and immediate impact. Azure Synapse Analytics can be seen as a merger of Azure SQL Data Warehouse and Azure Data Lake. Synapse allows one to use SQL to query petabytes of data, both relational and non-relational, with amazing speed.
Azure Arc allows deployment and management of Azure services in any environment that can run Kubernetes. This allows Azure to manage a completely hybrid infrastructure of Azure, on-premises, IoT, and other cloud environments. It is now possible to deploy an Azure SQL Database to a virtual machine running on Amazon Web Services (AWS) and manage it from Azure. It’s true, I saw it happen this week.
There were a few other interesting announcements which are not completely specific to data scientists, but are worth mentioning.
Visual Studio Online
This is exactly what it sounds like: an Integrated Development Environment (IDE) in your browser. I have not had a chance to try it out yet, so I am not sure of its use case for data science. Years ago, many companies attempted this, but most are no longer around. Hopefully, now is a better time for an in-browser IDE and Visual Studio Online will succeed.
Without question, the coolest thing unveiled at Ignite was Project Silica, a technique from Microsoft Research for storing data in a piece of glass. I was fortunate to be able to speak with one of the lead researchers on the project, so I will be sharing more from that interview later. It is fascinating, but probably years from implementation.
I have been ignoring quantum computing for a while now, but it is time for that ignorance to stop. Microsoft is a (qu)bit late to the game, but they are making some impressive progress. The Quantum Development Kit (QDK) has been released and Azure Quantum is in private preview, so you can sign up to be an early adopter.
Those are the big data science announcements of the week.
Online courses are a great way to share knowledge with others; that is why I have decided to launch a few courses. The first course is Intro to Azure ML Studio – Regression. This is a smaller course and should take about 2 hours to complete.
Azure ML Studio is a drag-and-drop interface for doing machine learning.
Topics are all based upon Azure ML Studio, and they include:
Microsoft Build 2019 – This is a huge conference hosted by Microsoft for the developer community. Many of the presentations are available to watch online. Not all are data science/AI related, but many are.
Google I/O 2019 Videos – Google’s big annual conference. Nearly all of the sessions are recorded. There are lots of AI talks and demos.
AWS DeepRacer – Learn reinforcement learning by programming an autonomous car and competing in races.