Cloud Data Science News Beta #1

Welcome to the first beta edition of Cloud Data Science News. Each edition will cover major announcements and news about doing data science in the cloud. If the first few are well received, this will become a weekly segment.

Microsoft Azure

  • Azure Arc
    You can now run Azure services anywhere you can run Kubernetes: on-premises, at the edge, or in any cloud.
  • Azure Synapse Analytics
    This is the future of data warehousing. It combines data warehousing and data lakes behind a single query interface, creating a simple, fast analytics service.
  • SQL Server 2019
    SQL Server 2019 is now generally available.
  • Data Science Announcements from Microsoft Ignite
    Many other announcements were made, such as Azure Quantum, Project Silica, R support in Azure ML, and Visual Studio Online.

Amazon Web Services

  • Call for Research Proposals
    Amazon is seeking proposals for impactful research in artificial intelligence and machine learning. If you are at a university or nonprofit, you can request cash, AWS credits, or both.
  • AWS ParallelCluster for Machine Learning
    AWS ParallelCluster is an open-source cluster management tool that can be used for distributed machine learning on AWS.

Google Cloud

If you would like to get the Cloud Data Science News as an email, you can sign up for the Cloud Data Science Newsletter.

Data Science News from Microsoft Ignite 2019

Microsoft just held one of its largest conferences of the year, and it made a few major announcements relevant to the cloud data science world. Here they are, ranked by importance in my opinion.

Azure Synapse

I think this announcement will have a very large and immediate impact. Azure Synapse Analytics can be seen as a merger of Azure SQL Data Warehouse and Azure Data Lake. Synapse lets you use SQL to query petabytes of data, both relational and non-relational, with amazing speed.

Azure Arc

Azure Arc lets you deploy and manage Azure services in any environment that can run Kubernetes. This means Azure can manage a fully hybrid infrastructure spanning Azure, on-premises, IoT, and other cloud environments. It is now possible to deploy an Azure SQL Database to a virtual machine running on Amazon Web Services (AWS) and manage it from Azure. It’s true, I saw it happen this week.

R Support for Azure Machine Learning

Azure Machine Learning is an environment that helps with every stage of data science, from data cleaning to model training to deployment. It now has a new web interface, and it just gained support for the R programming language. Python support has been available for a while.

Others

There were a few other interesting announcements that are not specific to data scientists but are still worth mentioning.

Visual Studio Online

This is exactly what it sounds like: an integrated development environment (IDE) in your browser. I have not had a chance to try it out yet, so I am not sure of its use case for data science. Years ago, many companies attempted this, but most are no longer around. Hopefully, now is a better time for an in-browser IDE, and Visual Studio Online will succeed.

Project Silica

Without question, this was the coolest thing unveiled at Ignite. Microsoft Research has developed a technique for storing data in glass, which they call Project Silica. I was fortunate to speak with one of the lead researchers on the project, and I will share more from that interview later. It is fascinating, but probably years away from practical use.

Azure Quantum

I have been ignoring quantum computing for a while now, but it is time for that ignorance to stop. Microsoft is a (qu)bit late to the game, but it is making some impressive progress. The Quantum Development Kit (QDK) has been released, and Azure Quantum is in private preview, so you can sign up to be an early adopter.

Those are the big data science announcements of the week.