
Cloud Data Science News Beta #1

Welcome to the first beta edition of Cloud Data Science News. This series will cover major announcements and news about doing data science in the cloud. If the first few editions are well received, it will become a weekly segment.

Microsoft Azure

  • Azure Arc
    You can now run Azure services anywhere (on-prem, on the edge, any cloud) you can run Kubernetes.
  • Azure Synapse Analytics
    This is the future of data warehousing. It combines data warehousing and data lakes behind a single query interface, giving you a simple, fast analytics service.
  • SQL Server 2019
    SQL Server 2019 is now generally available.
  • Data Science Announcements from Microsoft Ignite
    Many other services were announced, such as Azure Quantum, Project Silica, R support in Azure ML, and Visual Studio Online.

Amazon Web Services

  • Call for Research Proposals
    Amazon is seeking proposals for impactful research in artificial intelligence and machine learning. If you are at a university or non-profit, you can request cash and/or AWS credits.
  • AWS ParallelCluster for Machine Learning
    AWS ParallelCluster is an open-source cluster management tool. It can be used to run distributed machine learning on AWS.

Google Cloud

If you would like to get Cloud Data Science News by email, you can sign up for the Cloud Data Science Newsletter.

Data Science News from Microsoft Ignite 2019

Microsoft just held one of its largest conferences of the year, and a few of the major announcements pertain to the cloud data science world. Here they are, in what I consider their order of importance.

Azure Synapse

I think this announcement will have a very large and immediate impact. Azure Synapse Analytics can be seen as a merger of Azure SQL Data Warehouse and Azure Data Lake. Synapse lets you use SQL to query petabytes of data, both relational and non-relational, with impressive speed.

Azure Arc

Azure Arc allows deployment and management of Azure services in any environment that can run Kubernetes. This lets Azure manage a fully hybrid infrastructure spanning Azure, on-premises, IoT, and other cloud environments. It is now possible to deploy an Azure SQL Database to a virtual machine running on Amazon Web Services (AWS) and manage it from Azure. It's true: I saw it happen this week.

R Support for Azure Machine Learning

Azure Machine Learning now has a new web interface, and it just gained support for the R programming language. Python support has been available for a while. Azure Machine Learning is an environment that helps with every aspect of data science, from data cleaning to model training to deployment.


There were a few other interesting announcements that are not specific to data scientists but are worth mentioning.

Visual Studio Online

This is exactly what it sounds like: an Integrated Development Environment (IDE) in your browser. I have not had a chance to try it out, so I am not yet sure of its use case for data science. Years ago, many companies attempted this, but most are no longer around. Hopefully, now is a better time for an in-browser IDE and Visual Studio Online will succeed.

Project Silica

Without question, this was the coolest thing unveiled at Ignite. Microsoft Research has come up with a technique to store data in a piece of glass. They call it Project Silica. I was fortunate to speak with one of the lead researchers on the project, so I will be sharing more from that interview later. It is fascinating, but probably years from implementation.

Azure Quantum

I have been ignoring quantum computing for a while now, but it is time for that ignorance to stop. Microsoft is a (qu)bit late to the game, but they are making some impressive progress. The Quantum Development Kit (QDK) has been released and Azure Quantum is in private preview, so you can sign up to be an early adopter.

Those are the big data science announcements of the week.

5 Great Data Strategy Resources

I am putting together some of my own resources on Data Strategy. Here are a few of the top resources I have found helpful so far.

  1. What is a Data Strategy? – various definitions of a data strategy
  2. The 5 essential Components of a Data Strategy – a detailed whitepaper (PDF) from SAS
  3. How to Create a Successful Data Strategy – a detailed report from MIT
  4. How Do You Develop a Data Strategy (including 6 steps) – by Bernard Marr. He has created more data strategies than anyone, so his advice is rock-solid. The rest of his site contains more helpful information as well.
  5. Building the AI-Powered Organization – while not specific to data strategy, it fits the topic

Keep watching the blog for more of my thoughts on Data Strategy.

Nuts About Data Book Review

Just released this week, Nuts About Data is a fun introductory book about the data science process. Meor Amer tells a witty story about squirrels, mining for nuts, teamwork, and survival. It brings together the entire data science lifecycle, from asking questions to final storytelling.

It is a quick read and really fun. I highly recommend it and hope you enjoy it.

New Online Data Summit Coming Fall 2019

A new online conference focused on cloud data technologies is coming this fall. It is not just a conference or webinar; it will be an interactive online platform. The focus of the event is data in the cloud (migration, storage, and machine learning). You can pre-register for the conference now.

Some of the topics from the summit include:

  • Data Science
  • IoT
  • Streaming Data
  • AI
  • Data Visualization

Here is an excerpt from the website:

The public cloud has drastically changed systems design, enabled microservices, and lowered the barrier to entry for big data & analytics.

Learn from companies which have migrated data platforms from on-premise to the cloud. See how they were redesigned to take advantage of endless storage and compute power.

Immerse yourself with the platforms which make modern Data Science and Machine Learning possible. Join your peers to see how their data platforms knocked down the old barriers and transformed how they work.

What: Cloud Data Summit
Where: Online (a new conference platform)
When: October 16-17, 2019
How to pre-register: Register Online

I hope to see you there.