
My Experience Taking Microsoft DP-100: Designing and Implementing a Data Science Solution on Azure

I took and passed the DP-100 exam during the beta period, and I talked about the experience in a live video. Below is that section of the video. The main topics were:

  • Azure ML Studio
  • Machine Learning
  • Python
  • High-level knowledge of Azure Products

Also, if you want a checklist to prepare for the exam, I have created a free one.

What is on the Microsoft Data Science Certification Exam?

I took and passed the exam during the beta period. These are the topics I remember from the exam. You can also get this information as the Microsoft Azure Data Scientist Checklist.

General Overview

Below is the basic structure of the DP-100: Designing and Implementing a Data Science Solution on Azure exam. Passing the exam will qualify you for the Azure Data Scientist Associate certification. You can take it at a traditional exam center or at home (where you will be monitored via video camera).

  • 180 minutes
  • 60 questions (45 Q&A, 15 about Case Studies)

The exam can be broken down into 4 components: Machine Learning, Azure ML Studio, Azure Products, and Python. Below is a breakdown of the topics I remember from the exam.

Machine Learning

These are topics which would be covered in a traditional machine learning course. Here are some of the specific topics I remember.

  • Evaluation of Linear Regression
  • Evaluation of Classification
  • Positive/negative skew
  • Poisson regression
  • Fisher’s exact test
  • Pearson
  • PCA
  • Deep learning – high-level, what it is for
  • Neural Networks (RNN vs CNN vs DCN vs GAN)
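
To make the two evaluation topics above concrete, here is a minimal sketch of the usual metrics in plain Python. The function names and the sample numbers are made up for illustration; they are not taken from the exam, and in practice you would probably use the metrics that ship with scikit-learn.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R^2 for paired lists of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {"mae": mae, "rmse": rmse, "r2": 1 - ss_res / ss_tot}


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs),
            "precision": precision, "recall": recall, "f1": f1}


# Hypothetical example data, chosen only to keep the arithmetic easy to check.
reg = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 8.0, 9.5])
clf = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(reg)
print(clf)
```

The point to take away is which metric answers which question: MAE and RMSE measure the size of the errors, R^2 measures how much variance the model explains, and precision and recall look at the two kinds of classification mistakes separately.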

Azure ML Studio

Azure ML Studio is a major focus of the exam, so you need to be fluent in how to use it. Questions ranged from the basics of how to import data all the way to specifics about certain modules.

  • Model Building
  • Model Evaluation
  • SMOTE
  • MICE
  • feature extraction
  • missing data questions
  • AutoML
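
If SMOTE is new to you, the core idea is small enough to sketch in plain Python: pick a minority-class point, pick one of its nearest minority-class neighbors, and create a new point somewhere on the line between them. This is only a simplified illustration of that idea, not how the Azure ML Studio module is implemented. The function name smote_sketch, its parameters, and the sample points are my own assumptions.

```python
import random


def smote_sketch(minority, n_synthetic, k=2, seed=0):
    """Create synthetic minority samples by interpolating toward near neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbor))
        )
    return synthetic


# Hypothetical minority-class points with two features each.
minority_points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority_points, n_synthetic=6)
for point in new_points:
    print(point)
```

Because every synthetic point lies between two real minority points, the new samples stay inside the region the minority class already occupies, which is the main difference from simply duplicating rows.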

Azure Products

There were a number of questions from this category. Each question presented a scenario and asked which products would be useful for solving the problem. The questions did not go very deep into any of the products, but you will need to know the purpose of each one.

  • Azure Machine Learning Service
  • Blob storage – specifically how to get data in/out
  • Azure Notebooks
  • Azure Cognitive Services (high level)
  • Kubernetes
  • HDInsight
  • Data Science Virtual Machine

Python

Python was the language of choice for the exam, so focus on it.

  • Scikit-learn
  • Azure Machine Learning SDK for Python
  • Hyperparameters
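
For the hyperparameter topic, it helps to be clear on what a search actually does: try each candidate value, score it on data the model did not train on, and keep the best one. Below is a minimal sketch of that loop in plain Python, using a tiny one-feature k-nearest-neighbors classifier and leave-one-out validation. The data, the helper names, and the candidate grid are all made up for illustration; with scikit-learn you would normally reach for GridSearchCV instead.

```python
def knn_predict(train, x, k):
    """Predict a label for x by majority vote among the k nearest points."""
    nearest = sorted(train, key=lambda item: abs(item[0] - x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def loo_accuracy(data, k):
    """Leave-one-out accuracy: hold out each point once and predict it."""
    hits = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if knn_predict(train, x, k) == label:
            hits += 1
    return hits / len(data)


# Hypothetical (feature, label) pairs with two overlapping classes.
data = [
    (1.0, "a"), (1.4, "a"), (1.9, "a"), (2.3, "a"), (4.8, "a"),
    (2.1, "b"), (5.0, "b"), (5.5, "b"), (6.1, "b"), (6.6, "b"),
]

# Odd values of k avoid tied votes when there are only two classes.
grid = [1, 3, 5]
scores = {k: loo_accuracy(data, k) for k in grid}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

The value of k is never learned from the data by the model itself, which is what makes it a hyperparameter rather than a parameter.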

Not on the exam

The following topics were not covered on my exam. The exam questions are pulled from a pool of questions, so it is possible these topics may be covered on a different person’s exam. In any case, these are definitely not major portions of the exam.

  • R
  • Power BI
  • Publishing Azure ML models

Again, if you want this information in an easy-to-follow checklist, just visit the Microsoft Azure Data Scientist Checklist.

Data Science News for April 29, 2019

Here is the latest data science news for the week of April 29, 2019.

From Data Science 101

General Data Science

What do you think? Did I miss any big news in the data science world?

GoLang for Data Science

While it is not one of the popular programming languages for data science, the Go programming language (aka Golang) has surfaced for me a few times in the past few years as an option for data science. I decided to do some searching to see whether Golang is a good choice for data science.

Popularity of Go and Data Science

As the following figure from Google Trends demonstrates, Golang and data science became trendy topics at about the same time and grew at a similar rate.

The overlapping trends may have created the desire to merge the two technologies.

Golang Projects for Data Science

Some internet searching will reveal a number of interesting Golang/Data Science projects on GitHub. Unfortunately, many of the projects had good initial traction but have dwindled in activity over the last couple of years. Below is a listing of some of the data science related projects for Golang.

  • Gopher Data – Gophers doing data analysis, no scheduled events, last blog post was 2017
  • Gopher Notes – Golang in Jupyter Notebooks
  • Lgo – Interactive programming with Jupyter for Golang
  • Gota – Data frames for Go, “The API is still in flux so use at your own risk.”
  • qframe – Immutable data frames for Go, better speed than Gota but not as well documented
  • GoLearn – Machine Learning for Go
  • Gorgonia – Library for machine learning in Go
  • Go Sklearn – Port of scikit-learn from Python, still active but only a couple of committers, early but promising
  • Gonum – Numerical library for Go, very promising and active

Golang Data Science Books

There have even been a couple of books written on the topic.

Thoughts from the Community

The “Go for Data Science” question has been debated numerous times over the past few years. Below is a listing of some of those discussions and the key takeaways.

Reasons to use Golang for Data Science

  • Performance
  • Concurrency
  • Strong Developer Ecosystem
  • Basic Data Science packages are available

Reasons Not to use Golang for Data Science

  • Limited support from the data science community for Golang
  • Significantly increased time for exploratory analysis
  • Less flexibility to try other optimization and ML techniques
  • The data science community has not really adopted Golang

Summary

In short, Golang is not widely used for exploratory data science, but rewriting your finished algorithms in Golang might be a good idea when performance or concurrency matters.

Finding Azure Updates

Microsoft Azure has an abundance of data science capabilities (and non-data science capabilities). It can be challenging to keep up with the latest updates/releases. Luckily, Azure has a page to let you know exactly what has changed. You just need to know where to find it, and the following video will help you find that page.

Also, if you are still interested in earning a Microsoft Data Science Certification, join the Study Group.