Tag Archives: data science

Course Launch – Intro to Azure ML Studio with Regression

Online courses are a great way to share knowledge with others; that is why I have decided to launch a few courses. The first course is Intro to Azure ML Studio – Regression. This is a smaller course and should take about 2 hours to complete.

Azure ML Studio is a drag-and-drop interface for doing machine learning.

Topics are all based upon Azure ML Studio, and they include:

  • Linear Regression
  • Linear Correlation
  • Feature Selection
  • Splitting Data
  • Evaluating a model
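Azure ML Studio covers these topics through drag-and-drop modules rather than code. For readers who like to see the ideas written out, here is a minimal Python sketch of splitting data, fitting a linear regression, and evaluating it. The data is made up for illustration, and the helper names (`train_test_split`, `fit_line`, `rmse`) are my own and not part of Azure ML Studio.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle rows and split them into training and test lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_line(rows):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(rows, slope, intercept):
    """Root mean squared error of the fitted line on rows."""
    errors = [(y - (slope * x + intercept)) ** 2 for x, y in rows]
    return (sum(errors) / len(errors)) ** 0.5

# Made-up data: y is roughly 3x + 2 plus a little noise.
noise = random.Random(1)
data = [(x, 3 * x + 2 + noise.uniform(-1, 1)) for x in range(40)]

train, test = train_test_split(data)
slope, intercept = fit_line(train)
test_rmse = rmse(test, slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f} test RMSE={test_rmse:.2f}")
```

The model is fit only on the training rows and scored only on the held-out test rows, which is the same reason the course treats splitting data and evaluating a model as separate topics.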

Use the code BLOGREADER to save 30%.

Getting Your First Job in Data Science

Getting your first data science job might be challenging, but it’s possible to achieve this goal with the right resources.

Before jumping into a data science career, there are a few questions you should be able to answer:

  • How do you break into the profession?
  • What skills do you need to become a data scientist?
  • Where are the best data science jobs?

First, it’s important to understand what data science is. To do data science, you have to be able to process large datasets and utilize programming, math, and technical communication skills. You also need to have a sense of intellectual curiosity to understand the world through data. To help complete the picture around data science, let’s dive into the different roles within data science.

The Different Data Science Roles

Data science teams come together to solve some of the hardest data problems an organization might face. Each member of the team brings a different part of the skill set required to complete a project from end to end.

Data Scientists

Data scientists are the bridge between programming and algorithmic thinking. A data scientist can run a project from end to end. They can clean large amounts of data, explore data sets to find trends, build predictive models, and create a story around their findings.

Data Analysts

Data analysts sift through data and provide helpful reports and visualizations. You can think of this role as the first step on the way to a job as a data scientist or as a career path in and of itself.

Data Engineers

Data engineers typically handle large amounts of data and lay the groundwork for data scientists to do their jobs effectively. They are responsible for managing database systems, scaling data architecture to multiple servers, and writing complex queries to sift through the data.

The Data Science Process

Now that you have a general understanding of the different roles within data science, you might be asking yourself, “What do data scientists actually do?”

Data scientists can appear to be wizards who pull out their crystal balls (MacBook Pros), chant a bunch of mumbo-jumbo (machine learning, random forests, deep networks, Bayesian posteriors) and produce amazingly detailed predictions of what the future will hold.

Data science isn’t magic mumbo-jumbo, though, and the more precisely we can clarify this, the better. The power of data science comes from a deep understanding of statistics, algorithms, programming, and communication skills. More importantly, data science is about applying these skill sets in a disciplined and systematic manner. We apply these skill sets via the data science process. Let’s look at the data science process broken down into 6 steps.

Step 1: Frame the problem

Before you can start solving a problem, you need to ask the right questions so you can frame the problem.

Step 2: Collect the raw data needed for your problem

Now, you should think through what raw data you need to solve your problem and find ways to get that data.

Step 3: Process the data for analysis

After you collect the data, you’ll need to begin processing it and checking for common errors that could corrupt your analysis.

Step 4: Explore the data

Once you have finished cleaning your data, you can start looking into it to find useful patterns.

Step 5: Perform in-depth analysis

Now, you will be applying your statistical, mathematical and technological knowledge to find every insight you can in the data.

Step 6: Communicate the results of the analysis

The last step in the data science process is presenting your insights in an elegant manner. Make sure your audience knows exactly what you found.

If you worked as a data scientist, you would apply this process to your work every day.

What’s next?

Before you jump into data science and working through the data science process, there are some things you need to learn to become a data scientist.

Most data scientists use a combination of skills every day. The skills necessary to become a data scientist include an analytical mindset, mathematics, data visualization, and business knowledge, just to name a few.

In addition to having the skills, you’ll need to learn how to use modern data science tools. Hadoop, SQL, Python, R, and Excel are some of the tools you’ll need to be familiar with. Each tool plays a different role in the data science process.

If you’re ready to learn more about data science, take a deeper look at the skills necessary to become a data scientist, and how to get a job in data science, download Springboard’s comprehensive 60-page guide on How to get your first job in data science.



About Springboard: At Springboard, we’re building an educational experience that empowers our students to thrive in technology careers. Through our online workshops, we have prepared thousands of people for careers in data science.

My Experience Taking Microsoft DP-100: Designing and Implementing a Data Science Solution on Azure

I took and passed DP-100 during the beta period. I recorded a live video talking about my experience. Below is that section of the live video. Also, here are the main topics:

  • Azure ML Studio
  • Machine Learning
  • Python
  • High-level knowledge of Azure Products

Also, if you want a checklist to prepare for the exam, I have created one, and it is free.

What is on the Microsoft Data Science Certification Exam?

I took and passed the exam during the beta period. These are my memories of the topics on the exam. You can get this information as the Microsoft Azure Data Scientist Checklist.

General Overview

Below is the basic structure of the DP-100: Designing and Implementing a Data Science Solution on Azure exam. Passing the exam will qualify you for the Azure Data Scientist Associate certification. It can be taken in a traditional exam center or at your home (but you will be watched via video camera).

  • 180 minutes
  • 60 questions (45 Q&A, 15 about Case Studies)

The exam can be broken down into 4 components: Machine Learning, Azure ML Studio, Azure Products, and Python. Below is a breakdown of the topics I remember from the exam.

Machine Learning

These are topics which would be covered in a traditional machine learning course. Here are some of the specific topics I remember.

  • Evaluation of Linear Regression
  • Evaluation of Classification
  • Positive/negative skew
  • Poisson regression
  • Fisher’s exact test
  • Pearson
  • PCA
  • Deep learning – high-level, what it is for
  • Neural Networks (RNN vs CNN vs DCN vs GAN)
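Two of the items above, Pearson correlation and positive/negative skew, are easy to check by hand. The following is a minimal sketch in plain Python with made-up numbers; it uses the population form of skewness (third central moment divided by the cubed standard deviation), which is one of several common definitions and is not taken from the exam itself.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def skewness(values):
    """Population skewness: third central moment over the cubed std dev."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up example data.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
incomes = [20, 22, 25, 27, 30, 95]  # one large value creates a long right tail

r = pearson(hours, scores)
right_skew = skewness(incomes)
left_skew = skewness([-v for v in incomes])  # mirrored data has a long left tail

print(f"Pearson r = {r:.3f}")
print(f"skew (right tail) = {right_skew:.3f}, skew (left tail) = {left_skew:.3f}")
```

A value of r near 1 means a strong positive linear relationship. A positive skew means the long tail is on the right, and a negative skew means it is on the left.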

Azure ML Studio

Azure ML Studio is a major focus of the exam, so you need to be fluent in how to use it. Questions ranged from the basics of how to import data all the way to specifics about certain modules.

  • Model Building
  • Model Evaluation
  • SMOTE
  • MICE
  • feature extraction
  • missing data questions
  • AutoML
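SMOTE (Synthetic Minority Over-sampling Technique) balances a dataset by creating new minority-class points between existing ones. Here is a rough sketch of the general idea in plain Python. It is not the implementation used by the Azure ML Studio module, and the function name, parameters, and sample points are assumptions made for illustration.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Pick a random spot on the line segment from base to neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Made-up minority-class samples with two features each.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6)
for point in new_points:
    print(point)
```

Because each new point lies on a segment between two real minority samples, the synthetic data stays inside the region the minority class already occupies instead of simply duplicating rows.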

Azure Products

There were a number of questions from this category. Each question would present a scenario and ask which products would be useful for solving the problem. The questions did not go very deep into any of the products, but you will need to know the purpose of each one.

  • Azure Machine Learning Service
  • Blob storage – specifically how to get data in/out
  • Azure Notebooks
  • Azure Cognitive Services (high level)
  • Kubernetes
  • HDInsight
  • Data Science Virtual Machine

Python

Python was the language of choice for the exam, so focus on it.

  • Scikit-learn
  • Azure Machine Learning SDK for Python
  • Hyperparameters
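For the hyperparameters topic, the core idea is to try several settings and keep the one that scores best on data held out from training. Below is a small plain-Python sketch of a grid search over `k` for a k-nearest-neighbours classifier. The data and helper names are made up for illustration; in practice scikit-learn provides `GridSearchCV` for this, and the sketch does not use it.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Predict a label for x by majority vote among the k nearest points."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train, validation, k):
    """Fraction of validation points that knn_predict labels correctly."""
    hits = sum(knn_predict(train, x, k) == label for x, label in validation)
    return hits / len(validation)

# Made-up one-feature data; (2.4, "high") and (5.6, "low") act as noisy labels.
train = [(1.0, "low"), (1.5, "low"), (2.0, "low"), (2.4, "high"),
         (5.6, "low"), (6.0, "high"), (6.5, "high"), (7.0, "high")]
validation = [(1.2, "low"), (2.3, "low"), (5.7, "high"), (6.8, "high")]

# The grid of candidate hyperparameter values to try.
grid = [1, 3, 5]
scores = {k: accuracy(train, validation, k) for k in grid}
best_k = max(scores, key=scores.get)

print("validation accuracy by k:", scores)
print("best k:", best_k)
```

With `k=1` the classifier copies the noisy labels of the nearest point, so a slightly larger `k` does better on the validation set. That trade-off is the kind of thing a hyperparameter question asks you to reason about.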

Not on the exam

The following topics were not covered on my exam. The exam questions are pulled from a pool of questions, so it is possible these topics may be covered on a different person’s exam. In any case, these are definitely not major portions of the exam.

  • R
  • Power BI
  • Publishing Azure ML models

Again, if you want this information in an easy-to-follow checklist, just visit the Microsoft Azure Data Scientist Checklist.

Data Science News for April 29, 2019

Here is the latest data science news for the week of April 29, 2019.

From Data Science 101

General Data Science

What do you think? Did I miss any big news in the data science world?