
Getting Your First Job in Data Science

Getting your first data science job might be challenging, but it’s possible to achieve this goal with the right resources.

Before jumping into a data science career, there are a few questions you should be able to answer:

  • How do you break into the profession?
  • What skills do you need to become a data scientist?
  • Where are the best data science jobs?

First, it’s important to understand what data science is. To do data science, you have to be able to process large datasets and utilize programming, math, and technical communication skills. You also need to have a sense of intellectual curiosity to understand the world through data. To help complete the picture around data science, let’s dive into the different roles within data science.

The Different Data Science Roles

Data science teams come together to solve some of the hardest data problems an organization might face. Each member of the team brings a different part of the skill set required to complete a project from end to end.

Data Scientists

Data scientists are the bridge between programming and algorithmic thinking. A data scientist can run a project from end to end. They can clean large amounts of data, explore data sets to find trends, build predictive models, and create a story around their findings.

Data Analysts

Data analysts sift through data and provide helpful reports and visualizations. You can think of this role as the first step on the way to a job as a data scientist or as a career path in and of itself.

Data Engineers

Data engineers typically handle large amounts of data and lay the groundwork for data scientists to do their jobs effectively. They are responsible for managing database systems, scaling data architecture to multiple servers, and writing complex queries to sift through the data.

The Data Science Process

Now that you have a general understanding of the different roles within data science, you might be asking yourself, “What do data scientists actually do?”

Data scientists can appear to be wizards who pull out their crystal balls (MacBook Pros), chant a bunch of mumbo-jumbo (machine learning, random forests, deep networks, Bayesian posteriors) and produce amazingly detailed predictions of what the future will hold.

Data science isn’t magic mumbo-jumbo, though, and the more precisely we can say what it is, the better. The power of data science comes from a deep understanding of statistics, algorithms, programming, and communication skills. More importantly, data science is about applying these skill sets in a disciplined and systematic manner, and we do so via the data science process. Let’s look at that process broken down into 6 steps.

Step 1: Frame the problem

Before you can start solving a problem, you need to ask the right questions so you can frame the problem.

Step 2: Collect the raw data needed for your problem

Now, you should think through what raw data you need to solve your problem and find ways to get that data.

Step 3: Process the data for analysis

After you collect the data, you’ll need to begin processing it and checking for common errors that could corrupt your analysis.

Step 4: Explore the data

Once you have finished cleaning your data, you can start looking into it to find useful patterns.

Step 5: Perform in-depth analysis

Now, you will be applying your statistical, mathematical and technological knowledge to find every insight you can in the data.

Step 6: Communicate the results of the analysis

The last step in the data science process is presenting your insights in an elegant manner. Make sure your audience knows exactly what you found.

If you worked as a data scientist, you would apply this process to your work every day.
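
To make the six steps concrete, here is a toy walk-through in Python. Everything in it is an assumption made for illustration: the question, the five hard-coded order records, and the field names are invented, and a real project would pull data from an actual source and use proper statistical tests rather than a simple comparison of averages.

```python
from statistics import mean

# Step 1: frame the problem (a made-up question for illustration).
question = "Do customers who contact support spend more per order?"

# Step 2: collect the raw data (hard-coded, invented records standing in
# for a real database or API).
raw_orders = [
    {"customer": "a", "contacted_support": "yes", "order_total": "40.0"},
    {"customer": "b", "contacted_support": "no", "order_total": "25.0"},
    {"customer": "c", "contacted_support": "yes", "order_total": ""},  # missing value
    {"customer": "d", "contacted_support": "no", "order_total": "35.0"},
    {"customer": "e", "contacted_support": "yes", "order_total": "50.0"},
]

# Step 3: process the data: drop records with a missing total, convert text to numbers.
clean = [
    {**record, "order_total": float(record["order_total"])}
    for record in raw_orders
    if record["order_total"]
]

# Step 4: explore the data: group order totals by whether support was contacted.
groups = {}
for record in clean:
    groups.setdefault(record["contacted_support"], []).append(record["order_total"])

# Step 5: analyze: compare the group averages (a real analysis would go further).
averages = {group: mean(totals) for group, totals in groups.items()}

# Step 6: communicate the results in plain language.
print(question)
for group, average in sorted(averages.items()):
    print(f"  contacted support = {group}: average order {average:.2f} "
          f"({len(groups[group])} orders)")
```

Even at this scale, most of the code goes into collecting and cleaning the data, which matches how much of the process comes before the analysis itself.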

What’s next?

Before you jump into data science and working through the data science process, there are some things you need to learn to become a data scientist.

Most data scientists use a combination of skills every day. The skills necessary to become a data scientist include an analytical mindset, mathematics, data visualization, and business knowledge, just to name a few.

In addition to having the skills, you’ll need to learn how to use modern data science tools. Hadoop, SQL, Python, R, and Excel are some of the tools you’ll need to be familiar with. Each tool plays a different role in the data science process.

If you’re ready to learn more about data science, take a deeper look at the skills necessary to become a data scientist, and how to get a job in data science, download Springboard’s comprehensive 60-page guide on How to get your first job in data science.


How to get a Data Science Job

About Springboard: At Springboard, we’re building an educational experience that empowers our students to thrive in technology careers. Through our online workshops, we have prepared thousands of people for careers in data science.

A Data Science Career with Kirk Borne, Free Webinar

Once again, The Data Incubator is hosting a Data Science in 30 minutes webinar. This one features the career of Kirk Borne.

Renowned data scientist Kirk Borne will take viewers on a journey through his career in science and technology, explaining how both he and the industry have evolved over the last 4 decades. From skipping lunches in high school to a systematic Twitter obsession, Kirk will shed light on his road to success in the data science industry.

Kirk is universally considered one of the most (if not the most) influential voices in data science. If you are interested in a career in data science, this is a webinar you will not want to miss.

The webinar is at 5:30 Eastern Time on August 29, 2017, and registrations are currently being accepted. It is free.

How to Kickstart Your Data Science Career

This is a guest post from Michael Li of The Data Incubator. The Data Incubator runs a free eight-week data science fellowship to help its Fellows transition from academia to industry. This post runs through some of the tools you’ll need to know to kickstart your data science career.

 

If you’re an aspiring data scientist but still processing your data in Excel, you might want to upgrade your toolset. Why? Firstly, while advanced features like Excel pivot tables can do a lot, they don’t offer nearly the flexibility, control, and power of tools like SQL, or their functional equivalents in Python (pandas) or R (data frames). Also, Excel has low size limits, making it suitable for “small data”, not “big data.”

In this blog entry we’ll talk about SQL. This should cover your “medium data” needs, which we’ll define as the next level up: data with more rows than Excel’s limit of roughly 1 million. SQL stores data in tables, which you can think of as spreadsheets with more structure. Each row represents a specific record (e.g. an employee at your company) and each column of a table corresponds to an attribute (e.g. name, department id, salary). Critically, all the values in a column must be of the same “type”. Here is a sample of the table Employees:

EmployeeId  Name   StartYear  Salary  DepartmentId
1           Bob    2001       10.5    10
2           Sally  2004       20      10
3           Alice  2005       25      20
4           Fred   2004       12.5    20
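
If you want to follow along, one way to build this table is with SQLite through Python’s built-in sqlite3 module. This is only a sketch: the post doesn’t prescribe a particular database, so the choice of SQLite and the column types (INTEGER, TEXT, REAL) are our assumptions. Note also that SQLite enforces declared types more loosely than most database servers do.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Each column declares a single type for all of its values.
# The types below are assumptions inferred from the sample data.
conn.execute("""
    CREATE TABLE Employees (
        EmployeeId   INTEGER PRIMARY KEY,
        Name         TEXT,
        StartYear    INTEGER,
        Salary       REAL,
        DepartmentId INTEGER
    )
""")

# The four sample rows from the table above.
employees = [
    (1, "Bob", 2001, 10.5, 10),
    (2, "Sally", 2004, 20, 10),
    (3, "Alice", 2005, 25, 20),
    (4, "Fred", 2004, 12.5, 20),
]
conn.executemany("INSERT INTO Employees VALUES (?, ?, ?, ?, ?)", employees)

row_count = conn.execute("SELECT COUNT(*) FROM Employees").fetchone()[0]
print(f"Employees has {row_count} rows")
```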

SQL has many keywords which compose its query language, but the ones most relevant to data scientists are SELECT, WHERE, GROUP BY, and JOIN. We’ll go through each of these in turn.

SELECT

SELECT is the foundational keyword in SQL. It retrieves rows from a table, and it also lets you choose which columns to return. For example

SELECT Name, StartYear FROM Employees

returns

Name   StartYear
Bob    2001
Sally  2004
Alice  2005
Fred   2004
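
Here is a minimal sketch of running that query from Python, assuming SQLite via the standard sqlite3 module (our choice of engine, not something the query depends on). The setup lines recreate the sample Employees table with assumed column types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE Employees (EmployeeId INTEGER, Name TEXT, "
    "StartYear INTEGER, Salary REAL, DepartmentId INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Bob", 2001, 10.5, 10),
        (2, "Sally", 2004, 20, 10),
        (3, "Alice", 2005, 25, 20),
        (4, "Fred", 2004, 12.5, 20),
    ],
)

# Only the two named columns come back; every row is still included.
rows = conn.execute("SELECT Name, StartYear FROM Employees").fetchall()
for name, start_year in rows:
    print(name, start_year)
```

Each result row arrives as a Python tuple with one entry per selected column.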

 

WHERE

The WHERE clause filters the rows. For example

SELECT * FROM Employees WHERE StartYear=2004

returns

EmployeeId  Name   StartYear  Salary  DepartmentId
2           Sally  2004       20      10
4           Fred   2004       12.5    20
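
When the value in a WHERE clause comes from your program rather than being typed into the query, it is generally safer to pass it as a parameter than to paste it into the SQL string. The sketch below assumes SQLite via Python’s sqlite3 module, whose placeholder is `?`; the variable name `year` is ours.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE Employees (EmployeeId INTEGER, Name TEXT, "
    "StartYear INTEGER, Salary REAL, DepartmentId INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Bob", 2001, 10.5, 10),
        (2, "Sally", 2004, 20, 10),
        (3, "Alice", 2005, 25, 20),
        (4, "Fred", 2004, 12.5, 20),
    ],
)

year = 2004  # the filter value, supplied separately from the SQL text

# The "?" placeholder is filled in by sqlite3 with the value from the tuple.
rows = conn.execute(
    "SELECT * FROM Employees WHERE StartYear = ?", (year,)
).fetchall()
for row in rows:
    print(row)
```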

 

GROUP BY

Next, the GROUP BY clause allows for combining rows using different functions like COUNT (count) and AVG (average). For example,

SELECT StartYear, COUNT(*) AS Num, AVG(Salary) AS AvgSalary
FROM Employees
GROUP BY StartYear

returns

StartYear  Num  AvgSalary
2001       1    10.5
2004       2    16.25
2005       1    25
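
The same aggregation can be run as a quick check, again assuming SQLite through Python’s sqlite3 module. We have added an ORDER BY clause, which is not in the query above, because SQL does not otherwise guarantee the order of the groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE Employees (EmployeeId INTEGER, Name TEXT, "
    "StartYear INTEGER, Salary REAL, DepartmentId INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Bob", 2001, 10.5, 10),
        (2, "Sally", 2004, 20, 10),
        (3, "Alice", 2005, 25, 20),
        (4, "Fred", 2004, 12.5, 20),
    ],
)

# One output row per distinct StartYear, with a count and an average.
summary = conn.execute(
    """
    SELECT StartYear, COUNT(*) AS Num, AVG(Salary) AS AvgSalary
    FROM Employees
    GROUP BY StartYear
    ORDER BY StartYear
    """
).fetchall()
for start_year, num, avg_salary in summary:
    print(start_year, num, avg_salary)
```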

 

JOIN

Finally, the JOIN clause allows us to join in other tables. For example, assume we have a table called Departments:

DepartmentId  DepartmentName
10            Sales
20            Engineering

We could use JOIN to combine the Employees and Departments tables based ON the DepartmentId fields:

SELECT Employees.Name AS EmpName, Departments.DepartmentName AS DepName
FROM Employees JOIN Departments
ON Employees.DepartmentId = Departments.DepartmentId;

The results might look like:

EmpName  DepName
Bob      Sales
Sally    Sales
Alice    Engineering
Fred     Engineering
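
To try the join yourself, the sketch below builds both tables in SQLite using Python’s sqlite3 module (an assumed setup; only the columns the query needs are created) and runs the query as written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

conn.execute("CREATE TABLE Employees (Name TEXT, DepartmentId INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [("Bob", 10), ("Sally", 10), ("Alice", 20), ("Fred", 20)],
)

conn.execute("CREATE TABLE Departments (DepartmentId INTEGER, DepartmentName TEXT)")
conn.executemany(
    "INSERT INTO Departments VALUES (?, ?)",
    [(10, "Sales"), (20, "Engineering")],
)

# Each employee row is paired with the department row that has the same DepartmentId.
rows = conn.execute(
    """
    SELECT Employees.Name AS EmpName, Departments.DepartmentName AS DepName
    FROM Employees JOIN Departments
    ON Employees.DepartmentId = Departments.DepartmentId
    """
).fetchall()
for emp_name, dep_name in rows:
    print(emp_name, dep_name)
```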

We’ve ignored a lot of details about joins: e.g. there are actually (at least) 4 types of joins, but hopefully this gives you a good picture.
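
As one illustration of how join types differ, the sketch below compares the plain (inner) JOIN used above with a LEFT JOIN. It assumes SQLite via Python’s sqlite3 module, and it adds a hypothetical third department, Marketing, that has no employees; that department is not part of the sample data. It also counts employees per department to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

conn.execute("CREATE TABLE Employees (Name TEXT, DepartmentId INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [("Bob", 10), ("Sally", 10), ("Alice", 20), ("Fred", 20)],
)

conn.execute("CREATE TABLE Departments (DepartmentId INTEGER, DepartmentName TEXT)")
conn.executemany(
    "INSERT INTO Departments VALUES (?, ?)",
    # Marketing (30) is a hypothetical department with no employees.
    [(10, "Sales"), (20, "Engineering"), (30, "Marketing")],
)

# Same query twice; only the join keyword changes.
query = """
    SELECT d.DepartmentName, COUNT(e.Name) AS NumEmployees
    FROM Departments AS d
    {join} Employees AS e ON e.DepartmentId = d.DepartmentId
    GROUP BY d.DepartmentName
    ORDER BY d.DepartmentName
"""

# Inner join: departments without a matching employee disappear.
inner_rows = conn.execute(query.format(join="JOIN")).fetchall()

# Left join: every department is kept, with NULLs where no employee matches,
# so COUNT(e.Name) is 0 for the empty department.
left_rows = conn.execute(query.format(join="LEFT JOIN")).fetchall()

print("inner:", inner_rows)
print("left: ", left_rows)
```

Which one you want depends on the question: a headcount report usually needs the empty department to show up, while a list of employees with their department names does not.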

Conclusion and Further Reading

With these basic commands, you can get a lot of basic data processing done. Don’t forget that you can nest queries and create really complicated joins. It’s a lot more powerful than Excel, and gives you much better control of your data. Of course, there’s a lot more to SQL than what we’ve mentioned, and this is only intended to whet your appetite and give you a taste of what you’re missing.
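
As a small taste of nesting, the sketch below uses one query inside another to find employees who earn more than the average salary. As before, it assumes SQLite through Python’s sqlite3 module and the sample Employees table; the question itself is just an example we picked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE Employees (EmployeeId INTEGER, Name TEXT, "
    "StartYear INTEGER, Salary REAL, DepartmentId INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Bob", 2001, 10.5, 10),
        (2, "Sally", 2004, 20, 10),
        (3, "Alice", 2005, 25, 20),
        (4, "Fred", 2004, 12.5, 20),
    ],
)

# The inner SELECT computes a single number (the average salary);
# the outer SELECT keeps only the rows above it.
above_average = conn.execute(
    """
    SELECT Name, Salary
    FROM Employees
    WHERE Salary > (SELECT AVG(Salary) FROM Employees)
    ORDER BY Name
    """
).fetchall()
for name, salary in above_average:
    print(name, salary)
```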

 

And when you’re ready to step it up from “medium data” to “big data”, you should apply for a fellowship at The Data Incubator where we work with current-generation data-processing technologies like MapReduce and Spark!