Tag Archives: data scientist

Do You Need College To Be A Data Scientist?

With some of the top tech entrepreneurs in the U.S. either dropping out of college or not attending, there is some debate about whether college is the right choice or not. This post will focus on college for data science. However, for college in general, if you know what you want to study, then college or graduate school is a great option. If you are going to college because you do not know what else to do, I would say college is too expensive for that.

College?

Most would agree that an undergraduate degree in a highly analytical field (math, CS, economics, physics) is beneficial. Plus, college has a strict set of guidelines and a specific order for the learning. A formal degree program often provides the necessary motivation for a person to continue learning. The U.S. college education system is not perfect, but if it keeps a person from quitting, it will help that person reach the goal of becoming a data scientist.

All this leads to a second point. Only a few colleges offer undergraduate degree programs for data science. Thus, graduate school or more learning will still be required. College should provide the necessary prerequisites and many employers will pay for the continued learning.

No College?

A highly motivated person could probably learn most, if not all, of the data science skills on the internet for free or at very low cost. The key is being highly motivated. That person must have the drive to not quit when the learning becomes difficult. Also, there are no classmates or professors to help with difficult concepts. Sure, the internet can help there, but it requires a bit more work to find the help. Plus, knowing what topics to learn and in what order can be challenging. Already, this blog has much helpful content, but it is not organized into a learning sequence. Not attending college presents some obstacles that only the most highly motivated students will overcome. As more and more learning resources appear online, the no-college option may become more popular.

What is the Answer?

Strictly speaking, I would say the answer is NO. However, many people will not succeed without the rigor of school, and some companies will not hire a person without a degree. So, college is not 100% essential to being a data scientist, but for many it is probably the best option.

Data Scientist Job Analysis

A few weeks ago, I posted 16 Companies Hiring Data Scientists Right Now. I decided to do a bit of analysis on the job posts, so I took all the job postings and compiled them into one file.

The Problem

I wanted to determine 2 things:

  1. What words occurred most often in the job posts?
  2. What words occurred in the most job posts?

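The questions are similar but not the same: the first counts every occurrence of a word across all the posts, while the second counts how many posts contain the word at least once. I wrote some Java code to answer those questions. The raw results are posted here.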
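For readers who want to try something similar, here is a minimal sketch of how the two counts could be computed. It is not the code I used for the analysis. The class name `JobPostWordCounts`, the tokenization rule (lowercase, split on anything that is not a letter or digit), and the three sample postings are all assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class; an illustration, not the original analysis code.
class JobPostWordCounts {

    // Assumed tokenization: lowercase, then split on runs of non-alphanumeric characters.
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    // Question 1: total occurrences of each word across all postings.
    static Map<String, Integer> totalOccurrences(List<String> postings) {
        Map<String, Integer> counts = new HashMap<>();
        for (String posting : postings) {
            for (String word : tokenize(posting)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Question 2: number of postings that contain each word at least once.
    static Map<String, Integer> postingsContaining(List<String> postings) {
        Map<String, Integer> counts = new HashMap<>();
        for (String posting : postings) {
            // A set keeps each word once per posting, however often it repeats.
            Set<String> unique = new HashSet<>(tokenize(posting));
            for (String word : unique) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample postings, not the real job posts.
        List<String> postings = Arrays.asList(
            "Data scientist: explore data with SQL",
            "Analyze data using Hadoop",
            "Build models in Python");
        System.out.println(totalOccurrences(postings).get("data"));   // prints 3
        System.out.println(postingsContaining(postings).get("data")); // prints 2
    }
}
```

In the sample, "data" appears three times in total but in only two of the three postings, which is exactly the difference between the two questions. Note that this sketch counts single words only; a phrase such as "machine learning" would need separate handling.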

Results

Honestly, nothing too surprising showed up. Not counting the common English words (and, to), the word data was the most popular. It occurred 167 times and it occurred at least once in all 16 job postings. That makes sense; a data scientist should know about data. I thought hadoop would occur in all job descriptions but it only appeared in 11 of the 16 job descriptions. Here are some other words I found interesting:

  • statistical occurred 29 times and in 10 job descriptions
  • analysis occurred 46 times and in 13 job descriptions
  • analytics occurred 22 times and in 6 job descriptions
  • statistics occurred 16 times and in 9 job descriptions
  • machine learning occurred 14 times and in 9 job descriptions
  • phd occurred 11 times and in 11 job descriptions
  • sql occurred 12 times and in 10 job descriptions

On an interesting note, Python and R occurred in more job postings than Java (2 more to be exact).

Does anything in the results strike you as interesting?

Use Data Science to Help The World (Data Without Borders)

Data Without Borders

Jake Porway started Data Without Borders after attending a hack-a-thon where the groups came up with apps that didn’t really better the world very much. I believe he used the word “unfulfilling” to describe the apps. He decided to create a way to give organizations (government or non-profit) access to data scientists. His thinking goes like this: there are lots of data scientists who love to work with data, and there are great organizations with lots of data. If the two can be matched together, what amazing things can be done? Data Without Borders hopes to find out.

Data Without Borders organizes a bunch of DataDives, which are weekend hack-a-thons that match up a group of data scientists and developers with data.

Jake concluded with some wonderful remarks:

What if we started using data not just to make better decisions about what kind of movies we wanted to see? What if we started using data to make better decisions about what kind of a world we wanted to see?

What is Data Without Borders looking for?

Jake’s Presentation at PopTech

Natural Language Processing Starts Today

The Coursera Natural Language Processing course officially starts today.  Sign up and start learning.

16 Companies Hiring Data Scientists Right Now

Data Scientist is the hot new job for 2012.  Does this job really exist?  Who hires these people? Are companies currently hiring? The answers are: yes, lots of companies, and yes. I decided to spend last night looking for companies that are currently hiring data scientists.  It did not take long to compile a pretty good list.

Data Scientist Job Openings

Company | Location | Link
Microsoft | Redmond, WA | Microsoft Sr. Data Scientist
Netflix | Los Gatos, CA | NetFlix Senior Data Scientist
Kaggle | San Francisco, CA | Kaggle Data Scientist
Greenplum | San Mateo, CA | Greenplum Data Scientist
Last.fm | London | Last.fm Data Scientist
Rackspace | San Antonio, TX | Rackspace Data Scientist
Amazon | Seattle, WA | Amazon Data Scientist/System Architect
Facebook | Menlo Park, CA | Facebook Data Scientist
Twitter | San Francisco, CA | Twitter Data Scientist
LinkedIn | Mountain View, CA | LinkedIn Data Scientist
Cobalt/ADP | Cambridge, MA | Cobalt Data Scientist
Ebay/Paypal | San Jose, CA | Paypal Data Scientist
Bunchball | San Jose, CA | Bunchball Data Scientist
A9 | Palo Alto, CA | Principal Engineer/Data Scientist
Acxiom | Little Rock, AR | Acxiom Data Scientist
Trulia | San Francisco, CA | Trulia Data Scientist – Data Science Lab

Do you know of any other companies hiring Data Scientists right now?

Data Scientist – Career of the Future

This link provides a great infographic about data scientist career opportunities.

What Makes a Good Data Scientist?

Jeremy Howard is the Chief Scientist at Kaggle. At the end of this interview, from the Strata Conference 2012, he identified 4 simple traits that a data scientist needs.

  1. Creativity
  2. Open-mindedness
  3. Tenacity
  4. A Good Skillset

Jeremy Howard of Kaggle at Strata 2012

In this brief interview he covers a range of other data science topics:

  • Big Data is an engineering problem
  • Analytics generate value/insight from data
  • Predictive Modeling is about answering a question – build a model to do that
  • Is Data Science about tools or people? – watch the video for Jeremy’s answer
  • And others…

See this previous post for more videos from Strata 2012.

Need A Data Scientist? Probably

In the article Do you need a data scientist?, the following questions get answered:

  • What do data scientists do?
  • Who makes a good data scientist?
  • When is the right time to hire a data scientist?

Hopefully, I will discuss each of these questions in more detail in a later blog post.

To answer the question in the title: if your data is growing, you would probably benefit from a data scientist.

The following is a video that goes along with this topic.

What is a data scientist?

If I am going to create a blog about becoming a data scientist, I must at least provide some type of definition. One of the best definitions I have read is by Hilary Mason, Chief Scientist at Bit.ly:

A data scientist is someone who can obtain, scrub, explore, model and interpret data, blending hacking, statistics, and machine learning.

This definition is short and simple, but there are many more definitions out there. In fact, CITO Research, a site for CIOs and CTOs, set out to define what a data scientist is. They interviewed six leaders in the data science community and posted all of the interviews online. The interviews produced varied results, but they focused on some main themes of what a data scientist should know.

After reading Hilary’s definition, the CITO Research interviews, a great post at Quora, and numerous other articles, I created a list of data science skills:

  • Machine Learning
  • Statistics
  • Storytelling (Communication)
  • Big Data
  • Algorithms
  • Curiosity

I am sure this list will change and evolve over time, but that is where I am going to focus for now.  If you have anything to add to the list, please leave a comment.  If you are interested in gaining some data science skills, please follow along and let’s learn together.

Why did I create Data Science 101?

Obviously the world does not need another blog. However, blogs are a great way to share information, and I am creating a new one anyway.

The analysis of data is becoming more important every day. Data Science is quickly becoming a hot topic of interest, and I have a desire to become a data scientist. Thus, this blog will contain information I find useful during my data science journey. I hope others find the blog useful too.

If you are interested in becoming a data scientist, please follow along and let’s start learning together.