Transitioning to a career in data science can be full of unanswered questions. I am here to help you get answers to those questions.
Today, I am launching a new YouTube channel, Learn Data Science. For each video, I will select a question and provide an answer. I will answer some of the questions myself, and I may have guests answer others.
If you have any questions about becoming a data scientist, please leave a comment.
Renowned data scientist Kirk Borne will take viewers on a journey through his career in science and technology, explaining how both he and the industry have evolved over the last four decades. From skipping lunches in high school to a systematic Twitter obsession, Kirk will shed light on his road to success in the data science industry.
Kirk is widely considered one of the most (if not the most) influential voices in data science. If you are interested in a career in data science, this is a webinar you will not want to miss.
The webinar is at 5:30 Eastern Time on August 29, 2017, and registrations are currently being accepted. It is free.
The differences between data scientists, data engineers, and software engineers can get a little confusing at times. To help clear up some of that confusion, here is a guest post from Jake Stein, CEO at Stitch, formerly RJ Metrics, based on LinkedIn data.
As data grows, so does the expertise needed to manage it. The past few years have seen an increasing distinction between the key roles tasked with managing data: software engineers, data engineers, and data scientists.
More and more, we’re seeing data engineers emerge as a subset within the software engineering discipline, but this is still a relatively new trend. Plenty of software engineers are still tasked with moving and managing data.
Our team has released two reports over the past year, one focused on understanding the data science role, one on data engineering. Both of these reports are based on self-reported LinkedIn data. In this post, I’ll lay out the distinctions between these roles and software engineers, but first, here’s a diagram to show you (in very broad strokes) what we saw in the skills breakdown between these three roles:
A software engineer builds applications and systems. Developers will be involved through all stages of this process from design, to writing code, to testing and review. They are creating the products that create the data. Software engineering is the oldest of these three roles, and has established methodologies and tool sets.
Frontend and backend development
Operating system development
A data engineer builds systems that consolidate, store, and retrieve data from the various applications and systems created by software engineers. Data engineering emerged as a niche skill set within software engineering. 40% of all data engineers previously worked as software engineers, making this by far the most common career path into data engineering.
Advanced data structures
Knowledge of new & emerging tools: Hadoop, Spark, Kafka, Hive, etc.
Building ETL/data pipelines
A data scientist builds analysis on top of data. This may come in the form of a one-off analysis for a team trying to better understand customer behavior, or a machine learning algorithm that is then implemented into the code base by software engineers and data engineers.
Business Intelligence dashboards
Evolving Data Teams
These roles are still evolving. The process of ETL is getting much easier overall as new tools (like Stitch) enter the market, making it easy for software developers to set up and maintain data pipelines. Larger companies are pulling data engineers off the software engineering team entirely in favor of forming a centralized data team where infrastructure and analysis sit together. In some scenarios, data scientists are responsible for both data consolidation and analysis.
At this point, there is no single dominant path. But we expect this rapid evolution to continue. After all, data certainly isn’t getting any smaller.
While preparing for a recent talk I gave to an undergraduate audience, I started compiling some tips for future data scientists. The tips are intended for students (undergraduate and graduate) or anyone else planning to enter the field of data science.
Be flexible and adaptable – There is no single tool or technique that always works best.
Cleaning data is most of the work – Knowing where to find the right data, how to access the data, and how to properly format/standardize the data is a huge task. It usually takes more time than the actual analysis.
It is not all building models – As the previous tip suggests, you must have skills beyond just model building.
Know the fundamentals of structuring data – Gain an understanding of relational databases. Also learn how to collect and store good data. Not all data is useful.
Document what you do – This is important for others and your future self. Here is a subtip: learn version control.
Know the business – Every business has different goals. It is not enough to do analysis just because you love data and numbers. Know how your analysis can make more money, positively impact more customers, or save more lives. This is very important when getting others to support your work.
Practice explaining your work – Presentation is essential for data scientists. Even if you think you are an excellent presenter, it always helps to practice. You don’t have to be comfortable in front of an audience, but you must be capable in front of an audience. Take every opportunity you can get to be in front of a crowd. Plus, it helps to build your reputation as an expert.
Spreadsheets are useful – Although they lack some of the computational power of other tools, spreadsheets are still widely used and understood by the business world. Don’t be afraid to use a spreadsheet if it can get the job done.
Don’t assume the audience understands – Many (non-data science) audiences will not have a solid understanding of math. Most will have lost their basic college and high school mathematics skills. Explain concepts such as correlation and avoid equations. Audiences understand visuals, so use them to explain concepts.
Be ready to continually learn – I do not know a single data scientist who has stopped learning. The field is large and expanding daily.
Learn the basics – Once you have a firm understanding of the basics in mathematics, statistics, and computer programming, it will be much simpler to continue learning new data science techniques.
Be a polymath – It helps to be a person with a wide range of knowledge.
Thanks to Chad, Chad, Lee, Buck, and Justin for providing some of the tips.
The fine folks at DataCamp, a great site for learning data science right in your browser, have come up with another great infographic. This time it compares some of the many job titles in the data science field.
The infographic lays out the roles and skills needed for the following job titles. Note: not all the job roles can be confused with a data scientist, but all the roles can be important when completing an entire data science project.
This is one of the better descriptions I have seen of what a data scientist does.
They must find interesting, novel, and useful insights about the real world in the data. And they must turn those insights into products and services, and deliver those products and services at a profit.
Notice that data scientists don’t just need to find insights in data. They also need to create profitable products from those insights. I often feel that data products are not seen as being as important as improving the machine learning algorithms, but the data products really are the end goal.
Want to learn Data Science in 12 weeks? Zipfian Academy is offering just that.
The inaugural class will begin Fall 2013. The schedule is five days a week from 9 a.m. to 7 p.m., so it is a very intensive program. You must be willing to relocate to San Francisco for the 12 weeks. The cost of the data science program is $14,400, but some scholarships and sponsorships are available.
At first the cost seems high, but when you consider that the program will prepare you for a different career in just 12 weeks, it does not sound so bad. I think you are paying for two things: the immense amount of information and the condensed format. The planned curriculum looks very extensive, covering everything from storing data to cleaning data to machine learning.
I am not aware of another program like this existing. If you are not concerned with getting a “university degree” and would like to learn data science, I think Zipfian Academy looks like a good choice.