This month, The Data Incubator is hosting another free webinar on data science. This time it features an interview with Jake Porway, founder of DataKind. Jake and DataKind are doing amazing things, so I hope you can check out the webinar.
Below are the webinar details:
Data Science in 30 Minutes: Using Data Science in the Service of Humanity
Abstract: With his non-profit, DataKind, Jake Porway connects data scientists with social causes. Picture Doctors Without Borders for data nerds—with machine learning and AI engineers parachuting in to help the UN with humanitarian tasks, like tracking virus outbreaks using mobile data. “Data is like a bucket of crude oil,” Porway says. “Potentially great, but only if someone knows how to refine it (data scientists) and someone else has vehicles that will run on it (the social sector).” In this talk, Porway discusses the strategies of DataKind, its projects and the future of big data to service humanity.
Transitioning to a career in data science can be full of unanswered questions. I am here to help you get answers to those questions.
Today, I am launching a new YouTube channel, Learn Data Science. For each video, I will select a question and provide an answer. I will answer some of the questions myself, and I may have guests answer others.
If you have any questions about becoming a data scientist, please leave a comment.
Renowned data scientist Kirk Borne will take viewers on a journey through his career in science and technology, explaining how both he and the industry have evolved over the last four decades. From skipping lunches in high school to a systematic Twitter obsession, Kirk will shed light on his road to success in the data science industry.
Kirk is universally considered one of the most (if not the most) influential voices in data science. If you are interested in a career in data science, this is a webinar you will not want to miss.
The webinar is at 5:30 Eastern Time on August 29, 2017, and registrations are currently being accepted. It is free.
The differences between data scientists, data engineers, and software engineers can get a little confusing at times. Here is a guest post by Jake Stein, CEO at Stitch (formerly RJ Metrics), which aims to clear up some of that confusion based on LinkedIn data.
As data grows, so does the expertise needed to manage it. The past few years have seen an increasing distinction between the key roles tasked with managing data: software engineers, data engineers, and data scientists.
More and more we’re seeing data engineers emerge as a subset within the software engineering discipline, but this is still a relatively new trend. Plenty of software engineers are still tasked with moving and managing data.
Our team has released two reports over the past year, one focused on understanding the data science role, one on data engineering. Both of these reports are based on self-reported LinkedIn data. In this post, I’ll lay out the distinctions between these roles and software engineers, but first, here’s a diagram to show you (in very broad strokes) what we saw in the skills breakdown between these three roles:
A software engineer builds applications and systems. Developers will be involved through all stages of this process from design, to writing code, to testing and review. They are creating the products that create the data. Software engineering is the oldest of these three roles, and has established methodologies and tool sets.
Frontend and backend development
Operating system development
A data engineer builds systems that consolidate, store, and retrieve data from the various applications and systems created by software engineers. Data engineering emerged as a niche skill set within software engineering. 40% of all data engineers previously worked as software engineers, making this the most common career path for data engineers by far.
Advanced data structures
Knowledge of new & emerging tools: Hadoop, Spark, Kafka, Hive, etc.
Building ETL/data pipelines
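To make the idea of an ETL pipeline concrete, here is a minimal sketch in Python using only the standard library. It is an illustration rather than the way any particular tool (Stitch included) works: the sample CSV, the `orders` table, and the function names are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical export from an application built by software engineers.
RAW_ORDERS_CSV = """order_id,customer,amount
1, Alice ,19.99
2,bob,5.00
3,ALICE,
"""


def extract(text):
    """Extract: read raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize customer names and drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        customer = row["customer"].strip().lower()
        cleaned.append((int(row["order_id"]), customer, float(amount)))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a table that analysts can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    """Run extract, transform, and load in order."""
    load(transform(extract(RAW_ORDERS_CSV)), conn)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_pipeline(conn)
    query = "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    for row in conn.execute(query):
        print(row)
```

A real pipeline would read from production systems and write to a data warehouse, but the three stages stay the same.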
A data scientist builds analysis on top of data. This may come in the form of a one-off analysis for a team trying to better understand customer behavior, or a machine learning algorithm that is then implemented into the code base by software engineers and data engineers.
Business Intelligence dashboards
Evolving Data Teams
These roles are still evolving. The process of ETL is getting much easier overall as new tools (like Stitch) enter the market, making it easy for software developers to set up and maintain data pipelines. Larger companies are pulling data engineers off the software engineering team entirely in favor of forming a centralized data team where infrastructure and analysis sit together. In some scenarios, data scientists are responsible for both data consolidation and analysis.
At this point, there is no single dominant path. But we expect this rapid evolution to continue; after all, data certainly isn’t getting any smaller.
While preparing for a recent talk I gave to an undergraduate audience, I started compiling some tips for future data scientists. The tips are intended for students (undergraduate and graduate) or anyone else planning to enter the field of data science.
Be flexible and adaptable – There is no single tool or technique that always works best.
Cleaning data is most of the work – Knowing where to find the right data, how to access the data, and how to properly format/standardize the data is a huge task. It usually takes more time than the actual analysis.
It is not all building models – As the previous tip suggests, you must have skills beyond just model building.
Know the fundamentals of structuring data – Gain an understanding of relational databases. Also learn how to collect and store good data. Not all data is useful.
Document what you do – This is important for others and your future self. Here is a subtip: learn version control.
Know the business – Every business has different goals. It is not enough to do analysis just because you love data and numbers. Know how your analysis can make more money, positively impact more customers, or save more lives. This is very important when getting others to support your work.
Practice explaining your work – Presentation is essential for data scientists. Even if you think you are an excellent presenter, it always helps to practice. You don’t have to be comfortable in front of an audience, but you must be capable in front of an audience. Take every opportunity you can get to be in front of a crowd. Plus, it helps to build your reputation as an expert.
Spreadsheets are useful – Although they lack some of the computational power of other tools, spreadsheets are still widely used and understood by the business world. Don’t be afraid to use a spreadsheet if it can get the job done.
Don’t assume the audience understands – Many (non-data science) audiences will not have a solid understanding of math. Most will have lost their basic college and high school mathematics skills. Explain concepts such as correlation and avoid equations. Audiences understand visuals, so use them to explain concepts.
Be ready to continually learn – I do not know a single data scientist who has stopped learning. The field is large and expanding daily.
Learn the basics – Once you have a firm understanding of the basics in mathematics, statistics, and computer programming, it will be much simpler to continue learning new data science techniques.
Be a polymath – It helps to be a person with a wide range of knowledge.
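As a small illustration of the data cleaning tip above, here is a sketch of what "format/standardize the data" can look like in Python. The field names (`name`, `signup_date`) and the list of accepted date formats are assumptions made up for this example; real data will need its own rules.

```python
from datetime import datetime

# Hypothetical date formats seen in the raw data; extend as needed.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def standardize_date(value):
    """Return an ISO 8601 date string, or None if no known format matches."""
    value = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None


def clean_record(record):
    """Collapse extra whitespace in the name and standardize the date."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "signup_date": standardize_date(record["signup_date"]),
    }


if __name__ == "__main__":
    raw = {"name": "  ada   lovelace ", "signup_date": "08/29/2017"}
    print(clean_record(raw))
```

Returning `None` for unrecognized dates, instead of guessing, keeps bad values visible so they can be reviewed later.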
Thanks to Chad, Chad, Lee, Buck, and Justin for providing some of the tips.
The fine folks at DataCamp, a great site for learning data science right in your browser, have come up with another great infographic. This time it compares some of the many job titles in the data science field.
The infographic lays out the roles and skills needed for the following job titles. Note: not all the job roles can be confused with a data scientist, but all the roles can be important when completing an entire data science project.