A while ago, I recorded a Facebook Live on Tips for Data Science Students. It goes along with the following post: Getting the most from your Data Science Masters Program.
While preparing for a recent talk I gave to an undergraduate audience, I started compiling some tips for future data scientists. The tips are intended for students (undergraduate and graduate) or anyone else planning to enter the field of data science.
I asked a few of my data science friends and posted a question on Quora: "As a data scientist, what tips would you have for a younger version of yourself?"
What follows is a summary of the many tips.
Tips for Data Science
- Be flexible and adaptable – There is no single tool or technique that always works best.
- Cleaning data is most of the work – Knowing where to find the right data, how to access the data, and how to properly format/standardize the data is a huge task. It usually takes more time than the actual analysis.
- It is not all building models – Like the previous tip, you must have skills beyond just model building.
- Know the fundamentals of structuring data – Gain an understanding of relational databases. Also learn how to collect and store good data. Not all data is useful.
- Document what you do – This is important for others and your future self. Here is a subtip: learn version control.
- Know the business – Every business has different goals. It is not enough to do analysis just because you love data and numbers. Know how your analysis can make more money, positively impact more customers, or save more lives. This is very important when getting others to support your work.
- Practice explaining your work – Presentation is essential for data scientists. Even if you think you are an excellent presenter, it always helps to practice. You don’t have to be comfortable in front of an audience, but you must be capable in front of an audience. Take every opportunity you can get to be in front of a crowd. Plus, it helps to build your reputation as an expert.
- Spreadsheets are useful – Although they lack some of the computational power of other tools, spreadsheets are still widely used and understood by the business world. Don’t be afraid to use a spreadsheet if it can get the job done.
- Don’t assume the audience understands – Many (non-data science) audiences will not have a solid understanding of math. Most will have lost their basic college and high school mathematics skills. Explain concepts such as correlation and avoid equations. Audiences understand visuals, so use them to explain concepts.
- Be ready to continually learn – I do not know a single data scientist who has stopped learning. The field is large and expanding daily.
- Learn the basics – Once you have a firm understanding of the basics in mathematics, statistics, and computer programming, it will be much simpler to continue learning new data science techniques.
- Be a polymath – It helps to be a person with a wide range of knowledge.
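To make the data-cleaning tip a little more concrete, here is a minimal Python sketch of what "properly format/standardize the data" can look like in practice. Everything in it is an assumption made up for illustration: the records, the field names (`name`, `signup`, `spend`), the two date formats, and the helper functions. Real data will need its own rules.

```python
from datetime import datetime

# Hypothetical raw records; the field names and values are invented for this example.
RAW_ROWS = [
    {"name": "  Alice SMITH ", "signup": "2021-03-05", "spend": "1,200.50"},
    {"name": "bob jones", "signup": "03/07/2021", "spend": "300"},
    {"name": "Carol Lee", "signup": "", "spend": "n/a"},
]

# Assumed date formats: ISO (year-month-day) and US style (month/day/year).
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text):
    """Return an ISO date string, or None if no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def parse_amount(text):
    """Return the amount as a float, or None if it is not a number."""
    try:
        return float(text.replace(",", ""))
    except ValueError:
        return None


def clean_row(row):
    """Standardize one record: tidy the name, normalize the date and amount."""
    return {
        "name": " ".join(row["name"].split()).title(),
        "signup": parse_date(row["signup"]),
        "spend": parse_amount(row["spend"]),
    }


CLEAN_ROWS = [clean_row(row) for row in RAW_ROWS]

for row in CLEAN_ROWS:
    print(row)
```

Missing or unparseable values become `None` rather than being guessed, which keeps the gaps visible for whoever does the analysis later.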
Thanks to Chad, Chad, Lee, Buck, and Justin for providing some of the tips.
Pedro Domingos of the Department of Computer Science and Engineering at the University of Washington provides a very useful paper with tips for machine learning. The paper is titled A Few Useful Things to Know about Machine Learning [pdf].
Below are the 12 useful tips.
- LEARNING = REPRESENTATION + EVALUATION + OPTIMIZATION
- IT’S GENERALIZATION THAT COUNTS
- DATA ALONE IS NOT ENOUGH
- OVERFITTING HAS MANY FACES
- INTUITION FAILS IN HIGH DIMENSIONS
- THEORETICAL GUARANTEES ARE NOT WHAT THEY SEEM
- FEATURE ENGINEERING IS THE KEY
- MORE DATA BEATS A CLEVERER ALGORITHM
- LEARN MANY MODELS, NOT JUST ONE
- SIMPLICITY DOES NOT IMPLY ACCURACY
- REPRESENTABLE DOES NOT IMPLY LEARNABLE
- CORRELATION DOES NOT IMPLY CAUSATION
For details and a good explanation of each, see the paper A Few Useful Things to Know about Machine Learning [pdf].
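As a small illustration of two of these tips ("it's generalization that counts" and "overfitting has many faces"), here is a toy Python sketch. It is not taken from the paper: the tiny dataset, the one mislabeled training point, and both "models" are assumptions invented for this example. One model memorizes the training data (1-nearest-neighbor), the other learns a single cutoff.

```python
# Hypothetical 1-D dataset: the underlying pattern is "label 1 when x is large",
# with one deliberately mislabeled training point at x = 8.
TRAIN = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 0), (9, 1)]
TEST = [(2.4, 0), (3.6, 0), (6.4, 1), (7.8, 1), (8.3, 1), (9.5, 1)]


def accuracy(predict, data):
    """Fraction of (x, y) pairs for which predict(x) equals y."""
    return sum(predict(x) == y for x, y in data) / len(data)


def fit_memorizer(train):
    """1-nearest-neighbor: copy the label of the closest training point."""
    def predict(x):
        _, label = min(train, key=lambda pair: abs(pair[0] - x))
        return label
    return predict


def fit_threshold(train):
    """Pick the cutoff (among training x values) with the fewest training errors."""
    def errors(t):
        return sum((x >= t) != bool(y) for x, y in train)
    best_t = min((x for x, _ in train), key=errors)
    return lambda x: int(x >= best_t)


memorizer = fit_memorizer(TRAIN)
threshold = fit_threshold(TRAIN)

for name, model in (("memorizer", memorizer), ("threshold", threshold)):
    print(name, "train:", accuracy(model, TRAIN), "test:", accuracy(model, TEST))
```

The memorizer scores perfectly on the data it has already seen, because it reproduces the mislabeled point too, and then makes mistakes near that point on new data. The simpler cutoff gets one training example "wrong" but does better on the held-out points. That gap between training and test performance is the reason to always evaluate on data the model has not seen.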
Also, later this year, Pedro Domingos will be teaching a machine learning course via Coursera. Sign up if you are interested.