While preparing for a recent talk I gave to an undergraduate audience, I started compiling some tips for future data scientists. The tips are intended for students (undergraduate and graduate) or anyone else planning to enter the field of data science.
I asked a few of my data science friends and posted a question on Quora: "As a data scientist, what tips would you have for a younger version of yourself?"
What follows is a summary of the many tips.
Tips for Data Science
- Be flexible and adaptable – There is no single tool or technique that always works best.
- Cleaning data is most of the work – Knowing where to find the right data, how to access the data, and how to properly format/standardize the data is a huge task. It usually takes more time than the actual analysis.
- It is not all building models – Like the previous tip, you must have skills beyond just model building.
- Know the fundamentals of structuring data – Gain an understanding of relational databases. Also learn how to collect and store good data. Not all data is useful.
- Document what you do – This is important for others and your future self. A subtip: learn version control.
- Know the business – Every business has different goals. It is not enough to do analysis just because you love data and numbers. Know how your analysis can make more money, positively impact more customers, or save more lives. This is very important when getting others to support your work.
- Practice explaining your work – Presentation is essential for data scientists. Even if you think you are an excellent presenter, it always helps to practice. You don’t have to be comfortable in front of an audience, but you must be capable in front of an audience. Take every opportunity you can get to be in front of a crowd. Plus, it helps to build your reputation as an expert.
- Spreadsheets are useful – Although they lack some of the computational power of other tools, spreadsheets are still widely used and understood by the business world. Don’t be afraid to use a spreadsheet if it can get the job done.
- Don’t assume the audience understands – Many (non-data science) audiences will not have a solid understanding of math. Most will have lost their basic college and high school mathematics skills. Explain concepts such as correlation and avoid equations. Audiences understand visuals, so use them to explain concepts.
- Be ready to continually learn – I do not know a single data scientist who has stopped learning. The field is large and expanding daily.
- Learn the basics – Once you have a firm understanding of the basics in mathematics, statistics, and computer programming, it will be much simpler to continue learning new data science techniques.
- Be a polymath – It helps to be a person with a wide range of knowledge.
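
To make the tip about cleaning data a little more concrete, here is a minimal sketch in Python of what "format/standardize" can look like in practice. The records, field names, and date formats are all invented for illustration; real data will have its own quirks, and this is only one of many reasonable approaches.

```python
from datetime import datetime

# Hypothetical raw records, invented for illustration only.
RAW_ROWS = [
    {"name": "  Alice ", "signup": "2021-03-05", "spend": "1,200.50"},
    {"name": "BOB", "signup": "03/07/2021", "spend": "300"},
    {"name": "carol", "signup": "", "spend": "n/a"},
]

# Assumed date formats; extend this tuple for whatever your data contains.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text):
    """Return an ISO date string, or None when no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def parse_amount(text):
    """Return a float, or None when the value is not numeric."""
    try:
        return float(text.replace(",", "").strip())
    except ValueError:
        return None


def clean_row(row):
    """Standardize one record: tidy the name, normalize date and amount."""
    return {
        "name": row["name"].strip().title(),
        "signup": parse_date(row["signup"]),
        "spend": parse_amount(row["spend"]),
    }


cleaned = [clean_row(r) for r in RAW_ROWS]
for row in cleaned:
    print(row)
```

Even in this tiny example, three rows need three different fixes, and missing values are kept as `None` rather than guessed. That is a small taste of why cleaning usually takes longer than the analysis itself.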
Thanks to Chad, Chad, Lee, Buck, and Justin for providing some of the tips.
I recently read the article "Facebook’s Gen Y Nightmare." It is worth reading (or at least skimming). Here are the highlights of the article.
The article imagines that in 8 to 10 years, a fictitious company named Narrative Data will be able to perform character and personality analysis. Narrative Data will analyze people’s social media accounts and other online activity to determine things such as productivity, effectiveness, personality type, and various other traits. The article then describes Narrative Data identifying that a particular person suffers from acute migraine headaches, which occur a couple of times a month. As a result of this finding, that person is not considered for a job opening.
Here are my thoughts. First, is something like this even possible? I would think yes. More importantly, how useful is this analysis? My assumption is that there are no perfect people. If you look long enough you can find flaws with anyone. So, in order to avoid the above situation, a person would have to drastically censor his/her online activity or just withdraw completely from online activities. Then Narrative Data would not be able to find any problems. This situation will lead to the same problem employers have today (not enough information). The above situation in the article is an example of too much information. Given the choice between hiring a slightly flawed person with lots of available information or hiring a person whose flaws have not yet been revealed because of a lack of information, I would choose the slightly flawed person.
If a company like Narrative Data ever does exist, it is almost certain that the opposite type of company will also emerge. By that, I mean companies will be created with the intent of helping people hide character traits. That would undermine the analysis.
What are your thoughts? Will something like this exist? Would it really be beneficial?