Tag Archives: computer science

Building Data Science Skills as an Undergraduate

While a growing number of universities offer undergraduate data science degrees, those programs may not be a good fit for everyone interested in data science. So, what do you do if you attend a school that does not offer a data science degree? I am asked this question frequently, so I thought I would elaborate on my typical response.

You Cannot Know It All

First off, you will never know all there is to know about data science. The field is vast and contains many sub-fields. Thus, as an undergraduate, a good plan is to learn the fundamentals, then expand your knowledge and expertise as your education and career continue. Data science is evolving rapidly, and it requires continual learning. Hopefully, this is one of the reasons you are interested in the field.

My Recommended Approach

A good plan is to major in computer science or statistics and minor in the other. If your school doesn’t have either of those majors, then take as many of those classes as you can. Next, choose a domain-specific area such as business, chemistry, or psychology, and gear your elective classes toward that domain. This approach will give you a solid base understanding of the statistical and computational underpinnings of data science. You should also be well-prepared to find a job or continue your studies in graduate school.

Also, somewhat related, taking an art class or two might not be a bad idea. Visualization is very important to data science. Understanding color palettes and usage of space on a canvas are concepts that will serve you well. Plus, many people strong in computer science and statistical algorithms are lacking in artistic skills.

Some Enhancements to Your Education

If your location allows, consider attending local meetups. Finally, get involved with whatever projects you can (Kaggle, internships, open source, …).

Do you have any advice for undergraduates looking to study data science? If so, please leave a comment.

Are you an undergraduate with questions? Please ask in the comments below.

Free Stats book for Computer Scientists

Professor Norm Matloff from the University of California, Davis has published From Algorithms to Z-Scores: Probabilistic and Statistical Modeling in Computer Science, which is an open textbook. It approaches statistics from a computer science perspective. Dr. Matloff has been a professor of both statistics and computer science, so he is well suited to write such a textbook. This would be a good choice of textbook for a statistics course targeted primarily at computer scientists. It uses the R programming language. The book starts by building the foundations of probability before entering statistics.

Tips for Future Data Scientists

While preparing for a recent talk I gave to an undergraduate audience, I started compiling some tips for future data scientists. The tips are intended for students (undergraduate and graduate) or anyone else planning to enter the field of data science.

I asked a few of my data science friends and posted a question on Quora: As a data scientist, what tips would you have for a younger version of yourself?

What follows is a summary of the many tips.

Tips for Data Science

  • Be flexible and adaptable – There is no single tool or technique that always works best.
  • Cleaning data is most of the work – Knowing where to find the right data, how to access the data, and how to properly format/standardize the data is a huge task. It usually takes more time than the actual analysis.
  • It is not all building models – As the previous tip suggests, you must have skills beyond just model building.
  • Know the fundamentals of structuring data – Gain an understanding of relational databases. Also learn how to collect and store good data. Not all data is useful.
  • Document what you do – This is important for others and your future self. A subtip: learn version control.
  • Know the business – Every business has different goals. It is not enough to do analysis just because you love data and numbers. Know how your analysis can make more money, positively impact more customers, or save more lives. This is very important when getting others to support your work.
  • Practice explaining your work – Presentation is essential for data scientists. Even if you think you are an excellent presenter, it always helps to practice. You don’t have to be comfortable in front of an audience, but you must be capable in front of an audience. Take every opportunity you can get to be in front of a crowd. Plus, it helps to build your reputation as an expert.
  • Spreadsheets are useful – Although they lack some of the computational power of other tools, spreadsheets are still widely used and understood by the business world. Don’t be afraid to use a spreadsheet if it can get the job done.
  • Don’t assume the audience understands – Many (non-data science) audiences will not have a solid understanding of math. Most will have lost their basic college and high school mathematics skills. Explain concepts such as correlation and avoid equations. Audiences understand visuals, so use them to explain concepts.
  • Be ready to continually learn – I do not know a single data scientist who has stopped learning. The field is large and expanding daily.
  • Learn the basics – Once you have a firm understanding of the basics in mathematics, statistics, and computer programming, it will be much simpler to continue learning new data science techniques.
  • Be a polymath – It helps to be a person with a wide range of knowledge.

Thanks to Chad, Chad, Lee, Buck, and Justin for providing some of the tips.
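
To make the "cleaning data is most of the work" tip a bit more concrete, here is a minimal Python sketch of the kind of formatting and standardizing it describes. The records, field names, and date formats are all hypothetical and invented for illustration; real data will need its own rules.

```python
from datetime import datetime

# Hypothetical raw records, invented for illustration only.
RAW_ROWS = [
    {"name": "  Alice ", "signup": "2013-01-15", "spend": "120.50"},
    {"name": "BOB", "signup": "01/20/2013", "spend": ""},
    {"name": "carol", "signup": "not recorded", "spend": "75"},
]

# Assumed date formats; extend this tuple for whatever your data contains.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text):
    """Return an ISO date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_row(row):
    """Standardize one record: tidy the name, unify the date, parse the number."""
    spend = row["spend"].strip()
    return {
        "name": row["name"].strip().title(),
        "signup": parse_date(row["signup"]),
        "spend": float(spend) if spend else None,  # keep missing values explicit
    }


cleaned = [clean_row(r) for r in RAW_ROWS]
```

Even in this tiny example, most of the code deals with inconsistent formats and missing values rather than with any analysis, which is the point of the tip.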

Real-Time Machine Learning for Industry

Michael Cutler, cofounder of TUMRA, gave a nice talk to the University of Oxford Computer Science Department. The following quote from his talk sums up his idea.

Given a choice between a “best guess” now, and a “marginally better” answer later, I’d take the best guess every time.

Academics often focus heavily on improving the accuracy of an algorithm, even when the resulting solution is too slow for industrial purposes.
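
One way to picture the "best guess now" idea is a computation that can be stopped at any time and still return a usable answer. The Python sketch below is only an illustration of that trade-off, not anything taken from the talk; the function name `best_guess_mean` and the `budget_seconds` parameter are hypothetical.

```python
import time


def best_guess_mean(values, budget_seconds):
    """Return (running mean, items used), stopping once the time budget is spent."""
    deadline = time.monotonic() + budget_seconds
    total = 0.0
    count = 0
    for value in values:
        total += value
        count += 1
        if time.monotonic() >= deadline:
            break  # out of time: answer with what we have so far
    mean = total / count if count else None
    return mean, count
```

With a generous budget the function uses every value. With a tight budget it returns an estimate based on the values seen so far, which may be slightly less accurate but arrives on time.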

reference: TUMRA Blog

Coursera Adds 17 New Universities

Just announced: Coursera has added 17 new universities. They include Columbia and Brown, as well as a few international universities.

A few notable courses for data science are: a new machine learning course from the University of Washington, Linear Algebra from Brown, and Natural Language Processing by Michael Collins from Columbia.

See the following pages to see what other courses are now available.

Learn Computer Science For Data Science

Many aspects of computer science are fundamental to data science. A good data scientist has to be able to transform/extract/manipulate lots of data. Computer programming is the main technique for such operations. Here are numerous resources to help you learn the fundamentals of computer science.

Online Computer Science Courses: Introductory Level

If you are not familiar with computer programming, this list is a good place to start.

Online Computer Science Courses: More Advanced

Two More Helpful Resources

Stack Overflow is a great site for getting answers to your programming questions. It is good for beginners as well as more advanced programmers. Also, if you start writing a lot of code, GitHub is a great place to store that code.

More Free Courses from Stanford

Also this spring, Stanford will be offering two more courses that might benefit a person learning data science.

If you feel these two classes might be a bit too advanced at this point, then here are a couple more fundamental computer science classes. If you are new to computer science and programming, CS 101 would be a good choice. If you are not as new to computer science or might be a bit rusty on your core algorithms knowledge, then Design and Analysis of Algorithms 1 might be appropriate.

Actually, the courses are no longer being offered by just Stanford. A few other schools have been added. The courses are now being offered through Coursera. Plus, all the courses are free.