An enlightening video about how a single pixel change can fool a neural network.
Brandon Rohrer (along with others) created an excellent resource for academic programs: Industry recommendations for academic data science programs. The resource is authored by a number of industry data scientists and university faculty. It is a collection of useful information for college data science programs. Here are some of the topics:
- What do industry data scientists do?
- What makes someone a good data scientist?
- How can universities partner with companies?
- and others
Plus, the site is growing, and new information is added frequently. If your college or university is launching a data science program, this resource is a must-read.
If you work at a university and are considering starting an undergraduate program in data science, then today’s post is for you.
- A Guide to Teaching Data Science – focuses on strengthening 3 skills (create, connect, compute) within Statistics departments to develop data science
- Teaching the Foundations of Data Science: An Interdisciplinary Approach – a study and analysis of teaching an introductory data science course as a cooperation between MIS and CS
- A Data Science Course for Undergraduates: Thinking with Data – overview of an undergraduate course in a liberal arts environment that provides students with the tools necessary to apply data science. It is very detailed and covers many great topics, plus R and SQL
- Embracing Data Science – Statistics needs to learn from data science to make courses more relevant
If you know of any other papers, please leave a comment below.
- A timeline of Deep Learning papers (with download links) written since 2011
- A large collection of Deep Learning papers broken out by specific topic. It also includes ratings.
- A list of papers to complement the Deep Learning Book
The last links are not official academic papers, but they are quite good resources on deep learning.
Pedro Domingos of the Department of Computer Science and Engineering at the University of Washington provides a very useful paper with tips for machine learning. The paper is titled A Few Useful Things to Know about Machine Learning [pdf].
Below are the 12 useful tips.
- LEARNING = REPRESENTATION + EVALUATION + OPTIMIZATION
- IT’S GENERALIZATION THAT COUNTS
- DATA ALONE IS NOT ENOUGH
- OVERFITTING HAS MANY FACES
- INTUITION FAILS IN HIGH DIMENSIONS
- THEORETICAL GUARANTEES ARE NOT WHAT THEY SEEM
- FEATURE ENGINEERING IS THE KEY
- MORE DATA BEATS A CLEVERER ALGORITHM
- LEARN MANY MODELS, NOT JUST ONE
- SIMPLICITY DOES NOT IMPLY ACCURACY
- REPRESENTABLE DOES NOT IMPLY LEARNABLE
- CORRELATION DOES NOT IMPLY CAUSATION
For details and a good explanation of each, see the paper A Few Useful Things to Know about Machine Learning [pdf].
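As a small illustration of one tip, "intuition fails in high dimensions," here is a minimal sketch. It is not taken from the paper; the function name `distance_contrast`, the uniform random data, and the point counts are all assumptions chosen for the demo. It measures how the nearest and farthest neighbors of a random query point become almost equally far away as the number of dimensions grows, which is one reason similarity-based methods can struggle there.

```python
import math
import random


def distance_contrast(dim, n_points=200, seed=0):
    """Ratio of nearest to farthest distance from a random query point.

    Points are drawn uniformly from the unit hypercube [0, 1]^dim.
    A ratio near 0 means the nearest point is much closer than the
    farthest; a ratio near 1 means all points are about equally far.
    (Hypothetical helper for illustration only.)
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))  # Euclidean distance
    return min(distances) / max(distances)


if __name__ == "__main__":
    # Print the contrast ratio for increasing dimensionality.
    for dim in (2, 10, 100, 1000):
        print(dim, round(distance_contrast(dim), 3))
```

In runs of this sketch the ratio should climb toward 1 as `dim` increases, meaning "nearest" and "farthest" stop being meaningfully different. The exact numbers depend on the seed and the number of points.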
Also, later this year, Pedro Domingos will be teaching a machine learning course via Coursera. Sign up if you are interested.
Although Tobias Mayer may be known as the first data scientist, he did not coin the term data science. According to Wikipedia, the first use of the term data science was in 2001.
Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics was published in the April 2001 edition of the International Statistical Review. The author was William S. Cleveland, currently a Professor of Statistics at Purdue University.
The paper proposes a new field of study named data science. It then goes on to list and explain 6 technical focus areas for a university data science department.
- Multidisciplinary Investigations
- Models and Methods for Data
- Computing with Data
- Pedagogy
- Tool Evaluation
- Theory
For the most part, the paper is still relevant. I did find a couple of good quotes from the paper that deserve comment.
The primary agents for change should be university departments themselves.
That did not happen. The driving agents for change in the data science field have been some of the newer technology/web companies such as LinkedIn, Twitter, and Facebook (none of which even existed in 2001).
…knowledge among computer scientists about how to think of and approach the analysis of data is limited, just as the knowledge of computing environments by statisticians is limited. A merger of the knowledge bases would produce a powerful force for innovation.
I think this statement still applies today. The world is just starting to realize the benefits of merging knowledge from computer science and statistics. There is much more work to do. Fortunately, businesses and universities are working to address the merger.
Have you seen the paper before? What are your thoughts on it?