Links

An executive’s guide to machine learning | McKinsey & Company

A nice read if you are looking for a short introduction to the history and importance of machine learning.

Mapping youth well-being worldwide with open data – From DataKind

Once again, I was honored to write a guest post for DataKind. This time it was on the spread of open source software by data-do-gooders. A couple of years ago, DataKind hosted a DataDive in Washington, D.C., and some of the participants created a mapping software project titled DataTools 2.0. Since then, it has been replicated by a number of groups around the globe. Read the full post on the DataKind blog to find out more.

A Guide for Doing Data Science for Good via DataLook

The guide provides some excellent tips on how to get involved.

Yinyang K-Means: A Drop-In Replacement of the Classic K-Means

This week, Yufei Ding, Yue Zhao, Xipeng Shen, Madanlal Musuvathi, and Todd Mytkowicz will be presenting Yinyang K-Means at the 2015 International Conference on Machine Learning.

The algorithm guarantees the same results as classic K-means but, according to the authors, runs about an order of magnitude faster.

An abstract of the paper and a PDF download can be accessed at Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup.
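
For context, the classic K-means baseline (Lloyd's algorithm) that Yinyang K-Means is meant to replace can be sketched as below. This is a minimal pure-Python illustration of the baseline only, not the Yinyang algorithm from the paper; the function names and the toy data are assumptions made for the example.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=100, seed=0):
    """Classic K-means (Lloyd's algorithm) on a list of equal-length tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k data points as initial centers
    labels = None
    for _ in range(iterations):
        # Assignment step: each point is assigned to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        # Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each center moves to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centers, labels

# Toy data (made up for illustration): two well-separated groups.
toy_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
toy_centers, toy_labels = kmeans(toy_points, k=2)
```

The assignment step compares every point against every center on each pass, and that repeated distance work is the cost a faster drop-in replacement would aim to reduce.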

Building an Analytics Team at 500px – Helpful Advice

Organizations everywhere are racing to build analytics/data science teams. Big Data is everywhere and companies don’t want to fall behind. Unfortunately, many organizations are struggling to get started because of questions similar to the following:

  1. How will analytics help us?
  2. What does an analytics team look like in our organization?
  3. How do we start?

Luckily, the analytics team at 500px, a photography community site, was kind enough to provide a detailed overview, Building Analytics at 500px, of what really happens when building an analytics team. The overview covers:

  • Headaches
  • Infrastructure
  • Evangelism
  • And more

If your organization is considering adding an analytics or data science team, this article is definitely worth reading.

7 Tools for Data Visualization in R, Python, and Julia

Model-based Machine Learning, Free Early Access Book

List of Over 200 Data Science College Programs

My previous list of Colleges with Data Science Degrees has grown very large, and numerous people have requested the ability to sort and/or filter. Thus, I built a new list. It is available at: Data Science Colleges. As far as I know, this is the most comprehensive list of data science programs available. Here are some of the features it offers:

  • Over 200 Programs
  • Certificate, Bachelors, Masters, and Doctorate programs included
  • Sort and Filter Programs
  • US and International
  • Program Name
  • Location
  • Online Programs
  • Ability to download the raw data as CSV or JSON

Yes, you read that last one correctly. All the data is freely available for you. If you do use the data for something, I would love to know and potentially blog about it.

The list will continue to evolve. If you find any broken links or missing programs, please leave a comment. Also, please leave a comment if you can think of ways to improve the list.
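
If you download the CSV, filtering it yourself might look something like the sketch below. The column names (`school`, `program`, `degree`, `online`) and the sample rows are hypothetical placeholders, not the actual contents of the file, so adjust them to match the real header.

```python
import csv
import io

# Hypothetical stand-in for the downloaded CSV; the real column names
# and rows may differ.
SAMPLE_CSV = """school,program,degree,online
Example University A,Data Science,Masters,yes
Example University B,Analytics,Certificate,no
Example University C,Data Science,Masters,no
"""

def filter_programs(csv_text, degree=None, online=None):
    """Return the rows that match the given degree and/or online flag."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in rows
        if (degree is None or row["degree"] == degree)
        and (online is None or row["online"] == online)
    ]

# Example: keep only the online Masters programs.
online_masters = filter_programs(SAMPLE_CSV, degree="Masters", online="yes")
for row in online_masters:
    print(row["school"], "-", row["program"])
```

With a real download, you would read the file with `open(path, newline="")` and pass it to `csv.DictReader` instead of using the inline sample string.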

Data Scientist: Consider the Curriculum

A while back, James Kobielus wrote the article Data Scientist: Consider the Curriculum. It contains one of the best descriptions of a data science curriculum I have seen. The article also includes a list of algorithms and modeling techniques that a data scientist should know. Below is the list from the article.

  • linear algebra
  • basic statistics
  • linear and logistic regression
  • data mining
  • predictive modeling
  • cluster analysis
  • association rules
  • market basket analysis
  • decision trees
  • time-series analysis
  • forecasting
  • machine learning
  • Bayesian and Monte Carlo statistics
  • matrix operations
  • sampling
  • text analytics
  • summarization
  • classification
  • principal components analysis
  • experimental design
  • unsupervised learning
  • constrained optimization

The list looks almost overwhelming.
Do you think anything is missing from the list?

Buffalo Bills to start advanced analytics department

Even the NFL is getting into data analysis these days.

Personal note: Like many American children, I grew up dreaming of playing professional football in the NFL. Also, like many American children, that dream did not come true. Maybe now I could try to make the NFL as a data scientist. I wonder if they have fall training camp for the analytics department. If so, sign me up.