Google has recently released a Jupyter Notebook platform called Google Colaboratory. You can run Python code in a browser, share results, and save your code for later. It currently does not support R code.
A nice read if you are looking for a short introduction to the history and importance of machine learning.
Once again, I was honored to write a guest post for DataKind. This time it was on the spread of open source software by data-do-gooders. A couple of years ago, DataKind hosted a DataDive in Washington, D.C., and some of the participants created a mapping software project titled DataTools 2.0. Since then, it has been replicated by a number of groups around the globe. Read the full post on the DataKind blog to find out more.
The guide provides some excellent tips on how to get involved.
The algorithm is guaranteed to produce the same results as traditional K-means while delivering an order of magnitude higher performance.
An abstract of the paper and a PDF download can be accessed at Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup.
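Speedups of this kind come from skipping distance computations that the triangle inequality proves cannot change a point's assignment, which is why the final clustering matches classic K-means exactly. Yinyang K-means does this with lower bounds kept per group of centers. The sketch below swaps in the simpler single-bound scheme of Hamerly's algorithm, so it is not the paper's method, but it shows the same exactness argument. It is plain Python, assumes at least two clusters, uses the first k points as initial centers (chosen only for determinism), and returns a `full_scans` counter (a hypothetical name) recording how many full point-to-all-centers scans were needed.

```python
import math


def _update_centers(points, labels, centers):
    """Move each center to the mean of its assigned points (empty clusters stay put)."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for p, a in zip(points, labels):
        counts[a] += 1
        for d in range(dim):
            sums[a][d] += p[d]
    return [
        [v / counts[j] for v in sums[j]] if counts[j] else list(centers[j])
        for j in range(len(centers))
    ]


def _nearest_two(p, centers):
    """Return (index of nearest center, its distance, distance to the runner-up)."""
    dists = [math.dist(p, c) for c in centers]
    best = min(range(len(dists)), key=dists.__getitem__)
    second = min(d for j, d in enumerate(dists) if j != best)
    return best, dists[best], second


def lloyd_kmeans(points, k, max_iter=100):
    """Classic K-means: every iteration computes all n*k point-center distances."""
    centers = [list(p) for p in points[:k]]  # deterministic init: first k points
    labels = [_nearest_two(p, centers)[0] for p in points]
    for _ in range(max_iter):
        centers = _update_centers(points, labels, centers)
        new_labels = [_nearest_two(p, centers)[0] for p in points]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


def bounded_kmeans(points, k, max_iter=100):
    """Hamerly-style K-means: same result as lloyd_kmeans, fewer distance scans."""
    centers = [list(p) for p in points[:k]]
    labels, upper, lower = [], [], []
    for p in points:
        a, u, l = _nearest_two(p, centers)
        labels.append(a)
        upper.append(u)  # upper bound on distance to the assigned center
        lower.append(l)  # lower bound on distance to every other center
    full_scans = len(points)
    for _ in range(max_iter):
        new_centers = _update_centers(points, labels, centers)
        moved = [math.dist(c, nc) for c, nc in zip(centers, new_centers)]
        centers = new_centers
        max_move = max(moved)
        # Loosen the bounds by how far the centers drifted (triangle inequality).
        for i in range(len(points)):
            upper[i] += moved[labels[i]]
            lower[i] -= max_move
        # half_gap[j]: half the distance from center j to its nearest other center.
        half_gap = [
            min(math.dist(centers[j], centers[o]) for o in range(k) if o != j) / 2
            for j in range(k)
        ]
        changed = False
        for i, p in enumerate(points):
            bound = max(half_gap[labels[i]], lower[i])
            if upper[i] < bound:
                continue  # assigned center is provably still the unique nearest
            upper[i] = math.dist(p, centers[labels[i]])  # tighten, then re-check
            if upper[i] < bound:
                continue
            a, upper[i], lower[i] = _nearest_two(p, centers)
            full_scans += 1
            if a != labels[i]:
                labels[i] = a
                changed = True
        if not changed:
            break
    return labels, centers, full_scans
```

Because a point is skipped only when its current center is provably the strictly nearest one, `bounded_kmeans` should return the same labels and centers as `lloyd_kmeans` on the same input; how many scans it saves depends on how well separated the clusters are.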
Organizations everywhere are racing to build analytics/data science teams. Big Data is everywhere and companies don’t want to fall behind. Unfortunately, many organizations are struggling to get started because of questions similar to the following:
- How will analytics help us?
- What does an analytics team look like in our organization?
- How do we start?
Luckily, the analytics team at 500px, a photography community site, was kind enough to publish a detailed overview, Building Analytics at 500px, of what actually happens when you build an analytics team. The overview provides:
- And more
If your organization is considering adding an analytics or data science team, this article is definitely worth reading.
My previous list of Colleges with Data Science Degrees has grown very large, and numerous people have requested the ability to sort and/or filter. Thus, I built a new list. It is available at: Data Science Colleges. As far as I know, this is the most comprehensive list of data science programs available. Here are some of the features it offers:
- Over 200 Programs
- Certificate, Bachelors, Masters, and Doctorate programs included
- Sort and Filter Programs
- US and International
- Program Name
- Online Programs
- Ability to download the raw data as CSV or JSON
Yes, you read that last one correctly. All the data is freely available for you. If you do use the data for something, I would love to know and potentially blog about it.
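Filtering the downloaded CSV takes only the standard library. The sketch below assumes the export has columns named `name`, `degree`, and `online` with `Yes`/`No` values; these are hypothetical names, so check the header of the actual file and adjust them before using it.

```python
import csv


def online_programs(csv_file, degree):
    """Return sorted names of online programs at the given degree level.

    Assumes hypothetical columns 'name', 'degree', and 'online' ('Yes'/'No');
    rename these to match the real header of the downloaded file.
    """
    reader = csv.DictReader(csv_file)
    return sorted(
        row["name"]
        for row in reader
        if row["degree"].strip().lower() == degree.lower()
        and row["online"].strip().lower() == "yes"
    )


# Usage (the filename is hypothetical):
# with open("data_science_programs.csv", newline="", encoding="utf-8") as f:
#     print(online_programs(f, "Masters"))
```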
The list will continue to evolve. If you find any broken links or missing programs, please leave a comment. Also, please leave a comment if you can think of ways to improve the list.
A while back, James Kobielus wrote the article Data Scientist: Consider the Curriculum. It contains one of the best descriptions of a data science curriculum I have seen. The article also includes a list of algorithms and modeling techniques that a data scientist should know. Below is the list from the article.
- linear algebra
- basic statistics
- linear and logistic regression
- data mining
- predictive modeling
- cluster analysis
- association rules
- market basket analysis
- decision trees
- time-series analysis
- machine learning
- Bayesian and Monte Carlo statistics
- matrix operations
- text analytics
- principal components analysis
- experimental design
- unsupervised learning
- constrained optimization
The list looks almost overwhelming.
Do you think anything is missing from the list?