Data Science For Trucking? Who Knew?

How To Manage One Million Vehicles With Big Data – Forbes.

The above link goes to a great story about using data science. What makes the story great is the company. It is not a science company or a tech startup. It is a truck management company. Data science is truly reaching all industries.

Data Without Borders is now DataKind

The non-profit organization Data Without Borders has renamed itself DataKind.

Here is the official announcement.

DataKind is an organisation that matches data from non-profit and government organisations with data scientists. DataKind hosts weekend DataDives and is planning to build a DataCorps. See a previous post, Use Data Science to Help The World, to find out more about what DataKind is all about.

Big data: The next frontier for innovation, competition, and productivity | McKinsey & Company

Big Data: The next frontier for innovation, competition, and productivity | McKinsey Global Institute | Technology & Innovation | McKinsey & Company.

This report by McKinsey & Company is frequently referenced, so I thought I should post a link to it. It includes the following quote about the lack of talent to fill Big Data positions.

By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.

This projected shortage is why now is a great time to learn to become a data scientist.

Twitter, NoSQL and Data Analysis

This is a lengthy but very good slide deck on the what/why of the tools used at Twitter.

Note: The slide deck is about 2 years old.

Hans Rosling and GapMinder

Hans Rosling, co-founder of GapMinder Foundation, gives a good TED Talk about HIV in the world. He does an excellent job of using data to highlight the countries (not continents) that have the most serious problems. He also gives some reasons why HIV/AIDS is not dropping off as quickly in some rich countries.

Here is a second TED Talk by Hans Rosling. This one is a bit more entertaining, but it still contains excellent use of data. Hint: He shows why the washing machine is so important.

Coursera is Expanding – New Courses Starting Today

Since recently announcing $16M in funding, Coursera has been making quite a bit of noise. Last fall, Stanford University decided to freely offer a couple of computer science classes online. The response was huge, and that led to the creation of Coursera.

The courses are no longer limited to computer science, and Stanford is no longer the only school involved. Here is a list of academic areas being offered and another list with the schools involved.

Academic Areas

Universities Involved

Although not all of the courses will be directly related to data science, many of them are very close. Naturally, the Math, Statistics, and Computer Science areas relate directly to data science. Other areas, such as Networks, Biology, and Economics, are among the most popular application areas for data science. This is very exciting. My only concern is that the courses are a bit too much like traditional university courses, with specific start/end dates and homework due dates. It will be interesting to see if the course structures change over time.

Anyhow, the following courses are starting today. Sign up and start learning.

  • Machine Learning – A major focus area of data science
  • Computer Science 101 – probably a good starting point if you don’t know how to program
  • Compilers – good for understanding how programming languages work
  • Automata – hard to explain in 1 line, but it contains some fundamental principles in computer science
  • Intro to Logic – learn to reason systematically
  • Computer Vision – not sure of the relation to data science, but I am sure there is one. If you know, please leave a comment

Are you going to enroll in any of these courses?

It's Big Data Week

Events are happening across the globe.

Machine Learning: Algorithms that Produce Clusters | Architects Zone

Machine Learning: Algorithms that Produce Clusters | Architects Zone.

The above article provides a nice brief overview of 5 clustering algorithms.

  1. K-Means
  2. Hierarchical Clustering
  3. Fuzzy C-Means
  4. Multi-Gaussian with Expectation-Maximization
  5. Density-based Cluster
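To give a feel for the first algorithm on the list, here is a minimal K-Means sketch in plain Python. It is not taken from the linked article. The function name kmeans, its parameters, and the sample points are all made up for illustration, and it assumes 2-D points and squared Euclidean distance.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster 2-D points into k groups (illustrative sketch only)."""
    rng = random.Random(seed)
    # Assumption: start from k distinct points picked at random.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: label each point with its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: (x - centroids[c][0]) ** 2
                + (y - centroids[c][1]) ** 2,
            )
            for x, y in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                mean_x = sum(x for x, _ in members) / len(members)
                mean_y = sum(y for _, y in members) / len(members)
                new_centroids.append((mean_x, mean_y))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # Nothing moved, so the clustering has settled.
        centroids = new_centroids
    return centroids, labels


# Hypothetical sample data: two small, well-separated groups.
sample = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8, 9.5)]
centers, groups = kmeans(sample, k=2)
print(centers)
print(groups)
```

The result depends on the starting centroids, so real projects usually run K-Means several times or use a smarter initialization.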

This goes well with a previous post about 6 Machine Learning Algorithms.

Data Scientists: The New Rock Stars of the Tech World

Data Scientists: The New Rock Stars of the Tech World.

Troy Sadkowsky of DataScientists.net conducted a very nice interview with Jake Porway of the NY Times R&D Lab and Data Without Borders. Here are some of the questions that got answered.

  • Why are Data Scientists Tech’s Rock Stars?
  • What does a Data Scientist do?
  • How do you become a Data Scientist?

Books Can Tell More Than One Story

This is a very entertaining TED Talk about what books can tell us over time. Just watch the video.