Data Scientist's Guide to Startups – Round Table Discussion

This discussion was held at the KDD 2013 Conference in Chicago. The title says it all. Here is the full discussion.

A Data Scientist’s Guide to Start-Ups

The following people were involved in the discussion.

Data Science Startup Ideas

  • Individualized Online Data Science Training – Create a series of training materials that walk a student through different data science topics. The student should be able to go as slowly or as quickly as they like, and it would be even better if mentors could be crowdsourced to help with projects. The training would carry a fee, and mentors would get a cut of that fee. Alternatively, the site could act as a recruitment site and charge a finder’s fee to companies that hire a student.
  • Build a Data Science Community – A site where users can share, evaluate, and collaborate on data analysis. I am hoping Sense, Domino Data Labs, or Mode becomes this community, but I think they are all focused on the infrastructure of doing data science, which is probably the important first step. Maybe the community could be built by calling into the APIs of those sites.
  • Random Kindness – Use data about people’s locations and interests to encourage them to perform random acts of kindness for others around them. I realize this might raise some privacy concerns.
  • Brand Tracking – Collect social activity about a brand and track its sentiment over time. The results could be displayed geographically or by customer segment. Bonus points (and dollars) for being able to predict the factors that increase sentiment.
  • Online Marketplace for Data Science Consultants – Allow consultants to easily demonstrate their value and allow companies/individuals to easily find the right consultants. Allow the site to handle the transaction of money. I think Experfy is very close, but I would like it a bit more public. Possibly, allow consultants to build up a public portfolio.

What are your thoughts? Do you have any data science startup ideas or ideas you wish someone would build?

Big Data Startup Investments [Infographic]

This infographic is packed with good data. I especially enjoyed the section about big data startups that were acquired in 2013.

The Big Data Startups Investment Infographic

Data Elite: YCombinator for Data Science

Just launched, Data Elite is a new startup incubator for big data, analytics, and data science startups. Actually, here is the exact description from their website.

Data Elite is an institution that provides world-class early stage funding, top tier counseling and a home to aspiring Big Data start-ups.

The program will be highly selective (5 years of big data experience or a startup exit), but it will offer mentorship from some of the best minds in data science. The inaugural program will begin January 15, 2014, with applications due December 15, 2013.

42 Big Data Startups – Vote for the Top 10

Startup50’s list of 42 Big Data Startups.

The voting is done, but the list contains plenty of startups working in the data science field.

Enigma Launches for Open Public Data

If you are looking for public data, Enigma.io is a new startup just for you. Enigma searches, finds, and connects a variety of formats of public data. The data is then linked and made accessible. Watch the video below for more details.

Online Textbook Publishing Platform?

About a week ago I posted a link to a free data mining textbook. Hacker News got wind of the book as well, and I am guessing a flood of traffic hit the textbook’s site. The flood happened to take the site completely down for a couple of days. It was a shame because the book is really good.

If you frequently read this blog, you will notice it has quite a number of links to free online textbooks. Each free online textbook is made available a bit differently. Most are PDF downloads (either by chapter or the entire book) hosted on someone’s personal website or somewhere on a university’s website.

Here is my question. Does the web have a publishing platform for textbooks? Is there a startup working on something like this?

I am aware of Wikibooks, but I just don’t hear much about the quality of the books. As a matter of fact, I just don’t hear much about Wikibooks at all.

Top 5 Data Startups

  1. Kaggle – They make data science a sport, enough said.
  2. DataKind – DataKind may not technically be a startup because it is a nonprofit, but they are doing cool stuff. They match nonprofit organizations with people who love to analyze data and create visualizations.
  3. Cloudera – They call themselves “The Platform for Big Data”. They are working hard to make Hadoop easier to use.
  4. Coursera – Coursera is an education startup, but with two computer science professors as founders, you can bet they are crunching a lot of data about how people learn.
  5. BigML – They are trying to make machine learning available to everyone. Machine Learning as a Service!

Startup Showcase – How did I do?

Yesterday, I made some predictions about the startups I thought would win at the Strata Startup Showcase. Here are the winners.

So how did I do? Well, I got one of the winners correct. I selected Placed. Hopefully videos of the demos will be available. If I find them, I will post some of them to the blog.

Data Startup Showcase

As part of New York City Big Data Week, a startup showcase is being offered. It will consist of 14 startups. Each startup will get to give a quick demo/presentation. Then Tim O’Reilly and Fred Wilson will select 3 winners. Also, numerous investors and journalists will be present. A complete list of the startups presenting is available on the Startup Showcase page.

Which ones do you think will win?

Without seeing any of the presentations, here are my 3 picks.
This might be my dark horse pick, but I think InfoActive has my vote. I also like Placed for location analytics, and TempoDB is very intriguing as it is simply a time-series database.