
Sense: Data Science Platform of the Future


I have been lucky enough to get early access to Sense, a new data science startup. They are building the “data science platform” of the future, which lets you edit and run all your R, Python, and JavaScript code right in your browser.

So far, I am extremely impressed. I love being able to do my data analysis in the browser of a Chromebook. Here is a partial list of the features:

  • A useful profile page for showing off current and completed work
  • Public and private projects
  • Support for Python, R, and JavaScript (more to come; there is an Engine API if you would like to add a language)
  • Support for private environment variables
  • The ability to follow other projects
  • Collaboration with others on a data project
  • And more, and even more to come

I have built a quick sample project that displays some of the features of Sense: Sense – Data Science Platform of the Future.

Note: A few weeks ago, I chatted with Tristan (one of the cofounders of Sense) and he assured me that big news and more features are coming soon. So, stay tuned!

Enigma Launches for Open Public Data

If you are looking for public data, Enigma.io is a new startup just for you. Enigma finds and connects public data in a variety of formats, then links it and makes it accessible. Watch the video below for more details.

Quandl – A Search Engine for Datasets

I just found this site a couple of days ago. Quandl is a new startup that offers a search engine for datasets. The site has a lot of data (over 2 million datasets). Plus, the data can be sorted, filtered, graphed, combined, and finally downloaded in many different formats (Excel, JSON, R, CSV, XML). Most of the data is numerical and/or time series.

If you have been looking for some data to explore, Quandl may be a good place to look.

10 R packages

Yhat, a new predictive modeling startup, wrote up a nice blog post, 10 R Packages I wish I knew about earlier. It is worth reading through the list.


Special thanks to Mark Nickel for pointing out this link.

Easel.ly Launches For Creating Infographics

Easel.ly recently launched. It is a site for easily creating infographics. It looks pretty simple, but I am still not sure I have the artistic skills to make a good-looking infographic.

Infographics are still great for telling the story of your data.

Startup Idea: A Search Engine For Recent News

The Problem

I have a problem, and I would guess many other people have it too: I have access to way too much information. I want less, but I also want the best and newest.


How do I find the best and newest information on any topic? There is a lot of new information every day. I spend a lot of time searching the internet for quality information on data science. I would love to be able to visit a page and get the latest and greatest information on data science, statistics, big data, and machine learning. I would hope to get news articles and/or blog posts from the last couple of days.

Possible Solutions

Here is a list of products I have seen and why they are not exactly what I am looking for:

  • News.me – This site is close. It emails a daily list of top articles, but the articles only come from the people I follow on Twitter. The problem is that I may not be following the right people. They did just do an excellent blog series about how people get news, so they may be working on something right now.
  • Paper.li – This seems very promising, but not all the content is new. It also only updates once or twice per day, which makes it difficult to get the latest news quickly. After a few weeks of tuning your searches and parameters, this might be really good for getting news on a daily basis. The big problem for me is that I have to wait until the next day to see if my search parameters are correct. It is not built to handle ad hoc queries. Here is the paper I am working on, Data Science 101 News.
  • Storify – This is a way to create a collection of information from various social networks. Storify makes it really easy to find social media mentions, but it is not automated and doesn’t save much time. Not that it matters much to me, but the final product is not very pretty.
  • Summify – Recently bought by Twitter, so the future is uncertain.
  • TweetedTimes – Same problem as News.me. It only generates information based on the people I follow on Twitter.
  • Google News – This is good for searching, but I think a better one exists or needs to be created. Plus, the Google privacy issues are a concern here.

Better Solution

This whole concept of filtering/searching/rating news sounds like a data science problem itself. For starters, given a topic, what information has been tweeted the most? What new information has spread the quickest? This approach could be expanded to include Facebook likes and Google +1’s, and there are numerous other APIs that could be included as well. What I really want is a product that will do this in real time (or near real time). I want to be able to enter my search terms and get a list of the most recent quality information pertaining to those terms. I guess what I want is a search engine for recent news (but not Google).
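To make the "tweeted the most" and "spread the quickest" questions concrete, here is a minimal Python sketch of how such a ranking might work. Everything in it is an assumption for illustration: the function name `rank_recent_links`, the 48-hour window, the one-hour floor on link age, and the sample share data are all made up, and a real product would pull share events from the social network APIs rather than from a hard-coded list.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def rank_recent_links(shares, now, window_hours=48):
    """Rank URLs shared inside a recent time window (hypothetical helper).

    shares: iterable of (url, shared_at) tuples, where shared_at is a datetime.
    Returns a list of (url, share_count, shares_per_hour) tuples, sorted by
    share count first and spread rate second, both descending.
    """
    cutoff = now - timedelta(hours=window_hours)

    # Keep only the shares that fall inside the window, grouped by URL.
    times_by_url = defaultdict(list)
    for url, shared_at in shares:
        if cutoff <= shared_at <= now:
            times_by_url[url].append(shared_at)

    ranked = []
    for url, times in times_by_url.items():
        first_seen = min(times)
        # Age in hours since the first share in the window, floored at one
        # hour so a link seen seconds ago does not get a huge rate.
        age_hours = max((now - first_seen).total_seconds() / 3600.0, 1.0)
        ranked.append((url, len(times), len(times) / age_hours))

    ranked.sort(key=lambda row: (row[1], row[2]), reverse=True)
    return ranked


# Made-up share events for a single topic (assumed data, not real tweets).
now = datetime(2012, 3, 20, 12, 0)
sample_shares = [
    ("https://example.com/a", now - timedelta(hours=30)),
    ("https://example.com/a", now - timedelta(hours=20)),
    ("https://example.com/a", now - timedelta(hours=10)),
    ("https://example.com/b", now - timedelta(hours=2)),
    ("https://example.com/b", now - timedelta(hours=1)),
    ("https://example.com/old", now - timedelta(hours=100)),  # outside window
]

ranked = rank_recent_links(sample_shares, now)
for url, count, rate in ranked:
    print(f"{url}  shares={count}  per_hour={rate:.2f}")
```

In this toy data, the first link has the most shares while the second is spreading faster, which shows why "most tweeted" and "spread the quickest" are separate signals. How to blend them into a single quality score is exactly the open question.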

Does anyone know if a product like this exists, or is anyone working to build something similar?

Update: This idea also goes along with Paul Graham’s Ambitious Startup Idea #1 – A New Search Engine.