Dryad: Scientific Data Repository

Earlier, I posted about Scientific Data, but unfortunately the site does not host any of the data.

Enter Dryad. Data hosting is exactly what the site does. The site hosts open data and any other digital artifacts associated with a research project. Plus, the site provides a DOI (Digital Object Identifier) for citing the artifacts in research papers.

Scientific Data: A new publisher of Data

Nature.com is starting a new publication titled Scientific Data. The goal is to help researchers publish and discover data. Each article in the publication is called a Data Descriptor. It describes the data, explains the data collection methods, lists the columns, and states other essential information about the dataset.

Unfortunately, the site does not host any of the data. I think it will be interesting to watch how a site like this develops. The publication is currently accepting submissions.

Strata + Hadoop World 2013 Live Stream

The 2013 Strata + Hadoop World Conference is currently going on in New York City. Many of the keynotes will be streamed live on Tuesday and Wednesday (Oct 29 and 30). Watch and enjoy. If for some reason you cannot watch the livestream, most of the keynotes are usually posted on YouTube shortly after the conference.

Top 10 Ways to Know You are a Data Scientist

For some humor on this Friday, here is the Top 10 Ways You Know You’re a Data Scientist by Fico Labs.

I would add number 11:

You think the list is funny.

Data Gotham 2013 Videos

In case you do not live in New York City or you did not attend Data Gotham, do not worry because nearly all the videos and talks are posted on the Data Gotham 2013 YouTube page.

Below is a panel discussion on data and privacy.

Some Numbers On the Launch of Listudy

After launching Listudy a couple of days ago, I have 2 days' worth of Google Analytics data to show. I do not know exactly how to interpret these numbers, but I will do my best to provide some insights below.


  • 5884 pageviews – That number is not huge, so the site did not explode and go viral.  However, the site did get viewed.
  • Almost 1000 unique visitors – sounds good
  • 4.58 pages per visit – so people clicked around. I thought this was pretty good considering the minimal functionality of the site.
  • 4 minutes 13 seconds average time on site – I was most impressed with this number, or maybe the site is just that slow.
  • I launched 9 months too late – I essentially had this version of the site built 9 months ago, but I kept thinking it was the wrong idea or people wouldn’t like it or I should build something else. Yes, the site is nowhere near finished, but it is at least a Minimum Viable Product (MVP).
  • Here is the question I was hoping to answer with an MVP. Do people care to see a list of links on a highly specific topic?  I am not sure if I have enough data to answer this question yet, but I am leaning towards yes.
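The figures above can be combined into a couple of rough derived numbers. The sketch below is only back-of-the-envelope arithmetic: it assumes that pages per visit is simply pageviews divided by visits, and that dividing the average time on site by pages per visit gives a fair estimate of time per page. Neither assumption is exact, and the variable names are my own.

```python
# Figures reported in the list above (two days of Google Analytics data).
pageviews = 5884
pages_per_visit = 4.58
avg_seconds_on_site = 4 * 60 + 13  # 4 minutes 13 seconds

# Assumption: pages_per_visit = pageviews / visits,
# so the number of visits can be estimated by rearranging.
estimated_visits = pageviews / pages_per_visit

# Assumption: time on site is spread evenly across the pages viewed.
# This is a rough estimate, not a number reported by the analytics tool.
seconds_per_page = avg_seconds_on_site / pages_per_visit

print(f"Estimated visits: {estimated_visits:.0f}")
print(f"Estimated seconds per page: {seconds_per_page:.0f}")
```

Under these assumptions, the estimate comes out to roughly 1,285 visits and a little under a minute per page, which suggests that some of the almost 1000 unique visitors came back more than once.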

Why share these numbers? Why not? After all, how many people share numbers like these unless the site was a huge success? I have not launched enough websites to know how these numbers compare to other sites, and I am also not sure how launch-day traffic correlates with eventual website success. Maybe I need to launch a bunch more websites so I can perform some data science on website launch data.

Columbia Intro to Data Science 2.0

Once again this fall, Dr. Rachel Schutt will be teaching Introduction to Data Science at Columbia University. Dr. Kayur Patel, a computer scientist at Google, will also help teach the course. Like last year, a blog will be maintained for the course. The blog is worth following as it contains tons of great data science information. For example, here is the definition of data science used for the course.

study of the space of problems that can be solved with data

The course has already started this fall, so you have a few posts to read in order to catch up.

Data Elite: YCombinator for Data Science

Just launched, Data Elite is a new startup incubator for big data, analytics, and data science startups. Actually, here is the exact description from their website.

Data Elite is an institution that provides world-class early stage funding, top tier counseling and a home to aspiring Big Data start-ups.

The program will be highly selective (5 years of big data experience or a startup exit), but it will offer mentorship from some of the best minds in data science. The inaugural program will begin January 15, 2014, with applications due December 15, 2013.

Data Science 201 is now Listudy

The site is live at http://www.listudy.com

The internet is filled with wonderful tutorials, blog posts, courses, and articles about learning data science. Listudy aims to be a place to organize and reference those learning resources. The site is in the very early stages, so please be patient and stay tuned for future updates. Also, follow along on Twitter @listudy or on Facebook at facebook.com/listudycom.

Right now, Listudy is a large collection of data science links that can be filtered and sorted by tags.

Note: The idea for Data Science 201/Listudy nearly died, but I have been persuaded to launch what I have and see if people find it useful. Please leave a comment or shoot me an email (ryan at listudy.com) if you have some feedback or find the site useful.

Data Science Webinar Reminder

Don’t forget about the Kalido Data Science Webinar taking place tomorrow.