All posts by Ryan Swanstrom

Top 5 Data Science Blogs

  1. p-value.info – This blog is only about 1 month old, but it is filled with great stuff.  I just hope Carl, a data scientist at One Kings Lane, can keep up the good posts.
  2. Metamarkets Blog – Metamarkets is a startup focusing on data analytics for business users.  The blog contains lots of data science information.  During the summer, the blog ran an excellent series with data scientist interviews.
  3. Kaggle – A great startup with a great blog.  The blog has tips about data science competitions, explanations from winners, and various other data science related posts.
  4. iCrunchData – This is a job site for data-related positions.  That said, the blog is relevant and informative.  They even do data science on job postings for data science.
  5. What’s the Big Data – A frequently updated blog with great links to big data and data science resources. I especially like the “Big Data Quotes of the Week” posts.

Bonus Blogs

  1. Flowing Data – Nathan Yau, the blog's author, is a PhD student at UCLA.  The blog focuses on visualizations.
  2. Columbia Data Science Course Blog – This blog accompanied the Data Science course at Columbia University.  Unfortunately, it will no longer be updated now that the course is over.  It is still worth browsing through, though, since it covers many of the topics in data science.  It also has some great visualizations.

Top 5 Places to Get a Data Scientist Job

  1. LinkedIn – They turn data into products better than anyone else.
  2. Facebook – If you are the type of person who loves to analyze people's lives, there is no better place.
  3. Twitter – Duh, it's Twitter. Lots of data and lots of possibilities.
  4. Cloudera – Cloudera is a successful Hadoop-based startup. Build tools and explore huge datasets for a variety of industries.
  5. Kaggle – If optimizing algorithms and really diving into the data to get every last ounce of information is your thing, then Kaggle is it. Plus, nowhere else will you get to work on so many important problems across such a wide range of domains. Unfortunately, Kaggle is not currently hiring data scientists, but they will most likely be seeking more in the future.

There are many other companies hiring data scientists. Where would you like to be a data scientist?

Top 5 Data Startups

  1. Kaggle – They make data science a sport, enough said.
  2. DataKind – DataKind may not technically be a startup because it is a nonprofit, but they are doing cool stuff.  They match nonprofit organizations with people who love to analyze data and create visualizations.
  3. Cloudera – They call themselves "The Platform for Big Data".  They are working hard to make Hadoop easier to use.
  4. Coursera – Coursera is an education startup, but with two computer science professors as founders, you can bet they are crunching a lot of data about how people learn.
  5. BigML – They are trying to make machine learning available to everyone.  Machine Learning as a Service!

In 2013, Learn Data Science via Coursera (a curriculum)

Coursera has some excellent courses coming up in 2013. Here are some potential curriculum paths for someone looking to learn data science.

Prerequisites

Either sequence requires or recommends some basic programming experience. If you are unfamiliar with programming, you still have a couple of weeks to learn some basic programming concepts. Good places to start are Coursera's Computer Science 101 or Codecademy's Python tutorial.

Data Science Curriculum #1

If you are new to programming, this is the recommended sequence. The first course focuses on programming.

Course | Start Date | Completion Date
Computing for Data Analysis | Jan. 2, 2013 | Jan. 25, 2013
Data Analysis | Jan. 22, 2013 | Mar. 15, 2013
Introduction to Data Science | April 2013 | June 2013

Data Science Curriculum #2

Course | Start Date | Completion Date
Computational Methods for Data Analysis | Jan. 7, 2013 | Mar. 15, 2013
Introduction to Data Science | April 2013 | June 2013

Additional Courses

Neither of the Coursera machine learning courses (Stanford or U of Washington) is scheduled for 2013, but either of them would be a great (maybe necessary) follow-up course. Hopefully, one of them will start in July or shortly thereafter.

After completing one of the above sequences combined with a machine learning course, a person should be skilled enough to begin doing useful data science work. (Note: A new job as a data scientist is not guaranteed, but the courses won’t hurt your chances.) Plus, Coursera offers numerous other classes that could be taken at a later time to increase depth in certain areas of data science (Natural Language Processing, Image Processing, and more).

Happy Learning in 2013!

If you are interested in more ways to learn data science, please check out Data Science 201, coming in 2013.

2 Recently Released Open Source Graph-Related Projects

  1. GraphBuilder

    Intel Labs built a tool for constructing mathematical graphs out of large datasets. It is Java-based and works with Hadoop and MapReduce. Intel has released a whitepaper explaining more about GraphBuilder. The code is available on GitHub. A big thanks to Mark Nickel for pointing out this project.

  2. ArangoDB

    ArangoDB is a flexible NoSQL database. It is a document database with the ability to add edges between documents, so it can also serve as a graph database. I had a fun time playing around with the online tutorial and demo. ArangoDB also claims it can work as a key/value store. The code is available on GitHub.
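The two ideas above can be illustrated together with a toy sketch. A map/reduce pass turns raw records into edges, which is the kind of job GraphBuilder runs at scale on Hadoop. The resulting graph is then stored as plain documents plus edge documents whose `_from`/`_to` fields point at vertex `_id`s, loosely imitating how ArangoDB models edges. Everything here is an assumption made for illustration: it is plain in-memory Python using ordinary dicts, not GraphBuilder's code or the ArangoDB driver API. The records and the `users/...` ids are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw records: (user, item) interactions, e.g. lines from a log file.
RECORDS = [
    ("alice", "hadoop-book"),
    ("bob", "hadoop-book"),
    ("bob", "graph-paper"),
    ("carol", "graph-paper"),
]

def map_phase(record):
    """Map step: emit (item, user) so users get grouped by the item they share."""
    user, item = record
    yield item, user

def reduce_phase(item, users):
    """Reduce step: every pair of users sharing an item becomes an edge."""
    users = sorted(set(users))
    for i in range(len(users)):
        for j in range(i + 1, len(users)):
            yield users[i], users[j], item

def build_edges(records):
    """Run map, shuffle (group by key), and reduce in memory, in sorted key order."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_phase(record):
            groups[key].append(value)
    edges = []
    for key in sorted(groups):
        edges.extend(reduce_phase(key, groups[key]))
    return edges

def to_documents(edges):
    """Store the graph as vertex documents plus edge documents with _from/_to
    fields (a plain-dict imitation of a document store with edges)."""
    vertices = {}
    edge_docs = []
    for a, b, via in edges:
        for name in (a, b):
            vid = f"users/{name}"  # hypothetical id scheme: collection/key
            vertices.setdefault(vid, {"_id": vid, "name": name})
        edge_docs.append({"_from": f"users/{a}", "_to": f"users/{b}", "via": via})
    return vertices, edge_docs

def neighbors(edge_docs, vertex_id):
    """Follow edges in either direction from vertex_id."""
    out = set()
    for e in edge_docs:
        if e["_from"] == vertex_id:
            out.add(e["_to"])
        elif e["_to"] == vertex_id:
            out.add(e["_from"])
    return sorted(out)

if __name__ == "__main__":
    vertices, edge_docs = to_documents(build_edges(RECORDS))
    print(neighbors(edge_docs, "users/bob"))  # prints ['users/alice', 'users/carol']
```

Running it prints `['users/alice', 'users/carol']`, since bob shares one item with each of them. The edges are ordinary documents that reference other documents. That is the point made above about a document database becoming a graph database once edges can be added.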

I think the information is interesting, and the charts do a good job of telling the story.

The Bayesian Observer

The average age of first time mothers in the developed countries of the world has been rising for the last ~40 years.

Here is another plot that shows the rate of occurrence of Down Syndrome, a chromosomal defect, as a function of the age of the mother at the time of childbirth.

The curve really starts to shoot up at 30. In the UK, the average age of a first time mother is 30 years. It is well known that the fertility rate in women decreases after the age of 30 and drops rapidly after 35. Older mothers are likely to find it harder to have a baby and if they do, then they run a higher risk of chromosomal defects. Given the possibilities of all these negative consequences, the increase in the average age is a bit disturbing. It seems like there is a hidden cost to more women…
