5 Free Programming Languages for Data Science

  1. R: R really excels in its package ecosystem; there is a package for nearly any algorithm you will ever need. It is widely used and has a strong community. The only slight drawback (in my opinion) is its cumbersome syntax.
  2. Python: A very good language for beginning programmers. The syntax is readable and intuitive. With the NumPy and SciPy packages, Python has many of the tools and algorithms needed to do data science.
  3. Octave: Octave was created to be very similar to the commercial product MATLAB. Octave is used and highly recommended in Dr. Andrew Ng's Coursera machine learning course.
  4. Java: While I don't read much about people using Java to quickly test new statistical models, a couple of the larger open-source data science products, such as Hadoop and Storm, are built with Java. Plus, Java has libraries for just about everything, and it has proved itself to be a fairly decent production environment.
  5. Julia: This is the newcomer on the list. Julia claims excellent performance along with built-in support for parallel and cloud computing. I am not too familiar with Julia, but it will be interesting to see how its community grows over the coming months and years. Julia currently lacks some of the libraries and algorithms that the other languages on this list support.

Top 5 Data Science Guys

  1. DJ Patil: DJ is one of the most recognizable faces in data science. He speaks frequently at conferences and appears regularly in news articles. He is currently Data Scientist in Residence at Greylock Partners. He also helped coin the term "data scientist."
  2. Jeff Hammerbacher: Jeff helped build the first data science team at Facebook. He then went on to co-found Cloudera. Now he is helping a medical school improve its data analysis. Along with DJ, he helped coin the term "data scientist."
  3. Drew Conway: A PhD student at NYU and a scientist at IA Ventures, Drew is very active in the data science world. He speaks at conferences, co-authored a book on machine learning, creates Venn diagrams, and more.
  4. Jake Porway: Jake is a former data scientist at the New York Times. He now spends much of his time speaking about DataKind, a nonprofit organization he founded that matches nonprofits with people who love to analyze data.
  5. David Smith: David is a blogger for Revolution Analytics and a big fan of R. He works hard to help others learn about data science by speaking at conferences and hosting webinars.

This post goes along with Top 5 Data Science Gals.

Top 5 Data Science Gals

  1. Hilary Mason: Hilary is the Chief Scientist at Bitly. She is a frequent conference speaker and is commonly cited, interviewed, and referenced in data science news articles and blogs.
  2. Cathy O'Neil: Cathy is better known to the internet world as mathbabe. She is a blogger (though not exclusively about data science), a conference speaker, and a soon-to-be book author.
  3. Carla Gentry: Founder of Analytical-Solution.com, Carla is one of the most frequent #datascience tweeters on Twitter. She is known to the Twitter world as @data_nerd.
  4. Monica Rogati: Monica is a Senior Data Scientist at LinkedIn. She speaks at conferences, publishes academic papers, tweets, and creates great data products at LinkedIn. She likes data so much that she even uses it for parenting.
  5. Rachel Schutt: Rachel recently finished teaching (and blogging about) the Introduction to Data Science course at Columbia University. She is also a Senior Statistician at Google Research. Together with Cathy, she is working on a book.

This post goes along with Top 5 Data Science Guys.

Top 5 MOOCs for Data Science

Course | Organization | Notes
Machine Learning | Coursera (Stanford) | One of the first MOOCs
Intro to Data Science | Coursera (U of Washington) | Starts in April 2013
Intro to Statistics: Making Decisions Based on Data | Udacity | Enroll anytime
Introduction to Infographics and Data Visualization | Knight Center @ U of Texas | Starts Jan. 12, 2013
Learning From Data | Caltech | Starts Jan. 8, 2013

Top 5 Data Science Blogs

  1. p-value.info – This blog is only about a month old, but it is already filled with great material. I just hope Carl, a data scientist at One Kings Lane, can keep the good posts coming.
  2. Metamarkets Blog – Metamarkets is a startup focusing on data analytics for business users. The blog contains lots of data science information. Over the summer, it ran an excellent series of interviews with data scientists.
  3. Kaggle – A great startup with a great blog. The blog has tips about data science competitions, write-ups from winners, and various other data science posts.
  4. iCrunchData – This is a job site for data-related positions. That said, the blog is relevant and informative. They even apply data science to data science job postings.
  5. What's the Big Data – A frequently updated blog with great links to big data and data science resources. I especially like the "Big Data Quotes of the Week" posts.
Bonus Blogs
  1. Flowing Data – Nathan Yau, the blog's author, is a PhD student at UCLA. The blog focuses on visualizations.
  2. Columbia Data Science Course Blog – This blog accompanied the Data Science course at Columbia University. Unfortunately, it will no longer be updated since the course is over. However, it is still worth browsing through, since it covers many of the topics in data science. It also has some great visualizations.

Top 5 Places to Get a Data Scientist Job

  1. LinkedIn: They turn data into products better than anyone else.
  2. Facebook: If you are the type of person who loves to analyze people's lives, there is no better place.
  3. Twitter: Duh, it's Twitter. Lots of data and lots of possibilities.
  4. Cloudera: Cloudera is a successful Hadoop-based startup. You can build tools and explore huge datasets for a variety of industries.
  5. Kaggle: If optimizing algorithms and diving deep into the data to extract every last ounce of information is your thing, then Kaggle is the place. Plus, nowhere else will you get to work on so many important problems across such a wide range of domains. Unfortunately, Kaggle is not currently hiring data scientists, but it will most likely be seeking more in the future.

There are many other companies hiring data scientists. Where would you like to be a data scientist?

Top 5 Data Startups

  1. Kaggle: They make data science a sport. Enough said.
  2. DataKind: DataKind may not technically be a startup because it is a nonprofit, but they are doing cool stuff. They match nonprofit organizations with people who love to analyze data and create visualizations.
  3. Cloudera: They call themselves "The Platform for Big Data." They are working hard to make Hadoop easier to use.
  4. Coursera: Coursera is an education startup, but with two computer science professors as founders, you can bet they are crunching a lot of data about how people learn.
  5. BigML: They are trying to make machine learning available to everyone. Machine Learning as a Service!

In 2013, Learn Data Science via Coursera (a curriculum)

Coursera has some excellent courses coming up in 2013. Here are some potential curriculum paths for someone looking to learn data science.

Prerequisites

Both sequences require, or at least recommend, some basic programming experience. If you are unfamiliar with programming, you still have a couple of weeks to get comfortable with some basic programming concepts. Good places to start include Coursera's Computer Science 101 and Codecademy's Python tutorial.

Data Science Curriculum #1

If you are new to programming, this is the recommended sequence, since the first course focuses on programming.

Course | Start Date | Completion Date
Computing for Data Analysis | Jan. 2, 2013 | Jan. 25, 2013
Data Analysis | Jan. 22, 2013 | Mar. 15, 2013
Introduction to Data Science | April 2013 | June 2013

Data Science Curriculum #2

Course | Start Date | Completion Date
Computational Methods for Data Analysis | Jan. 7, 2013 | Mar. 15, 2013
Introduction to Data Science | April 2013 | June 2013

Additional Courses

Neither of the Coursera machine learning courses (Stanford or U of Washington) is scheduled for 2013 yet, but either would be a great (maybe even necessary) follow-up course. Hopefully, one of them will start in July or shortly thereafter.

After completing one of the sequences above plus a machine learning course, you should be skilled enough to begin doing useful data science work. (Note: A new job as a data scientist is not guaranteed, but the courses won't hurt your chances.) Plus, Coursera offers numerous other classes that you could take later to deepen your knowledge in specific areas of data science (Natural Language Processing, Image Processing, and more).

Happy Learning in 2013!

If you are interested in more ways to learn data science, please check out Data Science 201, coming in 2013.