5 Free Programming Languages for Data Science

  1. R – There is a package for nearly any algorithm you will ever need, and that is where R really excels. It is widely used and has a strong community. The only slight downside (in my opinion) is the cumbersome syntax.
  2. Python – A very good language for beginning programmers. The syntax is quite readable and intuitive. With the NumPy and SciPy packages, Python has many of the tools and algorithms necessary to do data science.
  3. Octave – Octave was created to be very similar to the commercial product MATLAB. Octave is used and highly recommended in Dr. Andrew Ng’s Coursera machine learning course.
  4. Java – While I don’t read a lot about people using Java for quickly testing new statistical models, some of the larger open-source data science products, such as Hadoop and Storm, are built with Java. Plus, Java has libraries for just about everything, and it has proved itself to be a fairly decent production environment.
  5. Julia – This is the newcomer on the list. Julia claims to have really great performance along with built-in support for parallelism and cloud computing. I am not too familiar with Julia, but it will be interesting to see how the Julia community grows over the coming months and years. Julia is currently lacking some of the libraries and algorithms that the others on the list support.

Top 5 Data Science Guys

  1. DJ Patil – DJ is one of the most recognizable faces in data science. He is constantly speaking at conferences and appearing in news articles. He is currently Data Scientist in Residence at Greylock Partners. He also helped coin the term data scientist.
  2. Jeff Hammerbacher – Jeff helped build the first data science team at Facebook. Then he moved on to co-found Cloudera. Now he is helping a medical school perform better data analysis. Along with DJ, he helped coin the term data scientist.
  3. Drew Conway – A PhD student at NYU and Scientist at IA Ventures, Drew is very active in the data science world. He speaks at conferences, co-authored a book on machine learning, creates Venn diagrams, and more.
  4. Jake Porway – Jake is a former data scientist with the New York Times. He now spends a lot of time speaking about DataKind, a nonprofit organization that he founded to help the world.
  5. David Smith – David is a blogger for Revolution Analytics. He is also a big fan of R. David is working very hard to help others learn about data science by speaking at conferences and hosting webinars.

This post goes along with Top 5 Data Science Gals.

Top 5 Data Science Gals

  1. Hilary Mason – Hilary is the Chief Scientist at Bitly. She is a frequent speaker at conferences. She is commonly cited, interviewed, and referenced in data science news, blogs, and articles.
  2. Cathy O’Neil – Cathy is better known to the internet world as mathbabe. She is a blogger (although not strictly about data science), conference speaker, and soon-to-be book author.
  3. Carla Gentry – Founder of Analytical-Solution.com, Carla is one of the most frequent #datascience tweeters on Twitter. She is known to the Twitter world as @data_nerd.
  4. Monica Rogati – Monica is a Senior Data Scientist at LinkedIn. She speaks at conferences, publishes academic papers, tweets, and creates great data products at LinkedIn. She likes data so much that she even uses it for parenting.
  5. Rachel Schutt – Rachel recently finished teaching and blogging the Introduction to Data Science course at Columbia University. She is also a Senior Statistician at Google Research. Along with Cathy, she will be a book author.

This post goes along with Top 5 Data Science Guys.

Top 5 MOOCs for Data Science

Course | Organization | Notes
Machine Learning | Coursera (Stanford) | One of the first MOOCs
Intro to Data Science | Coursera (U of Washington) | Starts in April 2013
Intro to Statistics: Making Decisions Based on Data | Udacity | Enroll anytime
Introduction to Infographics and Data Visualization | Knight Center @ U of Texas | Starts January 12, 2013
Learning From Data | Caltech | Starts Jan 8, 2013

Good List!


Over the past year, I’ve seen a lot of startups, projects and tools that aim to bring fairly advanced analytic capabilities to programmers. Sometimes they do this by enabling simple scripts that result in powerful dashboards or processes, while other times they just deliver the data in an easy-to-consume manner with little work at all on the developer’s part. I think this is a meaningful trend.

In a world of mobile apps and cloud resources, it’s easier than ever to start a business around a simple application. Even in large companies, developers fighting for resources might need to prove an application’s popularity or find a way to boost its monetization. Sometimes, that might even mean injecting some data processing right into an application.

But whatever the case, if your job revolves around writing code rather than data flows, you might need a little help. Here are 12 tools (listed alphabetically) that…


Top 5 Data Science Blogs

  1. p-value.info – This blog is only about one month old, but it is filled with great stuff. I just hope Carl, a data scientist at One Kings Lane, can keep up the good posts.
  2. Metamarkets Blog – Metamarkets is a startup focusing on data analytics for business users.  The blog contains lots of data science information.  During the summer, the blog ran an excellent series with data scientist interviews.
  3. Kaggle – A great startup with a great blog.  The blog has tips about data science competitions, explanations from winners, and various other data science related posts.
  4. iCrunchData – This is a job site for data-related positions.  That said, the blog is relevant and informative.  They even do data science on job postings for data science.
  5. What’s the Big Data – A frequently updated blog with great links to big data and data science resources. I especially like the “Big Data Quotes of the Week” posts.
Bonus Blogs
  1. Flowing Data – Nathan Yau, the blog’s author, is a PhD student at UCLA. The blog focuses on visualizations.
  2. Columbia Data Science Course Blog – This blog accompanied the Data Science course at Columbia University. Unfortunately, it will no longer be updated since the course is over. However, it is still worth browsing, since it covers many of the topics in data science. It also has some great visualizations.