Plot.ly, a New Online Graph Tool

Plot.ly is a new site for web-based plotting of graphs. A user can upload data, create a number of plots, and even write Python code to generate custom graphs. The site also offers numerous export options for the graphs, as well as options for sharing a graph via social networks.

Below is an example graph via a sharable image link.

I have not had a lot of time to play around with the site, but it looks very impressive. I think there are a lot of possibilities for Plot.ly. First, I could see it used for data analysis in the cloud. Also, I could see it used for sharing plots between researchers or for publishing extra graphs to go along with publications.

Can you think of some other uses for Plot.ly?

Columbia Data Science Certificate Program

The Institute for Data Science and Engineering at Columbia University has released their first academic offering. It is a certificate program titled, Certification of Professional Achievement in Data Sciences. The certificate program consists of 4 courses:

  1. Algorithms for Data Science
  2. Probability & Statistics
  3. Machine Learning for Data Science
  4. Exploratory Data Analysis and Visualization

Columbia is currently accepting applications for the Fall of 2013. Unfortunately, the program will not initially be offered online.

Also, Columbia is planning to start a new master’s degree in data science sometime in 2014. A PhD program is supposed to come sometime after that. Some of the future programs will also be available online. Combined with the data science program at NYU, New York City is becoming a premier academic location for learning data science.

Open Data Festival

Open Data Festival plans to host a global data festival, launching in the autumn of 2013. The details are quite vague at this point, but the organizers are looking for volunteers, cities, and speakers. Feel free to sign up.

The festival is being organized by the same team that organizes Big Data Week.

Gamification Data Science Video

I thought this was a fun little video about gamification and data science, plus my 2-year-old was mesmerized by it. It is worth 3 minutes to watch.

What is Machine Learning

Machine Learning is a term that can mean different things to different people. Andrew Ng, cofounder of Coursera and Professor at Stanford, provides two definitions in his popular Machine Learning Course. The first definition comes from Arthur Samuel around 1959.

Field of study that gives computers the ability to learn without being explicitly programmed.

The second definition comes from Tom Mitchell’s 1997 Machine Learning textbook. This definition is a bit more formal and rigorous. This book defines a well-posed learning problem as:

A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

Machine Learning Categories

Machine learning can be broken down into a few categories. The two most popular are supervised and unsupervised learning. A couple of other categories are recommender systems and reinforcement learning.

Supervised Learning

Probably the most common category of machine learning, supervised learning is concerned with fitting a model to labeled data. Labeled data is data that has the correct answer supplied. Regression and Classification are the most common types of problems in supervised learning.
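To make the idea of fitting a model to labeled data concrete, here is a minimal regression sketch in plain Python. It is only an illustration, not taken from the Coursera course: the function name `fit_line` and the study-hours data are made up for this example, and the fit uses ordinary least squares for a single input variable.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of the inputs, and how inputs and labels vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labeled data: every input comes with its correct answer.
hours_studied = [1.0, 2.0, 3.0, 4.0, 5.0]
exam_scores = [52.0, 54.0, 56.0, 58.0, 60.0]

slope, intercept = fit_line(hours_studied, exam_scores)

# Use the fitted model to predict the label for an unseen input.
predicted_score = slope * 6.0 + intercept
print(slope, intercept, predicted_score)
```

Predicting a number, as here, is regression. If the labels were categories such as pass or fail, the same setup would be a classification problem.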

Unsupervised Learning

Unsupervised learning deals with unlabeled data. Therefore, the goal of unsupervised learning is to find structure in the data. Clustering is probably the most common technique.
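As a rough illustration of clustering, the sketch below runs a small k-means loop on one-dimensional data in plain Python. The function name `kmeans_1d`, the sample values, and the starting centers are all assumptions chosen for this example. A real project would more likely use a library implementation.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group numbers around the given starting centers using k-means."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical unlabeled data: no correct answers are supplied.
values = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]

centers, clusters = kmeans_1d(values, centers=[1.0, 10.0])
print(centers)
print(clusters)
```

Nothing tells the algorithm which group a value belongs to. The two groups emerge only from how close the values are to one another.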

Others

Recommender systems deal with making recommendations based upon previously collected data. Reinforcement learning is concerned with maximizing the reward of a given agent (person, business, etc.).
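One simple way to picture reward maximization is a two-armed bandit, where an agent repeatedly chooses between two options with unknown payout rates. The sketch below uses an epsilon-greedy strategy. It is one of many possible approaches and is not described in the course material summarized here. The function name `epsilon_greedy`, the payout probabilities, and the parameter values are all assumptions for illustration.

```python
import random


def epsilon_greedy(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Simulate an agent that mostly exploits its best-known option."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)
    estimates = [0.0] * len(reward_probs)
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random option.
            arm = rng.randrange(len(reward_probs))
        else:
            # Exploit: pick the option with the highest estimated reward.
            arm = max(range(len(reward_probs)), key=lambda i: estimates[i])
        reward = 1 if rng.random() < reward_probs[arm] else 0
        counts[arm] += 1
        # Incremental update of the running average reward for this option.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward


# Hypothetical payout rates, hidden from the agent: option 1 is better.
estimates, counts, total_reward = epsilon_greedy([0.2, 0.8])
best_arm = max(range(len(estimates)), key=lambda i: estimates[i])
print(best_arm, counts, total_reward)
```

The agent is never told which option is better. It learns that from the rewards it collects, and it spends most of its choices on the better option once its estimates settle.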

Learn More

Most of the above information comes from the Coursera Machine Learning Course. There is still time to sign up since the first assignments are not due until the end of the week.

Coursera Data Science Begins

The highly anticipated Coursera class, Introduction to Data Science, started yesterday. It looks good so far. Why not join 72,000 other students interested in learning data science?