Tag Archives: recommender systems

More Recommender Systems Resources

Last week I posted about Coursera’s Introduction to Recommender Systems course. I believe it is the first MOOC on the topic, but there is other material available online.

If you know of any other great recommender systems resources online, please feel free to share them.

Coursera Class on Recommender Systems

In about one month, the Introduction to Recommender Systems course will begin on Coursera. The course is being offered by the Computer Science and Engineering Department at the University of Minnesota.

The course is 14 weeks long and has 2 tracks:

  1. Programming Track – 6 different recommender systems will be programmed
  2. Concept Track – great for people who want to know about recommender systems, but don’t want to program

Recommender systems are an important part of data science, and this course looks to provide an excellent in-depth overview of the topic.

What is Machine Learning

Machine Learning is a term that can mean different things to different people. Andrew Ng, cofounder of Coursera and Professor at Stanford, provides two definitions in his popular Machine Learning Course. The first definition comes from Arthur Samuel around 1959.

Field of study that gives computers the ability to learn without being explicitly programmed.

The second definition comes from Tom Mitchell’s 1997 Machine Learning textbook. This definition is a bit more formal and rigorous. This book defines a well-posed learning problem as:

A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

Machine Learning Categories

Machine learning can be broken down into a few categories. The two most popular are supervised and unsupervised learning. A couple of other categories are recommender systems and reinforcement learning.

Supervised Learning

Probably the most common category of machine learning, supervised learning is concerned with fitting a model to labeled data. Labeled data is data that has the correct answer supplied. Regression and Classification are the most common types of problems in supervised learning.

Unsupervised Learning

Unsupervised learning deals with unlabeled data. Therefore, the goal of unsupervised learning is to find structure in the data. Clustering is probably the most common technique.

Others

Recommender systems deal with making recommendations based upon previously collected data. Reinforcement learning is concerned with choosing actions that maximize the reward received by a given agent (person, business, etc.).

Learn More

Most of the above information comes from the Coursera Machine Learning Course. There is still time to sign up since the first assignments are not due until the end of the week.

Programmer's Guide to Data Mining – A free ebook

Ron Zacharski is currently writing a data mining book, A Programmer’s Guide to Data Mining. The book is targeted at programmers who want to know when and how to apply recommendation engines and other data mining techniques. The book is still in the writing phase, but I can say the first couple of chapters are excellent. The book will always be available for free download.

If you are a programmer who is looking to add some recommendations to a website, I would highly suggest taking a look at this book.

Free Textbook: Mining of Massive Datasets

A few professors from Stanford University have released version 1.1 of their textbook, Mining of Massive Datasets. The book has been created from materials used for a couple of Stanford computer science classes, including large-scale data mining and web mining. The book looks excellent and really focuses on the analysis of data at a large scale. Some people would use the term big data. Below is a list of some of the topics covered in the textbook.

  • data mining
  • map-reduce
  • clustering
  • recommender systems
  • and more

The book is free to download, or available in print from Cambridge University Press.

Stanford Machine Learning Class – What is covered

A few days ago, I mentioned that the Stanford Machine Learning class will be starting soon.  I thought I should quickly mention some of the topics covered.  The list also serves as a great outline for machine learning.

Supervised Learning

In supervised learning, one has a set of data with features and labels.

  • Linear Regression – one/multiple variables
  • Gradient Descent – a general algorithm for minimizing a function
  • Logistic Regression – Useful for predicting classification-type results, for example a yes or no answer.  Does the patient have cancer?  Will the customer buy my new product?  It can also be extended to more than 2 outcomes.  What color will a person choose (red, blue, green, silver)?
  • Neural Networks – A learning algorithm that is modeled after the brain.  Think of neurons.
  • Support Vector Machines
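
To make the first two bullets a little more concrete, here is a minimal sketch of gradient descent fitting a one-variable linear regression. The function name fit_line, the learning rate, the step count, and the toy data are all my own assumptions for illustration; this is not code from the class.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction errors with the current parameters.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of (1 / 2n) * sum(error^2) with respect to w and b.
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        # Step downhill, against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

On this toy data the fitted slope and intercept should land very close to 2 and 1. A learning rate that is too large would make the updates diverge, so the value here was picked by hand for this data set.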

Unsupervised Learning

In unsupervised learning, one has a set of data with features but no labels.  Can some structure be found for the data?

  • Clustering – The most popular technique is K-means.
  • PCA (Principal Components Analysis) – reduces the number of dimensions in the data, which can speed up a learning algorithm
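
As a rough illustration of the clustering bullet, here is a small K-means sketch in plain Python. Seeding the centroids with the first k points is a simplification I chose to keep the example deterministic (real implementations usually pick random starting points), and the data points are made up.

```python
def kmeans(points, k, iterations=10):
    """Plain K-means on 2-D points; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old centroid if a cluster ends up empty
                centroids[i] = (
                    sum(p[0] for p in cluster) / len(cluster),
                    sum(p[1] for p in cluster) / len(cluster),
                )
    return centroids, clusters


# Two obvious groups of made-up points, one near (1, 1) and one near (9, 9).
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```

The two steps inside the loop (assign points, then move centroids) are the whole algorithm. No labels are involved, which is what makes it unsupervised.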

Anomaly Detection

This section covers methods for detecting data points that differ markedly from the rest of the data.  Such unusual points are considered anomalies.
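
One common way to do this, sketched below under my own assumptions, is to fit a Gaussian to data that is believed to be normal and flag any new point whose probability density falls below a threshold. The readings, the EPSILON value, and the function names are hypothetical and chosen only for illustration.

```python
import math


def fit_gaussian(values):
    """Estimate the mean and variance of a 1-D sample."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance


def density(x, mean, variance):
    """Gaussian probability density at x."""
    exponent = -((x - mean) ** 2) / (2 * variance)
    return math.exp(exponent) / math.sqrt(2 * math.pi * variance)


# Hypothetical sensor readings assumed to represent normal behavior.
normal_readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
mean, variance = fit_gaussian(normal_readings)

EPSILON = 0.01  # hand-picked threshold; in practice it is tuned on known examples


def is_anomaly(x):
    """Flag x when it is very unlikely under the fitted Gaussian."""
    return density(x, mean, variance) < EPSILON


print(is_anomaly(10.1), is_anomaly(12.0))
```

A reading of 10.1 sits well inside the fitted distribution, while 12.0 is many standard deviations away and gets flagged.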

Recommender Systems

Like the name says, recommender systems are used to make recommendations.  Companies like Netflix use recommender systems to recommend new movies to customers.  LinkedIn also recommends people to connect with.  This is a fairly hot topic in the tech world right now.

  • Content-Based (Features)
    • Modified Linear Regression
  • Non-Content-Based (No Features)
    • Collaborative Filtering
    • Matrix Factorization
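
To give a feel for the non-content-based idea, here is a small sketch that predicts a rating using only other users' ratings, with no movie features. Note that this is a neighborhood-style (user-based) collaborative filter that I picked for simplicity, not the matrix factorization variant, and the users, movies, and ratings are invented.

```python
import math

# Hypothetical ratings (1-5); a missing key means the user has not rated the movie.
ratings = {
    "alice": {"Alien": 5, "Brazil": 4, "Clue": 1},
    "bob": {"Alien": 5, "Brazil": 5, "Clue": 1, "Dune": 5},
    "carol": {"Alien": 1, "Brazil": 2, "Clue": 5, "Dune": 1},
}


def cosine_similarity(a, b):
    """Cosine similarity over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    weighted_sum, weight_total = 0.0, 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += sim * other_ratings[item]
        weight_total += sim
    return weighted_sum / weight_total if weight_total else None


prediction = predict("alice", "Dune")
print(prediction)
```

Alice's ratings look much more like Bob's than Carol's, so the prediction for Dune is pulled toward Bob's rating of 5. Plain cosine similarity on raw ratings is fairly forgiving (Carol still gets a noticeable weight), so subtracting each user's average rating first is a common refinement.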

If any of these topics sound interesting to you, sign up for the Stanford Machine Learning class.  Professor Andrew Ng will do an excellent job explaining the details.