Blog How do recommendation systems work?

An introduction to the most popular recommendation system algorithms

Recommendation systems are data filtering tools that use algorithms and data to recommend the most relevant items to a particular user. As modern consumers, we are inundated with a wide variety of products. Thus, retailers need to invest in an accurate recommendation system to match our needs with suitable products from online platforms.

In this article, you will learn about:

  1. Basic concepts of recommendation systems
  2. Collaborative filtering method
  3. K-nearest neighbour algorithm
  4. The curse of dimensionality
  5. Drawbacks of collaborative filtering

Authors: Amirali Yazdi & Sonja Noben

What is the basic concept behind recommendation systems?

We can use a straightforward example with a very basic approach explaining the recommendation system’s basic concept.

Imagine there is an online streaming platform with only five movies and three users. Bill and Ted watched all the movies on the platform and rated them with stars (like) or red crosses (dislike). However, Jessica still hasn’t watched two of the films available on the platform, and we want to design a recommendation system to find out which one of these films Jessica prefers to watch. The system will then recommend that film to her.

How do recommendation systems work?
Movie user rating list

By going through previous user ratings, we can extract some useful information for our recommender. We can observe that Jessica and Bill both like drama movies from the director Asghar Farhadi and both dislike the superhero movie Avengers: Endgame. On the other hand, Ted is a big fan of Iron Man and Captain America and the Avengers films. Based on this simple observation from our user rating table, we can conclude that Jessica and Bill have more similar tastes in movies.

How do recommendation systems work?

Therefore, from the two available films, The Avengers and Todo lo Saben (another movie by Asghar Farhadi) that Jessica hasn’t watched yet, we choose the one that Bill liked and recommend that to Jessica. Now, let’s put that into simple math.

Note that the total number of movies is set to three instead of six because there are only three movies that all the users rated. This method of designing a recommendation system is categorized as a user-based Collaborative Filtering method.

What is a collaborative filtering method?

Collaborative filtering (CF) is a recommendation system method based on the past interactions between users and items and the history of ratings given by different users to each item. The most common approach to prepare data for the CF method is to create a user-item interaction table as below:

How do recommendation systems work?
User-item interaction table

There are two main approaches for Collaborative Filtering (CF) methods.

  1. Model-based: trains a model by learning the user-item representations from the user-item interaction matrix.
  2. Memory-based (no model): relies on the similarities of users or items. In this article, we focus on the Memory-based CF, which itself comes in two different forms.

When we want to recommend a movie, book, or any product to a user, the most logical approach is to find users with similar interests (highest similarity score) and recommend the items that similar users rated the highest to our user. We can consider our simple movie-recommender example above as a user-based CF. Alternatively, the recommendation system considers the products that the user bought or items that users rated as five-star as input and recommends items similar to those. Hence, calculating the similarity score is essential for both user-based and item-based forms of memory-based collaborative filtering.

Calculating the similarity score in real life is more complicated than our simple example above due to the vast number of products (items) in every online platform and the data’s structural diversity. In our example, we have Boolean data format (like or dislike), but usually, we encounter continuous variables such as ratings, number of clicks, time spent at each post, and more.

Fortunately, K-Nearest Neighbors, one of the most popular and simplest machine learning algorithms, is the standard solution to such problems.

What is the K-Nearest Neighbors algorithm?

K-Nearest Neighbors (K-NN) is classified as a supervised machine learning algorithm. It means that it relies on labelled input data to learn a function that produces an appropriate output when given new unlabeled data. K-NN calculates the distances between the point and other points in the data set. Then it assigns the point to the class among its nearest neighbours for classification problems or uses a weighted average of nearest neighbours for estimating continuous variables for regression problems.

But recommendation systems are often considered unsupervised machine learning algorithms, meaning that the points have no external classification (labels). Hence, we want K-NN to serve a different purpose for us in recommendation systems, which only calculates the distance between points and defines a similarity score between them based on distances.

How do recommendation systems work?

To measure the distance between points A and B in a feature space, various distance metrics have been used in the literature, such as Cosine, Minkowsky, Manhattan, and Chi-square. But the most widely used one is Euclidean distance. It is a measure of the actual straight line distance between two points in Euclidean space with the formula shown beside.

In the user-movie dataset below, we want to find three nearest neighbours (k = 3) of user-1, so we calculate the Euclidean distances between user-1 and all other users as below:

How do recommendation systems work?
User Movie dataset and K-NN

In this example, we are dealing with only two movies (two dimensions). Hence, it is easy to visualize distances. However, in reality, we have thousands of movies, meaning that the Euclidean distances have to be calculated at high dimensions. In these cases, we often face the curse of dimensionality.

How do recommendation systems work?
K-NN visualized distances

What is the curse of dimensionality?

So how can the curse of dimensionality affect our similarity score calculation in the collaborative filtering method?

The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces that do not occur in low-dimensional settings. Richard E. Bellman coined this expression, and in other words, it means things are different in higher dimensions!

Let us clarify it with a real example using the Movielens dataset. In this dataset, we have ratings of 610 users for more than 9000 movies.

In the first step, we calculate the Euclidean distance between user-1 with all the other users at different dimensions (different no. of movies) from 2 to 9000. Then we calculate the coefficient of variation (CV) of distances at each dimension and plot the results to study the trend of frequency distributions concerning dimensionality.

Coefficient_Variation

The CV is a statistical measure of the dispersion of data points in a data series. It shows the extent of variability, and it is calculated with the formula beside:

How do recommendation systems work?
Data Dispersity Trend

By observing the plot above, we can see a drastic decrease in CV by increasing the number of movies (dimensionality). The lower the value of CV, the smaller the level of dispersion around the mean. This observation implies that having too many dimensions (movies) causes users to have equal Euclidean distances from one another. Hence, all users become equally similar to each other, which is a big problem for a recommendation system. That is when the Dimensionality Reduction Algorithms come in to use.

Dimensionality reduction is a technique that seeks a lower-dimensional representation of data that preserves the salient relationships in the data. It is a hot topic in Machine Learning, and there are a handful of sophisticated dimensionality reduction methods. But going through each one’s details is beyond the scope of this article. Here you can find some examples with references for further reads:

  • Principal Component Analysis (PCA) – link
  • Variational Auto-Encoder (VAE) – link
  • Singular Value Decomposition (SVD) – link
  • Linear Discriminant Analysis (LDA) – link

What are the other drawbacks of collaborative filtering methods?

Even though memory-based CF methods use comprehensive and straightforward algorithms, and the only information we need to apply them is user-item interaction (ratings), there are still some drawbacks.

They are computationally expensive algorithms when scaling-up. For example, every time a new user sign-up or a new movie is added to our movie recommender system, similarity scores between all the users/items need to be recalculated. Using dimensionality reduction algorithms and alternative techniques instead of brute force K-NN such as Approximate Nearest Neighbors (ANN) drastically enhances the execution time, but still, scaling-up is heavy-duty.

Besides, CF can not handle fresh users/items if there is no history of ratings from a user, as the model can’t find similar users. This issue is often called the cold-start problem.

Finally, the evaluation of recommendation systems is a complicated process. The first approach to get an overview of the model’s performance is metric-based evaluations by dividing our data into a train and test split. However, the most common and best evaluation process is recording users’ feedback on the recommended items and applying an A/B testings approach.

Final thoughts

Collaborative filtering is an effective method with numerous use cases in many domains without prior knowledge about the field. Also, It does not require features of the items or users to be known.

Feasible implementation of CF with K-NN and Dimensionality Reduction Algorithms enables us to overcome tasks that require similarity score calculation, especially in small user-item interaction datasets.

Here at DigitalScaler, we used CF methods to design a recommendation system for employees based on their skill levels. Our forthcoming article will demonstrate how we can find the best-suited employees to be reskilled according to a newly defined role following the steps we described in this article.

I want to unleash algorithms on my data

Discover More