Celebrating Dataset Diversity in Recommender Systems

You’ve doubtless come across recommenders before. You’ll certainly have been on the receiving end of a few. The question is, what do they look like on the inside?

In fact, there is tremendous variety – recommendation is an old problem, and was first tackled within classical information retrieval in the 1980s. Since then, deep learning has inevitably infiltrated the field, and recommenders today are often designed around neural networks.

Typically, the different recommender setups are treated separately. Within a given dataset, you would look for an archetypal form and start building a recommender of that type. But instead of pruning your dataset to fit a well-studied example, are there ways to design a solution that fits your dataset? In this article I’ll suggest how you might try.

To begin, let’s recap the familiar ways.

Recommender archetypes

One basic recommender setup, for example the Netflix Prize problem, centres around a user–item matrix. Entries of this matrix are ‘ratings’ that a user gave to an item (movie), and some of the entries are missing.

The recommender task here is to fill in the missing ratings so that we know which movies a user is likely to rate highest when watched. These would then be suitable suggestions to make to the user if we want them to have a good time on Netflix. Naturally this idea is extremely versatile, extending to any scenario with identifiable ‘users’ and some explicit or implicit ‘rating’:

user	items	rating
Netflix account	movies	1-5 stars
store loyalty card	clothing	purchase
careers site profile	job listings	click count
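
The fill-in-the-missing-ratings task described above can be sketched with a small matrix factorisation. This is only an illustration under stated assumptions: the toy ratings, the function names `factorize` and `predict`, and the hyperparameters (`k`, `lr`, `reg`, `steps`) are invented here and do not come from the Netflix Prize or from any particular library. Each user and each item gets a short vector, and stochastic gradient descent adjusts the vectors so that their dot product approximates the observed ratings. The same dot product then gives an estimate for the missing entries.

```python
import random

def predict(users, items, u, i):
    """Estimated rating: dot product of the user vector and the item vector."""
    return sum(pu * qi for pu, qi in zip(users[u], items[i]))

def factorize(ratings, n_users, n_items, k=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Learn k-dimensional user and item vectors from (user, item, rating) triples."""
    rng = random.Random(seed)
    users = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    items = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for u, i, r in ratings:
            err = r - predict(users, items, u, i)
            for f in range(k):
                pu, qi = users[u][f], items[i][f]
                # Gradient step on squared error with L2 regularisation.
                users[u][f] += lr * (err * qi - reg * pu)
                items[i][f] += lr * (err * pu - reg * qi)
    return users, items

# Toy data (hypothetical): users 0-1 like items 0-1, users 2-3 like items 2-3.
observed = [
    (0, 0, 5), (0, 2, 1),
    (1, 0, 5), (1, 1, 5), (1, 2, 1), (1, 3, 1),
    (2, 0, 1), (2, 1, 1), (2, 2, 5), (2, 3, 5),
    (3, 1, 1), (3, 2, 5), (3, 3, 5),
]
users, items = factorize(observed, n_users=4, n_items=4)

# User 0 has not rated items 1 and 3; rank the unseen items by estimated rating.
unseen = [1, 3]
ranked = sorted(unseen, key=lambda i: predict(users, items, 0, i), reverse=True)
```

The learned vectors double as simple user and item embeddings, which is one reason this setup is so widely reused.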

But that’s not the only task in recommendation. Another common pattern comes from trying to extrapolate a sequence of related actions, or predict some final action. Typically this might be a real-time “session” of online activity, but other examples exist for longer timeframes:

actions	prediction	task
clicking on products	purchased products	basket prediction
streaming music tracks	next track to play	playlist continuation
betting on sports	upcoming games to follow	fanbase activation
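
One very simple way to extrapolate a sequence of actions is to count which action most often follows the current one. The sketch below is an assumption-laden baseline rather than a recommended method: the session data and the names `fit_transitions` and `predict_next` are hypothetical, and real session-based recommenders usually look further back than one step.

```python
from collections import Counter, defaultdict

def fit_transitions(sessions):
    """Count how often each action is immediately followed by each other action."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            transitions[current][following] += 1
    return transitions

def predict_next(transitions, session, n=1):
    """Return the n most frequent successors of the session's last action."""
    if not session:
        return []
    successors = transitions.get(session[-1], Counter())
    return [action for action, _ in successors.most_common(n)]

# Hypothetical click sessions on a camping store.
sessions = [
    ["tent", "sleeping bag", "stove"],
    ["tent", "sleeping bag", "torch"],
    ["boots", "sleeping bag", "stove"],
]
transitions = fit_transitions(sessions)
suggestion = predict_next(transitions, ["boots", "tent", "sleeping bag"])
```

An action never seen in training simply yields an empty list, which leaves room for a fallback recommender to take over.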

Other setups include:

- Content-based recommendation – where there exists metadata for the items and/or users – modelling the nature of the item rather than its identity. These approaches have the advantage of translating well to new items.
- Association rule mining – where co-occurrence is favoured over other signals.
- Similarity measures – for example, recommendation by selecting the most similar song to the one just played.
- Knowledge-based recommendation – where the items shown are selected by extrinsic knowledge of their suitability (using hard and soft criteria rather than observation).
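
As a rough illustration of the similarity approach, the sketch below picks the song whose feature vector has the highest cosine similarity to the one just played. The three-number feature vectors and the song names are made up for the example, and `cosine` and `most_similar` are hypothetical helper names. Because it works from item metadata rather than item identity, it also hints at how content-based methods cope with new items.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

def most_similar(item_id, vectors):
    """Return the id of the other item whose vector is closest to item_id's."""
    target = vectors[item_id]
    candidates = [other for other in vectors if other != item_id]
    return max(candidates, key=lambda other: cosine(target, vectors[other]))

# Hypothetical song features, e.g. (tempo, energy, acousticness) scaled to 0-1.
songs = {
    "song_a": [0.9, 0.8, 0.1],
    "song_b": [0.8, 0.9, 0.2],
    "song_c": [0.1, 0.2, 0.9],
}
next_song = most_similar("song_a", songs)
```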

What do the archetypes above have in common?

Well, the task is always to recommend. That is, to take a collection of potentially interesting items and pick out the best. Described like this, it’s essentially a search problem: to display the most relevant content for the query. But… what is the query?

The state of search

This is where we turn to the latest in web search for clues.

When you make a search request at google.com, there is more going on behind the scenes than simple document retrieval. In fact, Google links to a page explaining the signals it uses to enhance your query.

It almost seems as if they know what to show you before you even ask for it…

In case I was too subtle just then, I’ll restate my point: recommendation is search without an explicit query. Instead of processing a text input to deduce intent, we rely only on contextual clues such as when and where the user is right now, their personal profile, and recent activity. Overloading the term, we shall refer to all of this together as the ‘query’ of recommendation. Inside this search framework, we have the following channels to design and exploit:

[Diagram: Dataset Diversity]

Just as with smart web search, different contextual signals may contribute in different ways to the final results of a recommender system. While it is certainly possible to set up a super-mega-ultra all-in-one solution, why bother? Is it worth the struggle to build a single model that uses everything at once or, at the other extreme, one that treats every input entirely separately?

Extension and Combination

The pragmatic idea of a composite recommender is not this author’s to claim. In a 2002 paper by Burke, seven answers are given to the relatively benign question of how to form a “hybrid” from two or more archetypal recommenders:

Hybridization method	Description
Weighted	The scores (or votes) of several recommendation techniques are combined together to produce a single recommendation.
Switching	The system switches between recommendation techniques depending on the current situation.
Mixed	Recommendations from several different recommenders are presented at the same time.
Feature combination	Features from different recommendation data sources are thrown together into a single recommendation algorithm.
Cascade	One recommender refines the recommendations given by another.
Feature augmentation	Output from one technique is used as an input feature to another.
Meta-level	The model learned by one recommender is used as input to another [e.g. embedding].
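
Three of the hybridization methods in the table can be sketched as small higher-order functions. This is one possible reading of Burke's descriptions, not his implementation: the scorer signature `(query, item) -> float`, the item dictionaries, and the names `weighted`, `switching` and `cascade` are all assumptions made for the example.

```python
def weighted(scorers, weights):
    """Weighted hybrid: combine several scores into one."""
    def score(query, item):
        return sum(w * scorer(query, item) for scorer, w in zip(scorers, weights))
    return score

def switching(choose_scorer):
    """Switching hybrid: pick one scorer depending on the current situation."""
    def score(query, item):
        return choose_scorer(query)(query, item)
    return score

def cascade(coarse, fine, keep):
    """Cascade hybrid: the fine scorer re-ranks the coarse scorer's shortlist."""
    def recommend(query, items):
        shortlist = sorted(items, key=lambda item: coarse(query, item), reverse=True)[:keep]
        return sorted(shortlist, key=lambda item: fine(query, item), reverse=True)
    return recommend

# Two hypothetical scorers over items described by plain dicts.
def popularity(query, item):
    return item["popularity"]

def category_match(query, item):
    return 1.0 if item["category"] == query["category"] else 0.0

items = [
    {"id": "A", "category": "garden", "popularity": 0.2},
    {"id": "B", "category": "kitchen", "popularity": 0.9},
    {"id": "C", "category": "garden", "popularity": 0.6},
]
query = {"category": "garden", "is_new_user": False}

blend = weighted([popularity, category_match], [0.5, 0.5])
best_blend = max(items, key=lambda item: blend(query, item))

switch = switching(lambda q: popularity if q["is_new_user"] else category_match)
refine = cascade(coarse=category_match, fine=popularity, keep=2)
ranked = refine(query, items)
```

Here the cascade first keeps the two items that match the query's category and then orders them by popularity, so the second recommender only ever refines what the first one let through.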

Yes, these techniques were intended to apply to hybridizing existing recommenders, but the principles apply just as well to building a modular multi-input recommender from scratch.

Although – caveat – the first four approaches do not offer a real response to the question posed at the end of the last section. Only the last three (“cascade”, “feature augmentation” and “meta-level”) have any flexibility.

To illustrate the principles of modular recommender architecture, we’ll now examine a dummy dataset with all the typical tables you might expect when tasked with building a recommendation engine. We’ll then show how this approach might help make a manageable ML project out of an otherwise daunting proposition.

Data

Before thinking about derived information, it helps to recognise four types of source metadata:

1. User metadata

user ID	postcode	date of birth	etc.
AA001122	W2 9XB	1969-07-20	…

2. Item metadata

item ID	category	material	etc.
TABL525A	garden furniture	hickory	…

3. User–item interaction metadata

user ID	item ID	purchase price	time	etc.
AA001122	TABL525A	349.99	20180526T0731	…

4. Query context metadata

current page	time on page	device	time	etc.
www.site.tld/page	108 s	mobile	20190902T2235	…
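
For concreteness, the four source tables could be represented as typed records. The sketch below uses only the columns shown in the example rows (the "etc." columns are left out rather than guessed), and the class and field names are assumptions chosen for illustration. The `parse_time` helper handles the compact timestamp format that appears in the tables.

```python
from dataclasses import dataclass
from datetime import date, datetime

@dataclass(frozen=True)
class User:
    user_id: str
    postcode: str
    date_of_birth: date

@dataclass(frozen=True)
class Item:
    item_id: str
    category: str
    material: str

@dataclass(frozen=True)
class Interaction:
    user_id: str
    item_id: str
    purchase_price: float
    time: datetime

@dataclass(frozen=True)
class QueryContext:
    current_page: str
    seconds_on_page: int
    device: str
    time: datetime

def parse_time(stamp):
    """Parse the compact timestamps used in the example rows, e.g. 20180526T0731."""
    return datetime.strptime(stamp, "%Y%m%dT%H%M")

user = User("AA001122", "W2 9XB", date(1969, 7, 20))
item = Item("TABL525A", "garden furniture", "hickory")
purchase = Interaction(user.user_id, item.item_id, 349.99, parse_time("20180526T0731"))
context = QueryContext("www.site.tld/page", 108, "mobile", parse_time("20190902T2235"))
```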

From these sources flows all useful truth. It is the job of the data scientist to extract it, and the techniques applied fall into largely distinct groups, illustrated in the diagram below.

[Diagram: Dataset Diversity]

The first level of information mining is association, with the intent of binding one entity to another by similarity or co-occurrence of entries in a table:

from User metadata
- (user, user) → similarity
from Item metadata
- (item, item) → similarity
from User–item interaction metadata
- (user, item) → affinity
- (item, item) → association
- (user, user) → association
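
The (item, item) association from interaction data can be sketched as a co-occurrence count: for every pair of items, how many users interacted with both. The interaction rows and the function name `item_associations` are hypothetical. Swapping the roles of user and item in the same function would give a (user, user) association instead.

```python
from collections import Counter, defaultdict
from itertools import combinations

def item_associations(interactions):
    """Count, for each pair of items, how many users interacted with both."""
    items_by_user = defaultdict(set)
    for user_id, item_id in interactions:
        items_by_user[user_id].add(item_id)
    pair_counts = Counter()
    for items in items_by_user.values():
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(items), 2):
            pair_counts[pair] += 1
    return pair_counts

interactions = [
    ("u1", "table"), ("u1", "chair"), ("u1", "parasol"),
    ("u2", "table"), ("u2", "chair"),
    ("u3", "chair"), ("u3", "lamp"),
]
associations = item_associations(interactions)
strongest_pair, strongest_count = associations.most_common(1)[0]
```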

Second, there is feature engineering, with the intent of capturing information to be used in later efforts. Features are often transformations or aggregations of data in source tables:

+ Item features
- = user-supplied metadata (e.g. tags, comments)
- = aggregated user interactions
- = aggregated user features
+ User features
- = aggregated item interactions
- = aggregated item features
+ Contextual features
- = item context (novelty, trending etc.)
- = user context (recent activity, past sessions)
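
As an example of the aggregations listed above, the sketch below derives user features from the interaction table and from item metadata: a purchase count and mean price (aggregated item interactions) and a most frequent category (aggregated item features). The rows, the feature names and the function `user_features` are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from statistics import mean

def user_features(interactions, item_category):
    """Aggregate (user, item, price) rows into one feature dict per user."""
    rows_by_user = defaultdict(list)
    for user_id, item_id, price in interactions:
        rows_by_user[user_id].append((item_id, price))
    features = {}
    for user_id, rows in rows_by_user.items():
        categories = Counter(item_category[item_id] for item_id, _ in rows)
        features[user_id] = {
            "n_purchases": len(rows),
            "mean_price": mean([price for _, price in rows]),
            "top_category": categories.most_common(1)[0][0],
        }
    return features

item_category = {
    "table": "garden furniture",
    "chair": "garden furniture",
    "lamp": "lighting",
}
interactions = [
    ("u1", "table", 300.0), ("u1", "chair", 100.0), ("u1", "lamp", 50.0),
    ("u2", "lamp", 50.0),
]
features = user_features(interactions, item_category)
```

Item features can be built the same way by grouping on the item ID and aggregating over users.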

And finally there is modelling – constructing additional information through machine learning, supervised or unsupervised:

- Item embeddings
- User embeddings
- Imputation
- Recommendation

Constructing complex recommenders

Under this framework, it becomes easier to lay out how to assemble an advanced multi-input recommender. It may help to think in terms of the operations available to us.

Start with the definition of the recommender problem:

Operation	Starts with	Yields
Recommendation	(query, item set, model)	item to recommend
Partial recommendation [“Cascade”]	(query, item set, model)	filtered item set

To this add some elementary operations to obtain features:

Operation	Starts with	Yields
Feature lookup	query	features
Data transformation	features	new features

And lastly some elementary operations for modelling:

Operation	Starts with	Yields
Model training	(input features, target features)	new model
Model execution	(input features, model)	new features as output
Model combination	(model, model)	model

Model training may involve any combination of inputs and outputs from the features described so far. Using or obtaining an embedding (“meta-level” hybridization) is illustrated in orange in the diagram below, and “feature augmentation” is simply a matter of chaining the output of one model into another.
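
The modelling operations in the tables above can be made concrete with a deliberately tiny stand-in model. In this sketch, "model training" is a one-dimensional least-squares fit, "model execution" adds the prediction to a feature dictionary, and "model combination" chains two models so that the first one's output becomes an input feature of the second (feature augmentation). The feature names `recency`, `affinity` and `score`, the training numbers, and the helpers `train` and `chain` are all hypothetical.

```python
def train(xs, ys, in_key, out_key):
    """Model training: (input features, target features) -> new model."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / variance
    intercept = mean_y - slope * mean_x

    def model(features):
        # Model execution: (input features, model) -> new features as output.
        return {**features, out_key: slope * features[in_key] + intercept}
    return model

def chain(first, second):
    """Model combination: (model, model) -> model, feeding first into second."""
    return lambda features: second(first(features))

# First model: estimate an affinity from how recent the last visit was.
affinity_model = train([0.0, 1.0, 2.0], [1.0, 3.0, 5.0], "recency", "affinity")
# Second model: turn that affinity into a final score.
score_model = train([1.0, 3.0, 5.0], [2.0, 6.0, 10.0], "affinity", "score")

pipeline = chain(affinity_model, score_model)
result = pipeline({"recency": 1.0})
```

Because a chained pipeline has the same shape as a single model, it can itself be chained again, which is what keeps a multi-input recommender modular.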

So, with model combination abstracted away, the ‘irreducible unit’ of modelling is this:

The final layer

At this stage we are back in line with Burke’s work. Having dealt with “meta-level”, “feature augmentation”, and “feature combination” types of hybridization in the last section, we are left with the relatively simple “weighted”, “switching”, “mixed” and “cascade”, all of which shall simply be represented without comment by the diagram from earlier:

And just like that, we’re done.

Concluding remarks

In this post, I’ve deliberately avoided the particulars of neural network architecture, feature engineering, or other mathematical choices that will inevitably be just as rich a field of variation as the concepts described above. My hope is that any readers who have enjoyed or appreciated this journey through data space will put careful thought into the craft of advanced recommenders, and consider intelligently the relative merits of increased complexity at each scale. Deep learning can sometimes overwhelm by its scope and pace, but its practice should always aim to address the real problems – this article contributes by making more explicit some decisions that are often taken implicitly.

Good luck, and happy recommending!
