Building recommender systems using contextual bandit methods to address the cold-start issue and enable online real-time learning

sparsh-ai/reco-bandit


RecoBandit

Building recommender systems using contextual bandit methods to address the cold-start issue and enable online real-time learning

App 1

Thompson Sampling, Single-user Multi-product Simulation, Multi-armed Bandit

The objective of this app is to apply bandit algorithms to the recommendation problem in a simulated environment. Although in practice we would also use real data, the complexity of the recommendation problem and its algorithmic challenges are already apparent in this simple setting.
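To make the setting concrete, here is a minimal sketch of Beta-Bernoulli Thompson Sampling for one user and several products. It is not the app's actual code: the function name `thompson_sampling`, the click-through rates, and the number of rounds are all assumptions chosen for illustration.

```python
import random


def thompson_sampling(true_ctr, n_rounds=5000, seed=0):
    """Simulate one user choosing among products with Thompson Sampling.

    true_ctr holds the (hypothetical) click probability of each product.
    Returns how many times each product was recommended.
    """
    rng = random.Random(seed)
    n = len(true_ctr)
    alpha = [1] * n  # Beta prior: 1 pseudo-click per product
    beta = [1] * n   # Beta prior: 1 pseudo-skip per product
    pulls = [0] * n

    for _ in range(n_rounds):
        # Draw one plausible click rate per product from its posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n)]
        # Recommend the product whose sampled rate is highest.
        arm = max(range(n), key=samples.__getitem__)
        # Simulated user feedback: click (1) or no click (0).
        reward = 1 if rng.random() < true_ctr[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


# Three hypothetical products with different click-through rates.
pulls = thompson_sampling([0.05, 0.10, 0.30])
print(pulls)
```

As the posteriors sharpen, most recommendations should concentrate on the product with the highest click-through rate, while the weaker products are still tried occasionally early on.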

RecoBandit - Thompson Sampling Simulation

Inspired by the following works:

App 2

Multi-user Multi-product Contextual Simulation, Contextual Bandit, Vowpal Wabbit

The objective of this app is to apply contextual bandit algorithms to the recommendation problem in a simulated environment. The recommender agent quickly adapts to changing user behavior and updates its recommendation strategy accordingly.
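The adaptation behavior can be illustrated without Vowpal Wabbit. The sketch below is a stand-in, not the app's implementation: it uses a plain epsilon-greedy contextual bandit with a constant step size, where the context is simply the user's identity. The user names, product names, preference shift, and all parameters are hypothetical.

```python
import random


def simulate(n_rounds=4000, switch_at=2000, epsilon=0.1, step=0.05, seed=0):
    """Epsilon-greedy contextual bandit whose users change taste mid-run.

    Returns the average reward over the final 500 rounds, i.e. well
    after the preference shift at round `switch_at`.
    """
    rng = random.Random(seed)
    users = ["Tom", "Anna"]                    # hypothetical contexts
    products = ["politics", "sports", "music"]  # hypothetical actions
    prefs_before = {"Tom": "politics", "Anna": "sports"}
    prefs_after = {"Tom": "music", "Anna": "politics"}

    # One value estimate per (user, product) pair.
    value = {(u, p): 0.0 for u in users for p in products}
    window, late_reward = 500, 0.0

    for t in range(n_rounds):
        prefs = prefs_before if t < switch_at else prefs_after
        user = rng.choice(users)
        if rng.random() < epsilon:
            product = rng.choice(products)  # explore
        else:
            product = max(products, key=lambda p: value[(user, p)])  # exploit
        reward = 1.0 if product == prefs[user] else 0.0
        # A constant step size forgets old feedback, so estimates can
        # track the change in user behavior.
        value[(user, product)] += step * (reward - value[(user, product)])
        if t >= n_rounds - window:
            late_reward += reward
    return late_reward / window


late_rate = simulate()
print(round(late_rate, 2))
```

The constant step size is the design choice that matters here: with a plain running average, feedback collected before the shift would keep outweighing the newer feedback for much longer.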

VW Contextual Bandit Simulation

App 3 (next release)

Image Embeddings, Offline Learning

The objective is to recommend products and adapt the model in real time from user feedback, using an actor-critic algorithm. Suppose we have observed a user's behavior and collected some products they clicked on. This history is fed into the Actor network, which decides what the user would like to see next by producing an ideal product embedding. That embedding is compared with the embeddings of the available products, and the closest match is recommended to the user. The Critic evaluates the Actor's choices and helps it learn what went wrong.
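The matching step (comparing the Actor's ideal embedding with product embeddings) can be sketched as a nearest-neighbor lookup. The example below assumes cosine similarity as the comparison; the catalog, the three-dimensional vectors, and the function names are made up for illustration, and the Actor and Critic networks themselves are not modeled.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(ideal, catalog):
    """Return the product whose embedding is closest to the ideal one."""
    return max(catalog, key=lambda name: cosine(ideal, catalog[name]))


# Hypothetical product embeddings (in practice, e.g. image embeddings).
catalog = {
    "sneakers": [0.9, 0.1, 0.0],
    "boots": [0.7, 0.3, 0.1],
    "umbrella": [0.0, 0.2, 0.9],
}
# Hypothetical output of the Actor network for the current user.
ideal = [0.8, 0.2, 0.0]

best = recommend(ideal, catalog)
print(best)
```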

Inspired by the following works:

App 4 (next release)

Offline Learning

The core intuition is that we cannot blindly apply RL algorithms to a production system out of the box: the learning period would be too costly. Instead, we need to leverage the large volume of offline training examples so that the algorithm performs as well as the current system before it is released into the online production environment. An agent is first given access to many offline training examples produced by a fixed policy. It is then given access to the online system, where it chooses the actions itself.
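One simple way to picture this two-phase setup is to warm-start a bandit from logged data and then let it act online. The sketch below is only an illustration under strong assumptions: a context-free problem, a uniform-random logging policy, and made-up click-through rates. The names `warm_start` and `choose` are hypothetical.

```python
import random


def warm_start(logs, n_actions):
    """Build Beta posteriors from logged (action, reward) pairs."""
    alpha = [1] * n_actions
    beta = [1] * n_actions
    for action, reward in logs:
        alpha[action] += reward
        beta[action] += 1 - reward
    return alpha, beta


def choose(alpha, beta, rng):
    """Pick an action by Thompson Sampling from the current posteriors."""
    samples = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
    return max(range(len(samples)), key=samples.__getitem__)


rng = random.Random(0)
true_ctr = [0.05, 0.10, 0.30]  # hypothetical click-through rates

# Offline phase: examples logged by a fixed (here uniform-random) policy.
logs = []
for _ in range(3000):
    action = rng.randrange(len(true_ctr))
    reward = 1 if rng.random() < true_ctr[action] else 0
    logs.append((action, reward))

alpha, beta = warm_start(logs, len(true_ctr))

# Online phase: the agent now chooses the actions and keeps learning.
online_pulls = [0] * len(true_ctr)
for _ in range(1000):
    action = choose(alpha, beta, rng)
    reward = 1 if rng.random() < true_ctr[action] else 0
    alpha[action] += reward
    beta[action] += 1 - reward
    online_pulls[action] += 1

print(online_pulls)
```

Because the posteriors already reflect the logged feedback, the agent should start the online phase favoring the best action instead of paying the full exploration cost in production.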

Inspired by the following works:

What is Bandit-based Recommendation?

Traditionally, recommendation was treated as a simple classification or prediction problem. However, the recommendation problem has been shown to be sequential in nature. Accordingly, it can be formulated as a Markov decision process (MDP) and solved with reinforcement learning (RL) methods. In fact, recent advances in combining deep learning with traditional RL methods, i.e. deep reinforcement learning (DRL), have made it possible to apply RL to recommendation problems with massive state and action spaces.

Use case 1: Personalized recommendations

Goal: Quickly help users find products they would like to buy

In e-commerce and other digital domains, companies frequently want to offer personalized product recommendations to users. This is hard when you don’t yet know much about the customer, or you don’t understand which features of a product are pertinent. With limited information about which actions to take and what their payoffs will be, and with limited resources to explore the competing actions, it is hard to know what to do.

Use case 2: Online model evaluation

Goal: Compare and find the best performing recommender model

Use case 3: Personalized re-ranking

Goal: Bring the most relevant option to the top

Use case 4: Personalized feeds

Goal: Recommend a never-ending feed of items (news, products, images, music)

https://youtu.be/CgGCbmlRI3o

References

  1. LinUCB Contextual News Recommendation
  2. Experiment with Bandits
  3. n-armed Bandit Recommender
  4. Bandit Algorithms for Website Optimization [eBook O’Reilly] [GitHub] [Colab]
  5. MAB Ranking PyPi
  6. RecSim GitHub, Video, Medium
  7. https://vowpalwabbit.org/tutorials/contextual_bandits.html
  8. https://github.com/sadighian/recommendation-gym
  9. https://learning.oreilly.com/library/view/reinforcement-learning-pocket/9781098101527/ch02.html
  10. https://github.com/awarebayes/RecNN/
  11. https://vowpalwabbit.org/neurips2019/
  12. https://github.com/criteo-research/reco-gym
  13. https://pypi.org/project/SMPyBandits/
  14. https://github.com/bgalbraith/bandits
  15. https://pypi.org/project/mab-ranking/
  16. https://www.optimizely.com/optimization-glossary/multi-armed-bandit/
  17. https://abhishek-maheshwarappa.medium.com/multi-arm-bandits-for-recommendations-and-a-b-testing-on-amazon-ratings-data-set-9f802f2c4073
