guidj/rl-daaf

RL with Delayed, Aggregated and Anonymous Feedback (DAAF)

Code for experiments on policy control and evaluation in Reinforcement Learning with delayed, aggregated and anonymous feedback.

Delayed, aggregated, anonymous feedback

In the standard reinforcement learning setting, the environment provides a reward for each action an agent takes. This is encoded by the function $R(s,a)$, where $s$ is a state and $a$ is an action.

In DAAF settings, the environment instead provides feedback at periodic time intervals (e.g. based on a Poisson distribution), and in aggregate: the agent receives a combination of the rewards for several actions. Because the agent cannot discern how much each action contributed to the observed reward, the feedback is anonymous.

In contrast to fully sparse reward problems, where the reward is only observed at the end, after task completion or failure, DAAF problems have intermittent feedback.
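The aggregation described above can be illustrated with a minimal sketch. It is not taken from this repository: the function name `aggregate_feedback`, the fixed feedback period, and the use of a plain sum as the "combination" of rewards are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def aggregate_feedback(
    rewards: Sequence[float], period: int
) -> List[Optional[float]]:
    """Hypothetical helper: reveal the sum of rewards every `period` steps.

    Steps without feedback are marked None (the reward is not observed).
    Assumes a fixed period and summation as the aggregation rule.
    """
    if period < 1:
        raise ValueError("period must be >= 1")
    observed: List[Optional[float]] = []
    running_sum = 0.0
    for step, reward in enumerate(rewards, start=1):
        running_sum += reward
        if step % period == 0:
            # Feedback step: the agent sees only the aggregate.
            observed.append(running_sum)
            running_sum = 0.0
        else:
            observed.append(None)
    # Rewards after the last full period are never revealed in this sketch.
    return observed


per_step = [1.0, 0.0, 2.0, -1.0, 0.5, 0.5]
print(aggregate_feedback(per_step, period=3))
# [None, None, 3.0, None, None, 0.0]
```

Under these assumptions, `period=1` recovers the standard per-step reward setting, while a period equal to the episode length corresponds to the fully sparse case.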

Code

Contains

  • Algorithms for policy control with DAAF
  • Algorithms for policy evaluation with DAAF
  • Notebooks with analysis results on reward estimation or recovery

Submissions

For specific snapshots of code submitted to conferences:

  1. DS '22 - Policy Evaluation
  2. ECML-PKDD '24 - Policy Control

Dev Env

First, make sure the following Python development tools are installed:

Then, in a virtual environment, run pip-compile and install:

$ make pip-compile
$ make pip-install

These should install all the dependencies required for development.

For building, install tox and tox-uv:

$ pip install tox tox-uv

Dependencies

The dependency files map to a purpose as follows:

All requirements files are compiled using uv.
