Code for experiments on policy control and evaluation in Reinforcement Learning with delayed, aggregated and anonymous feedback.
In the standard reinforcement learning setting, for each action an agent takes, the environment provides a reward.
This is encoded by the reward function R(s, a), which maps each state-action pair to a scalar reward.
In DAAF settings, the environment instead provides feedback at periodic time intervals (e.g. based on a Poisson distribution), and in aggregate, in the sense that the agent receives a combination of the rewards for several actions. Because the agent cannot discern how much each action contributed to the observed value, the feedback is anonymous.
In contrast to fully sparse reward problems, where the reward is only observed at the end, after task completion or failure, DAAF problems have intermittent feedback.
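For illustration, the sketch below shows one way such feedback can arise: an environment wrapper accumulates per-step rewards and only reveals their sum every few steps. The wrapper class, the `reward_period` parameter, and the gym-style `step` signature are assumptions made for this example, not part of this repository's API.

```python
class AggregatedFeedbackWrapper:
    """Illustrative wrapper that turns per-step rewards into DAAF-style feedback.

    Per-step rewards are accumulated internally; the agent only observes their
    sum every `reward_period` steps (and zero otherwise), so it cannot tell how
    much each individual action contributed.
    """

    def __init__(self, env, reward_period: int = 4):
        # `env` is assumed to expose gym-style reset() and step() methods,
        # with step() returning (observation, reward, done, info).
        self.env = env
        self.reward_period = reward_period
        self._accumulated = 0.0
        self._steps_since_feedback = 0

    def reset(self):
        self._accumulated = 0.0
        self._steps_since_feedback = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._accumulated += reward
        self._steps_since_feedback += 1
        if done or self._steps_since_feedback >= self.reward_period:
            # Reveal the aggregate of the last few rewards, then reset the buffer.
            observed_reward = self._accumulated
            self._accumulated = 0.0
            self._steps_since_feedback = 0
        else:
            # Nothing is revealed at this step.
            observed_reward = 0.0
        return obs, observed_reward, done, info
```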
This repository contains:
- Algorithms for policy control with DAAF
- Algorithms for policy evaluation with DAAF
- Notebooks with analysis results on reward estimation or recovery (the estimation idea is sketched below)
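One way to work with DAAF, sketched below, is to first estimate the hidden per-step rewards and then apply standard evaluation or control to the estimates. With tabular states and actions, each aggregate observation is a linear combination of the unknown per-pair rewards weighted by visit counts, so the rewards can be recovered by least squares. The function name and array layout here are illustrative assumptions, not necessarily the estimator used in these notebooks.

```python
import numpy as np


def estimate_rewards(visit_counts: np.ndarray, aggregate_feedback: np.ndarray) -> np.ndarray:
    """Estimate per-(state, action) rewards from aggregated, anonymous feedback.

    Args:
        visit_counts: shape (num_segments, num_state_action_pairs); entry [i, j]
            counts how often pair j was visited during segment i.
        aggregate_feedback: shape (num_segments,); the aggregate reward observed
            at the end of each segment.

    Returns:
        Least-squares estimate of the reward for each (state, action) pair.
    """
    estimates, *_ = np.linalg.lstsq(visit_counts, aggregate_feedback, rcond=None)
    return estimates


# Tiny usage example: two state-action pairs with true rewards (1.0, -0.5).
counts = np.array([[3.0, 1.0], [1.0, 2.0], [2.0, 2.0]])
true_rewards = np.array([1.0, -0.5])
observed = counts @ true_rewards  # aggregated feedback for each segment
print(estimate_rewards(counts, observed))  # approximately [ 1.  -0.5]
```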
For specific snapshots of code submitted to conferences:
First, make sure the following Python development tools are installed:
Then, in a virtual environment, run pip-compile and install:
$ make pip-compile
$ make pip-install
These commands should install all the dependencies required for development.
For building, install tox and tox-uv:
$ pip install tox tox-uv
The dependency files serve the following purposes:
- requirements.in: packages for the experiments.
- test-requirements.in: for running tests.
- nb-requirements.in: for jupyter notebooks.
- rendering-requirements.in: for environments that can be rendered in a graphical interface, using OpenGL.
- ray-env-requirements.in: for ray in a cluster environment. During compilation with `pip-compile`, it's best to exclude the version of ray (see the Makefile).
All requirements files are compiled using `uv`.