Distributional Superiority

This repository contains the reference implementation of the [DAU+]DSUP($q$) algorithms presented in:

Action Gaps and Advantages in Continuous-Time Distributional Reinforcement Learning

by Harley Wiltzer*, Marc G. Bellemare, David Meger, Patrick Shafto, and Yash Jhaveri*.

Setup

This project uses PDM for dependency management. See https://pdm-project.org/latest/#installation for installation instructions.

Once PDM has been installed, execute the following from the project root to sync the dependencies:

pdm venv create
pdm install

Before running any code, be sure to activate the virtual environment (from the project root):

source .venv/bin/activate

Downloading Data

Some environments simulate dynamics from datasets. The download_data.sh file downloads these datasets. Make this script executable:

chmod +x download_data.sh

Then run the script to download the datasets:

./download_data.sh

This script will create a data directory in the project root with the requisite datasets.

Training an Agent

The easiest way to run training scripts is with our justfile, using the just command runner.

Risk Neutral Simulation

To train agents for risk-neutral option trading, execute

just writer=[aim | comet] agent=[dsup | qrdqn | dau] option_idx=<int> time_mul=<int> train_options

Here, option_idx specifies the commodity for the environment, and time_mul sets the decision frequency as a multiple of the base frequency: time_mul=1 gives the base frequency, and time_mul=n gives n times the base frequency.

To train the DAU+DSUP(1/2) variant, replace train_options with train_options_dsup_shifted in the command above.
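When sweeping over decision frequencies, it can help to generate the just invocations programmatically. The following is a minimal sketch, not part of this repository: the helper name just_command is hypothetical, option_idx=0 is only a placeholder value, and the script prints the commands rather than running them. The allowed writer and agent values are taken from the command template above.

```python
import shlex

# Choices copied from the command template in this README.
WRITERS = ("aim", "comet")
AGENTS = ("dsup", "qrdqn", "dau")


def just_command(recipe, writer, agent, option_idx, time_mul, **extra):
    """Build the argument list for one `just` invocation (nothing is executed).

    Hypothetical helper for illustration; it only formats key=value settings
    followed by the recipe name, mirroring the template in this README.
    """
    if writer not in WRITERS:
        raise ValueError(f"writer must be one of {WRITERS}, got {writer!r}")
    if agent not in AGENTS:
        raise ValueError(f"agent must be one of {AGENTS}, got {agent!r}")
    settings = {
        "writer": writer,
        "agent": agent,
        "option_idx": option_idx,
        "time_mul": time_mul,
        **extra,  # e.g. risk_param for the risk-sensitive recipe
    }
    return ["just", *(f"{key}={value}" for key, value in settings.items()), recipe]


# Sweep the decision frequency for one commodity (option_idx=0 is a placeholder).
commands = [
    just_command("train_options", "aim", "dsup", option_idx=0, time_mul=n)
    for n in (1, 2, 4)
]
for argv in commands:
    print(shlex.join(argv))
```

Each printed line can be pasted into a shell from the project root, or the argument lists can be passed to subprocess.run if the runs should be launched from Python.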

CVaR Simulation

To train agents for risk-sensitive option trading with CVaR, execute

just writer=[aim | comet] agent=[dsup | qrdqn | dau] option_idx=<int> time_mul=<int> risk_param=<float> train_options_risky

Here, risk_param is the CVaR level used for the experiment.
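For readers unfamiliar with CVaR, the sketch below estimates it from a sample of returns. It assumes the common lower-tail convention, in which CVaR at level alpha is the mean of the worst alpha-fraction of returns and higher returns are better; the convention used for risk_param in this codebase may differ, so check the paper before relying on it. The function empirical_cvar is hypothetical and is not part of the dsup package.

```python
import math


def empirical_cvar(returns, level):
    """Mean of the worst ceil(level * n) returns (lower-tail CVaR estimate).

    Assumes higher returns are better and 0 < level <= 1.
    """
    if not 0.0 < level <= 1.0:
        raise ValueError("level must lie in (0, 1]")
    if not returns:
        raise ValueError("returns must be non-empty")
    ordered = sorted(returns)  # worst outcomes first
    tail_size = max(1, math.ceil(level * len(ordered)))
    tail = ordered[:tail_size]
    return sum(tail) / len(tail)


returns = [-4.0, -1.0, 0.0, 2.0, 3.0, 5.0, 6.0, 9.0]
print(empirical_cvar(returns, 0.25))  # mean of the two worst returns
print(empirical_cvar(returns, 1.0))   # level 1 recovers the plain mean
```

Under this convention, smaller levels focus on worse outcomes, and a level of 1 reduces to the risk-neutral expected return.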

Citation

If you build on our work or find it useful, please cite it using the following BibTeX entry:

@inproceedings{wiltzer2024action,
  title={Action Gaps and Advantages in Continuous-Time Distributional Reinforcement Learning},
  author={Harley Wiltzer and Marc G. Bellemare and David Meger and Patrick Shafto and Yash Jhaveri},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024},
  url={https://openreview.net/forum?id=BRW0MKJ7Rr}
}
