ExORL: Exploratory Data for Offline Reinforcement Learning

This is an original PyTorch implementation of the ExORL framework from

Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning by

Denis Yarats*, David Brandfonbrener*, Hao Liu, Misha Laskin, Pieter Abbeel, Alessandro Lazaric, and Lerrel Pinto.

*Equal contribution.

Prerequisites

Install MuJoCo if it is not already installed:

Download MuJoCo binaries here.
Unzip the downloaded archive into ~/.mujoco/.
Append the path of the MuJoCo bin subdirectory to the LD_LIBRARY_PATH environment variable.
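As a quick sanity check on the last step, the following Python sketch reports which LD_LIBRARY_PATH entries look like a MuJoCo bin directory. It assumes the archive was unzipped somewhere under ~/.mujoco/ as described above. The helper name mujoco_bin_dirs_on_path is hypothetical and not part of this repo.

```python
import os
from pathlib import Path


def mujoco_bin_dirs_on_path(ld_library_path, mujoco_root):
    """Return LD_LIBRARY_PATH entries that are `bin` directories below mujoco_root.

    Hypothetical helper for illustration; it only inspects the strings and
    does not check that the directories exist on disk.
    """
    root = Path(mujoco_root).expanduser()
    matches = []
    for entry in ld_library_path.split(os.pathsep):
        if not entry:
            continue  # skip empty segments such as a trailing separator
        path = Path(entry).expanduser()
        # Keep entries named `bin` that live somewhere below the MuJoCo root.
        if path.name == "bin" and root in path.parents:
            matches.append(str(path))
    return matches


if __name__ == "__main__":
    found = mujoco_bin_dirs_on_path(
        os.environ.get("LD_LIBRARY_PATH", ""), "~/.mujoco"
    )
    print(found or "No MuJoCo bin directory found in LD_LIBRARY_PATH")
```

An empty result suggests the variable was not exported in the current shell, or that the archive was unpacked somewhere other than ~/.mujoco/.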

Install the following libraries:

sudo apt update
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3 unzip

Install dependencies:

conda env create -f conda_env.yml
conda activate exorl

Datasets

We provide exploratory datasets for 6 DeepMind Control Suite domains:

Domain	Dataset name	Available task names
Cartpole	`cartpole`	`cartpole_balance`, `cartpole_balance_sparse`, `cartpole_swingup`, `cartpole_swingup_sparse`
Cheetah	`cheetah`	`cheetah_run`, `cheetah_run_backward`
Jaco Arm	`jaco`	`jaco_reach_top_left`, `jaco_reach_top_right`, `jaco_reach_bottom_left`, `jaco_reach_bottom_right`
Point Mass Maze	`point_mass_maze`	`point_mass_maze_reach_top_left`, `point_mass_maze_reach_top_right`, `point_mass_maze_reach_bottom_left`, `point_mass_maze_reach_bottom_right`
Quadruped	`quadruped`	`quadruped_walk`, `quadruped_run`
Walker	`walker`	`walker_stand`, `walker_walk`, `walker_run`

For each domain, we collected datasets by running 9 unsupervised RL algorithms from URLB for a total of 10M steps. The algorithms are listed below:

Unsupervised RL method	Name	Paper
APS	`aps`	paper
APT(ICM)	`icm_apt`	paper
DIAYN	`diayn`	paper
Disagreement	`disagreement`	paper
ICM	`icm`	paper
ProtoRL	`proto`	paper
Random	`random`	N/A
RND	`rnd`	paper
SMM	`smm`	paper

You can download a dataset by running ./download.sh <DOMAIN> <ALGO>. For example, to download the ProtoRL dataset for Walker, run:

./download.sh walker proto

The script will download the dataset from S3 and store it under datasets/walker/proto/, where you can find episodes (under buffer) and episode videos (under video).
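The naming scheme above can be captured in a few lines of Python. The sketch below takes the domain and algorithm names from the two tables and the datasets/<DOMAIN>/<ALGO>/ layout from the paragraph above. The function names are hypothetical, and the code makes no assumption about the on-disk format of individual episode files: it only counts them.

```python
from pathlib import Path

# Dataset names from the domain table.
DOMAINS = ("cartpole", "cheetah", "jaco", "point_mass_maze", "quadruped", "walker")
# Names from the unsupervised RL method table.
ALGOS = ("aps", "icm_apt", "diayn", "disagreement", "icm", "proto", "random", "rnd", "smm")


def dataset_dirs(domain, algo, root="datasets"):
    """Return the expected buffer and video directories for one dataset."""
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain: {domain!r}")
    if algo not in ALGOS:
        raise ValueError(f"unknown algorithm: {algo!r}")
    base = Path(root) / domain / algo
    return {"buffer": base / "buffer", "video": base / "video"}


def download_command(domain, algo):
    """Build the argument list for ./download.sh after validating the names."""
    dataset_dirs(domain, algo)  # raises ValueError on unknown names
    return ["./download.sh", domain, algo]


def count_episodes(domain, algo, root="datasets"):
    """Count files in the buffer directory; returns 0 if it does not exist yet."""
    buffer_dir = dataset_dirs(domain, algo, root)["buffer"]
    if not buffer_dir.is_dir():
        return 0
    return sum(1 for p in buffer_dir.iterdir() if p.is_file())


if __name__ == "__main__":
    print(" ".join(download_command("walker", "proto")))
    print(count_episodes("walker", "proto"), "episode files found")
```

Validating the names before invoking the script catches typos such as `protorl` before any network transfer starts.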

Offline RL training

We also provide implementations of 5 offline RL algorithms for evaluating the datasets:

Offline RL method	Name	Paper
Behavior Cloning	`bc`	paper
CQL	`cql`	paper
CRR	`crr`	paper
TD3+BC	`td3_bc`	paper
TD3	`td3`	paper

After downloading the required datasets, you can evaluate them with an offline RL method on a specific task. For example, to evaluate the dataset collected by ProtoRL on Walker for the walking task using TD3+BC, run:

python train_offline.py agent=td3_bc expl_agent=proto task=walker_walk
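To compare several offline RL methods across several datasets, the same command can be generated in a loop. The Python sketch below only builds the command lines using the agent=, expl_agent=, and task= overrides shown above. The helper names train_command and sweep are hypothetical, and the sketch assumes the matching datasets have already been downloaded.

```python
import itertools

# Names from the offline RL method table.
OFFLINE_AGENTS = ("bc", "cql", "crr", "td3_bc", "td3")


def train_command(agent, expl_agent, task):
    """Build the train_offline.py invocation for one (agent, dataset, task) triple."""
    return [
        "python",
        "train_offline.py",
        f"agent={agent}",
        f"expl_agent={expl_agent}",
        f"task={task}",
    ]


def sweep(agents, expl_agents, task):
    """Return one command per (offline agent, exploration dataset) pair."""
    return [
        train_command(agent, expl_agent, task)
        for agent, expl_agent in itertools.product(agents, expl_agents)
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in sweep(OFFLINE_AGENTS, ("proto", "rnd"), "walker_walk"):
        print(" ".join(cmd))
```

Each printed line can be pasted into a shell or passed to a job scheduler. Running the commands directly from Python (for example with subprocess.run) is left out so that the sketch has no side effects.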

Logs are stored in the output folder. To launch TensorBoard, run:

tensorboard --logdir output

Citation

If you use this repo in your research, please consider citing the paper as follows:

@article{yarats2022exorl,
  title={Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning},
  author={Denis Yarats and David Brandfonbrener and Hao Liu and Michael Laskin and Pieter Abbeel and Alessandro Lazaric and Lerrel Pinto},
  journal={arXiv preprint arXiv:2201.13425},
  year={2022}
}

License

The majority of ExORL is licensed under the MIT license; however, portions of the project are available under separate license terms: DeepMind is licensed under the Apache 2.0 license.
