# DPO

> **Note:** The implementation of DPO can be found in this codebase.

## Installation

## How to run

```shell
python3 on-policy-main/train_smac.py --map_name 2s3z --use_eval --penalty_method True --dtar_kl 0.02 --experiment_name dtar_0.02_V_penalty_2M --num_env_steps 2000000 --group_name dpo --seed 1 --multi_rollout True --n_rollout_threads 1
```
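To run all three reported scenarios across several seeds, the command above can be wrapped in a loop. The sketch below is a dry run (it only prints the commands; remove the `echo` to launch training), and the per-map `experiment_name` values are placeholders we chose for illustration; `dtar_kl` and other hyperparameters may need adjusting per map:

```shell
#!/bin/sh
# Dry-run sketch: print one training command per (map, seed) pair.
# Flags mirror the single-run example above; experiment_name is illustrative.
for map in 2s3z 8m 3s5z; do
  for seed in 1 2 3; do
    echo python3 on-policy-main/train_smac.py --map_name "$map" --use_eval \
      --penalty_method True --dtar_kl 0.02 \
      --experiment_name "dtar_0.02_${map}_seed${seed}" --num_env_steps 2000000 \
      --group_name dpo --seed "$seed" --multi_rollout True --n_rollout_threads 1
  done
done
```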

## Results

Here, we provide results on three SMAC scenarios (2s3z, 8m, and 3s5z) using the default hyperparameters.

## Citation

If you use this code, please cite our paper:

Kefan Su and Zongqing Lu. A Fully Decentralized Surrogate for Multi-Agent Policy Optimization. TMLR, 2024.

```bibtex
@article{DPO,
  title={A Fully Decentralized Surrogate for Multi-Agent Policy Optimization},
  author={Su, Kefan and Lu, Zongqing},
  journal={Transactions on Machine Learning Research},
  year={2024}
}
```