Maxmax_Q_learning

Dependencies:
- Python==3.6.2
- numpy==1.19.5
- gym==0.10.5
- cython==0.29.14
- torch==1.3.1
- wandb

To train, run train_env.sh in DG_MPE.

Environment:
- multiagent is for the MPE environment
- world_ns.py is for the Differential Game environment
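
Because the dependency list above pins exact versions, it can help to confirm the active environment matches before running train_env.sh. The following is a minimal sketch of such a check, not part of this repository: the names `PINNED`, `installed_version`, and `check_pins` are hypothetical, and the package names are assumed to be the ones pip uses for the libraries listed above.

```python
import sys

# Pins copied from the dependency list above; wandb has no pinned version.
# The dictionary name and layout are illustrative assumptions.
PINNED = {
    "numpy": "1.19.5",
    "gym": "0.10.5",
    "cython": "0.29.14",
    "torch": "1.3.1",
    "wandb": None,
}
PYTHON_PIN = (3, 6, 2)


def installed_version(name):
    """Return the installed version string of a package, or None if missing."""
    try:
        # Available in the standard library from Python 3.8 onward.
        from importlib.metadata import version, PackageNotFoundError
    except ImportError:
        # Fallback for Python 3.6, where importlib.metadata does not exist.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Return a list of mismatch messages; the list is empty if all pins match."""
    problems = []
    for name, wanted in pins.items():
        found = lookup(name)
        if found is None:
            problems.append(f"{name}: not installed")
        elif wanted is not None and found != wanted:
            problems.append(f"{name}: found {found}, expected {wanted}")
    return problems


if __name__ == "__main__":
    if sys.version_info[:3] != PYTHON_PIN:
        print("Python: found %d.%d.%d, expected 3.6.2" % sys.version_info[:3])
    for line in check_pins(PINNED):
        print(line)
```

Running the script prints one line per missing or mismatched package and nothing when the environment matches. The `lookup` parameter exists only so the comparison logic can be exercised without the packages installed.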