Deep reinforcement learning (DRL) has achieved significant success in addressing various tasks; however, trained agents often struggle to generalize beyond the environments in which they were initially trained. This thesis investigates the generalization capabilities of DRL algorithms through a series of independent experiments on control tasks of varying complexity. These tasks range from simple scenarios like Mountain Car to more complex challenges such as Pick and Place, utilizing the Gymnasium and Gymnasium-Robotics suites with the MuJoCo physics engine. Each experiment involves training a model using Proximal Policy Optimization (PPO) within a specific environment and subsequently evaluating its performance in slightly modified environments. The impact of hyperparameters on generalization is also examined. The objective is to identify the strengths and limitations of DRL agents in adapting learned policies to similar but altered environments.
Note
The purpose of this README is to provide instructions and details necessary to replicate the experiments. For more comprehensive information, please consult the Documentation directory.
The Algorithms directory contains the implementation of Proximal Policy Optimization (PPO) applied to the different environments. Each environment has a dedicated directory with the same structure, holding the scripts and configuration files needed to set up, train, and test the models.
├── Algorithms/
│ ├── PPO_Acrobot/
│ │ ├── Experiments_Config/
│ │ │ ├── ... # Configurations for training experiments (.yaml)
│ │ ├── Results/
│ │ │ ├── ... # Results from training experiments (generated automatically)
│ │ ├── Generalization/
│ │ │ ├── ... # Generalization experiments and results
│ │ ├── dummy_example.py # Dummy script to test the environment setup
│ │ ├── Network_Acrobot.py # Defines Actor-Critic networks
│ │ ├── test_Acrobot.py # Script to test the trained models
│ │ ├── train_Acrobot.py # Configures experiments and calls the trainer
│ │ ├── trainer_Acrobot.py # Implements training of the model using PPO
│ ├── PPO_CartPole/ # Same structure as PPO_Acrobot
│ ├── PPO_ContinuousMountainCar/ # Same structure as PPO_Acrobot
│ ├── PPO_Pendulum/ # Same structure as PPO_Acrobot
│ ├── PPO_PickAndPlace/ # Same structure as PPO_Acrobot
The Cluster directory contains the experiment scripts (.sh) executed on the UPF Cluster for the Pick and Place scenario, organized into folders by hyperparameter. It also includes the SLURM output files generated by each experiment.
├── Cluster/
│ ├── BatchSize_exp/
│ │ ├── ... # Experiments related to the batch size
│ ├── LearningRate_exp/
│ │ ├── ... # Experiments related to the learning rate
│ ├── slurms-outputs/
│ │ ├── ... # SLURM output files from each experiment
│ ├── Timesteps_exp/
│ │ ├── ... # Experiments related to the number of timesteps
This directory contains all the custom environments created for the generalization experiments.
├── Custom_Env/
│ ├── custom_env.py # Script containing all custom environments
│ ├── dummy_test_custom_env.py # Dummy script for testing the custom environments
To run this project, you will need to install the required dependencies. It is recommended to use a virtual environment to manage these dependencies.
- Clone the repository:
    git clone https://github.com/ialexmp/DRL-Generalization.git
    cd DRL-Generalization
- Create and activate a virtual environment:
    python -m venv venv
    source venv/bin/activate # On Windows use `venv\Scripts\activate`
- Install the dependencies:
    pip install -r requirements.txt
cd DRL-Generalization
Before starting the training and generalization experiments, you can run a dummy experiment to check that everything is set up correctly.
python ./Algorithms/PPO_[env_name]/dummy_example.py dummy_exp.yaml
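If the dummy script fails, a quick import check can help separate installation problems from script problems. The following is a minimal sketch, not part of the repository: the helper `check_modules` is hypothetical, and the module names are only the assumed import names of the packages mentioned in this README (Gymnasium, Gymnasium-Robotics, MuJoCo). The authoritative dependency list remains requirements.txt.

```python
from importlib.util import find_spec

# Import names assumed for the packages mentioned in this README;
# requirements.txt remains the authoritative list.
ASSUMED_MODULES = ("gymnasium", "gymnasium_robotics", "mujoco")


def check_modules(names=ASSUMED_MODULES):
    """Return {module_name: bool}, True when the module can be located."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

Running this inside the activated virtual environment lists each module as `ok` or `MISSING` without importing it, so it stays fast even when MuJoCo is installed.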
To run training experiments with different configurations, use the experiment script:
python ./Algorithms/PPO_[env_name]/train.py train_config.yaml
You can adjust the training parameters by editing the configuration file located at Experiments_Config/train_config.yaml. A single .yaml file can define several experiments; the results of each one are saved automatically in a separate folder named after that experiment.
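The idea of one file driving several experiments, each with its own results folder, can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: the `experiments` dictionary stands in for the parsed .yaml file, the hyperparameter keys are hypothetical, and the timestamp format is a guess.

```python
from datetime import datetime
from pathlib import Path

# Stand-in for the parsed YAML file: one entry per experiment.
# The experiment names and hyperparameter keys are hypothetical.
experiments = {
    "exp1": {"learning_rate": 0.01, "total_timesteps": 1_000_000},
    "exp2": {"learning_rate": 0.001, "total_timesteps": 1_000_000},
}


def result_dir(root, exp_name, now):
    """Build a per-experiment results path: <root>/<name>_<timestamp>."""
    stamp = now.strftime("%Y-%m-%d_%H-%M-%S")  # assumed timestamp format
    return Path(root) / f"{exp_name}_{stamp}"


def plan_runs(experiments, root="Results", now=None):
    """Map every experiment name to the folder its results would go to."""
    now = now or datetime.now()
    return {name: result_dir(root, name, now) for name in experiments}


if __name__ == "__main__":
    for name, folder in plan_runs(experiments).items():
        print(name, "->", folder, experiments[name])
```

Keying the folder on the experiment name plus a timestamp keeps repeated runs of the same configuration from overwriting each other.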
Once the agent is trained, you can evaluate its performance using:
python ./Algorithms/PPO_[env_name]/test.py [path_from_Results_folder]
# Example :
# python ./Algorithms/PPO_Acrobot/test.py 1M_lr0_01\exp1_2024-06.07_20-04-46
After training, you can evaluate the generalization performance of a trained agent in a custom environment with slight modifications using the command:
python ./Algorithms/PPO_[env_name]/Generalization/[Experiment_folder]/ZSG_[Experiment_name].py [path_from_Results_folder]
# Example :
# python ./Algorithms/PPO_Acrobot/Generalization/Link_Mass/ZSG_linkMass.py 1M_lr0_01\exp1_2024-06.07_20-04-46
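Conceptually, such a generalization experiment freezes the trained policy and measures its average return while one environment parameter is varied. The sketch below illustrates that loop under loud assumptions: `ToyEnv`, its `drift` parameter, and the hand-written `policy` are hypothetical stand-ins (not the repository's environments or a PPO agent), and only the Gymnasium-style `reset`/`step` signatures are taken as given.

```python
import random
from statistics import mean


class ToyEnv:
    """Hypothetical stand-in environment with a Gymnasium-style reset/step API.

    The agent must keep a 1-D state near zero; `drift` plays the role of
    the modified physical parameter (for example, a link mass).
    """

    def __init__(self, drift=0.0, horizon=20):
        self.drift, self.horizon = drift, horizon

    def reset(self, seed=None):
        self.rng = random.Random(seed)
        self.state = self.rng.uniform(-1.0, 1.0)
        self.t = 0
        return self.state, {}

    def step(self, action):
        self.state += action + self.drift
        self.t += 1
        reward = -abs(self.state)
        terminated = False
        truncated = self.t >= self.horizon
        return self.state, reward, terminated, truncated, {}


def evaluate(env, policy, episodes=5, seed=0):
    """Average undiscounted return of a fixed policy over several episodes."""
    returns = []
    for ep in range(episodes):
        obs, _ = env.reset(seed=seed + ep)
        done, total = False, 0.0
        while not done:
            obs, reward, terminated, truncated, _ = env.step(policy(obs))
            total += reward
            done = terminated or truncated
        returns.append(total)
    return mean(returns)


def policy(obs):
    """Frozen stand-in for a trained policy: cancel the observed state."""
    return -obs


if __name__ == "__main__":
    # drift=0.0 is the training setting; the other values are unseen variations.
    for drift in (0.0, 0.1, 0.3):
        print(f"drift={drift}: mean return {evaluate(ToyEnv(drift), policy):.2f}")
```

Because the policy is never updated during evaluation, any drop in mean return as `drift` moves away from the training value reflects a generalization gap rather than a training effect.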
Contributions are welcome! Please fork the repository and create a pull request with your changes. Ensure that your code follows the project's style guidelines and includes relevant tests.
This project is licensed under the MIT License. See the LICENSE file for details.