Stable-Baselines implementation of the MixReg regularization technique for PPO2. As specified in the paper, it uses the IMPALA CNN as its feature extractor.
Note: Depends on the stable-baselines Python library (version 2.10.1 at the time of writing).
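MixReg regularizes training by mixing samples drawn from different training levels: each training observation is replaced by a convex combination of two observations, and the associated supervision signal is interpolated with the same coefficient, which is drawn from a Beta(alpha, alpha) distribution. The NumPy sketch below illustrates that idea under stated assumptions. `mixreg_batch` is a hypothetical helper, not the code in `mixreg.py`. The choice of returns as the mixed target and the `alpha=0.2` default are illustrative assumptions.

```python
import numpy as np


def mixreg_batch(obs, returns, alpha=0.2, rng=None):
    """Hypothetical helper: mixup-style mixing of one minibatch.

    obs:     float array of shape (batch, H, W, C)
    returns: float array of shape (batch,), used here as the supervision signal
    alpha:   Beta distribution parameter (illustrative value, an assumption)
    """
    rng = np.random.default_rng() if rng is None else rng
    n = obs.shape[0]
    # One mixing coefficient per sample: lam ~ Beta(alpha, alpha)
    lam = rng.beta(alpha, alpha, size=n).astype(obs.dtype)
    # Random partner for each sample (ideally collected on a different level)
    perm = rng.permutation(n)
    # Broadcast lam over the image dimensions
    lam_obs = lam.reshape((n,) + (1,) * (obs.ndim - 1))
    mixed_obs = lam_obs * obs + (1.0 - lam_obs) * obs[perm]
    # Interpolate the targets with the same coefficients
    mixed_returns = lam * returns + (1.0 - lam) * returns[perm]
    return mixed_obs, mixed_returns, lam, perm


rng = np.random.default_rng(0)
obs = rng.random((4, 64, 64, 3), dtype=np.float32)
returns = rng.random(4)
mixed_obs, mixed_returns, lam, perm = mixreg_batch(obs, returns, rng=rng)
print(mixed_obs.shape)
```

Using a separate coefficient per sample, rather than one per minibatch, follows the usual mixup convention; a real training loop would feed `mixed_obs` and the mixed targets into the PPO2 losses.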
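```python
import gym  # assumption: FruitBot is provided by the procgen package
from MIXREG_ImpalaCnn import ImpalaCnn
from mixreg import MIXREG

env = gym.make("procgen:procgen-fruitbot-v0")  # example environment (assumption)
# MIXREG is used exactly like PPO2
model = MIXREG(ImpalaCnn, env, verbose=0, n_steps=2048, nminibatches=8)
```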
Performance of the IMPALA CNN compared with the Nature CNN as the feature extractor of the base PPO2 model in the FruitBot environment:
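One structural difference between the two feature extractors is how much spatial detail survives to the flattened feature vector. The pure-Python sketch below computes that size for a 64x64 Procgen observation, assuming the commonly used configurations: an IMPALA CNN with conv-sequence depths (16, 32, 32), where each sequence is a 3x3 "same" convolution, a 3x3 max-pool with stride 2 and "same" padding, and two size-preserving residual blocks; and the stable-baselines `nature_cnn` with three "valid" convolutions (32 filters 8x8 stride 4, 64 filters 4x4 stride 2, 64 filters 3x3 stride 1). The `ImpalaCnn` in `MIXREG_ImpalaCnn` may differ, and the helper names are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Output side length of a square conv or pooling layer."""
    if padding == "same":
        return -(-size // stride)  # ceil(size / stride)
    return (size - kernel) // stride + 1  # "valid" padding


def impala_flat_features(side=64, depths=(16, 32, 32)):
    # Per sequence: 3x3 conv stride 1 (same), then 3x3 max-pool stride 2 (same).
    # The two residual blocks that follow keep the spatial size unchanged.
    for _ in depths:
        side = conv_out(side, 3, 1, "same")
        side = conv_out(side, 3, 2, "same")
    return side * side * depths[-1]


def nature_flat_features(side=64):
    # (filters, kernel, stride) of the three valid-padded conv layers
    layers = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]
    for _, kernel, stride in layers:
        side = conv_out(side, kernel, stride, "valid")
    return side * side * layers[-1][0]


print(impala_flat_features())  # 2048 (8 x 8 x 32)
print(nature_flat_features())  # 1024 (4 x 4 x 64)
```

Under these assumptions the IMPALA CNN keeps an 8x8 spatial grid versus 4x4 for the Nature CNN, and both then feed a fully connected layer (commonly 256 units for IMPALA, 512 for `nature_cnn`).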
This implementation of MixReg outperforms base PPO2 in generalization when trained on a limited number of FruitBot levels: