Homogenise and debug RL solvers and implement action masking in RLlib #308

fteicht · 2024-01-19T10:31:04Z

This PR both improves the RL experience in scikit-decide and corrects some bugs in the RL solvers:

- Implement action masking in the RLlib solver for domains that implement _get_applicable_actions_from()
- Implement bi-directional conversion between numpy arrays and user-defined classes in the scikit-decide spaces deriving from gym.Space
- Allow DictSpace and TupleSpace to use subspaces that optionally derive from GymSpace, and implement proper recursive wrapping and unwrapping of the elements belonging to those subspaces
- Implement a SetSpace space that derives from GymSpace and provides efficient search in its contains method
- Make the hub domains Maze, SimpleGridWorld and Mastermind solvable by RLlib in addition to StableBaselines
- Correct a bug in scikit-decide's gym domain where the reset method did not unwrap the initial observation
- Correct a bug in scikit-decide's StableBaselines solver where the _sample_action method did not unwrap the current observation
- Improve coverage of RL solvers in the unit tests
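For readers unfamiliar with action masking, the general idea can be sketched in plain NumPy. This is only an illustration of the common technique (forcing the probability of inapplicable actions to zero before sampling), not the code added by this PR; the function name `masked_action_probabilities` and its arguments are hypothetical.

```python
import numpy as np


def masked_action_probabilities(logits, action_mask):
    """Turn raw policy logits into probabilities over applicable actions only.

    Hypothetical helper for illustration; not part of scikit-decide or RLlib.
    """
    logits = np.asarray(logits, dtype=np.float64)
    mask = np.asarray(action_mask, dtype=bool)
    if not mask.any():
        raise ValueError("at least one action must be applicable")
    # Inapplicable actions get -inf so that exp() maps them to exactly 0.
    masked = np.where(mask, logits, -np.inf)
    # Subtract the largest applicable logit for numerical stability.
    weights = np.exp(masked - masked.max())
    return weights / weights.sum()


# Four actions, of which only actions 0 and 2 are applicable in this state.
probs = masked_action_probabilities([2.0, 1.0, 0.5, 3.0], [1, 0, 1, 0])
```

Here the masked actions 1 and 3 end up with probability zero, so a policy sampling from `probs` can only pick an applicable action.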

neo-alex

Great job, thank you!

This commit improves support for unified planning in scikit-decide, notably making it possible to solve UP domains with the RLlib solver. It relies on:
- the recent handling of state-based applicable actions, which are produced by unified planning domains, in the RLlib solver (PR #308);
- the automatic translation between unified planning states and actions and the numpy arrays that RLlib can handle.
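As a rough illustration of what a translation between domain objects and numpy arrays involves, the sketch below round-trips a small user-defined state through an array. It assumes a hypothetical `GridState` class and hypothetical `encode`/`decode` helpers; it does not reproduce the unified planning types or the scikit-decide space classes.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class GridState:
    """Hypothetical user-defined state, used only for this example."""

    x: int
    y: int


def encode(state: GridState) -> np.ndarray:
    """Convert a state into the flat integer array an RL library can consume."""
    return np.array([state.x, state.y], dtype=np.int32)


def decode(array: np.ndarray) -> GridState:
    """Rebuild the user-defined state from its array form."""
    x, y = (int(v) for v in array)
    return GridState(x, y)


original = GridState(x=2, y=5)
restored = decode(encode(original))
```

The important property is that `decode(encode(s))` gives back an object equal to `s`, so solver outputs expressed as arrays can be mapped back to the classes the domain expects.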

fteicht added 8 commits January 5, 2024 18:50

Progress implementing action masking for the rllib wrapper

b0b0cc2

WiP rllib filtered actions

4882247

WiP rllib allowed action handling

2864635

WiP RL homogeneization

8d82e4f

WiP RL homogeneization

afd15fa

More generic action masking model

1bae6ac

Merge branch 'airbus:master' into rayrllib-restricted-actions

d356fb8

pre-commit cleanups

ce5239a

fteicht added the bug, enhancement and python labels Jan 19, 2024

fteicht requested a review from neo-alex January 19, 2024 10:31

fteicht self-assigned this Jan 19, 2024

fteicht added 4 commits January 19, 2024 14:10

Limit number of rollouts in rllib unit tests

f1ffe87

Split RLlib examples

130e3f0

Correct wrong assert syntax in gym.py

a3a33cc

Correct element class test in gym spaces

5dec377

neo-alex approved these changes Jan 23, 2024

fteicht merged commit 7b40743 into airbus:master Jan 23, 2024
42 checks passed

fteicht deleted the rayrllib-restricted-actions branch January 23, 2024 16:17
