A project experimenting with Extracting Moore Machines from Recurrent Sequence Models, based heavily on reproducing and extending the experiments in the paper "Learning Finite State Representations of Recurrent Policy Networks" (repo).
Currently, the container-based environment has been tested on both Ubuntu (GPU / CPU) and macOS (CPU-only) hosts. For GPU support, you will need a compliant NVIDIA GPU. See the Installation Section for more details.
This repo contains the docker container and Python code to fully experiment with mnn. The whole experiment is contained in `MNN_testing.ipynb`.
This project is based on stable-baselines, OpenAI Gym, MiniGym, TensorFlow, and wombats.
See my paper or the notebook for more of the training results and hyperparameter choices.
This agent observes pixels from the game and outputs an action at each timestep. The agent transforms pixels into the LSTM hidden state using the original Atari CNN architecture, and the LSTM then passes its updated hidden state to the actor-critic architecture, where the action distribution and the value function are both estimated by the final network layers. These networks are trained using the PPO2 actor-critic RL algorithm, chosen here for its performance, ease of hyperparameter tuning, and easy parallelizability. Below is a gif of the trained CNN-LSTM agent in a single Pong environment:
This agent again observes pixels from the game and outputs an action at each timestep. However, there are now two quantized bottleneck networks (QBNs), placed after the CNN feature extractor and after the LSTM state output. These QBNs are quantized autoencoders whose latent neurons are quantized to activation values of -1, 0, or 1. This means that the entire policy network, called a Moore machine network (MMN), is now technically a finite state machine, specifically a Moore machine, that uses the CNN (with its QBN) as its discrete observation function and the LSTM (with its QBN) as the resulting state transition function. The two QBNs are each trained separately until their reconstruction loss is quite low, and then they are inserted into the original CNN-LSTM network as described above to form the final MMN. Below is a gif of the trained MMN agent in a single Pong environment:
Below is a table showing the mean undiscounted reward for each agent over 10 Monte Carlo rollouts:
| Original CNN-LSTM Agent | MMN Agent |
|---|---|
| 20.3 ± 0.2 | 18.90 ± 1.14 |
Thus, the MMN performs comparably to the original policy (18.90 vs. 20.3 mean reward), despite now being represented by a finite state machine. However, watching the agents, we can see that the MMN plays less "smoothly" overall, which we expect given the compressed, finite observation and state spaces of the MMN policy. No fine-tuning of the MMN policy network was done, so the MMN could likely be improved with some more training in the environment.
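The two agents described above differ only in where the QBNs sit. The following is a minimal NumPy sketch of one policy timestep with the QBNs optionally inserted, meant as an illustration under stated assumptions rather than the stable-baselines/TensorFlow code used in this repo: the layer sizes, the single-layer QBNs, the `tanh`-then-round quantizer, and all names (`QBN`, `policy_step`, `ox`, `bhx`, ...) are hypothetical, the weights are random rather than trained, and a random vector stands in for the Atari CNN output. The `bhx` QBN here quantizes the full LSTM state (both `h` and `c`), on the assumption that leaving the cell state continuous would not give a finite state space. The paper trains its quantizers with a straight-through gradient estimator; only the forward pass is shown.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT_DIM, HIDDEN_DIM, N_ACTIONS = 512, 256, 6  # hypothetical sizes
OX_LATENT, BHX_LATENT = 32, 16                 # hypothetical QBN code sizes

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ternary_quantize(x):
    """Snap each value to {-1, 0, 1}: tanh squashes to (-1, 1), rounding picks the nearest level."""
    return np.round(np.tanh(x)) + 0.0  # "+ 0.0" turns any -0.0 into 0.0

class QBN:
    """Hypothetical one-layer quantized bottleneck autoencoder (forward pass only)."""

    def __init__(self, dim, latent):
        self.W_enc = rng.normal(scale=0.1, size=(latent, dim))
        self.W_dec = rng.normal(scale=0.1, size=(dim, latent))

    def encode(self, x):
        return ternary_quantize(self.W_enc @ x)  # continuous vector -> discrete code

    def decode(self, code):
        return self.W_dec @ code                 # discrete code -> reconstruction

def lstm_step(x, h, c, W, b):
    """Standard LSTM cell; W maps [x, h] to the stacked gates (i, f, o, g)."""
    i, f, o, g = np.split(W @ np.concatenate([x, h]) + b, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    return sigmoid(o) * np.tanh(c), c

# Randomly initialized stand-ins for the trained weights.
W_lstm = rng.normal(scale=0.05, size=(4 * HIDDEN_DIM, FEAT_DIM + HIDDEN_DIM))
b_lstm = np.zeros(4 * HIDDEN_DIM)
W_pi = rng.normal(scale=0.05, size=(N_ACTIONS, HIDDEN_DIM))  # actor head
w_v = rng.normal(scale=0.05, size=HIDDEN_DIM)               # critic head
ox = QBN(FEAT_DIM, OX_LATENT)            # quantizes the CNN features
bhx = QBN(2 * HIDDEN_DIM, BHX_LATENT)    # quantizes the LSTM state [h, c]

def policy_step(features, h, c, use_qbns=True):
    """One timestep: CNN features -> (OX QBN) -> LSTM -> (BHX QBN) -> actor-critic heads."""
    if use_qbns:
        features = ox.decode(ox.encode(features))  # quantized observation
    h, c = lstm_step(features, h, c, W_lstm, b_lstm)
    if use_qbns:
        # Quantize the full recurrent state so only finitely many states can occur.
        h, c = np.split(bhx.decode(bhx.encode(np.concatenate([h, c]))), 2)
    return softmax(W_pi @ h), float(w_v @ h), h, c

h, c = np.zeros(HIDDEN_DIM), np.zeros(HIDDEN_DIM)
for _ in range(3):
    features = rng.normal(size=FEAT_DIM)  # stand-in for the CNN output on one frame
    pi, value, h, c = policy_step(features, h, c)
    action = int(rng.choice(N_ACTIONS, p=pi))
```

With `use_qbns=False` the same function is the plain CNN-LSTM actor-critic step; with `use_qbns=True`, a code of `k` ternary neurons can take at most `3**k` values, which is what makes the observation and state spaces finite.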
Here is a high-level overview of the steps taken to learn a Moore machine network (MMN) controller:

1. Learn a `feature_extractor-rnn_policy` for an RL environment using a standard RL algorithm capable of learning with a recurrent policy (e.g. ACKTR or PPO2). Here the feature extraction network is called `F_ExtractNet`, and the RNN policy that takes these features and produces the next action is called `RNN_Policy`. If your environment already has simple, discrete observations, you will not need `F_ExtractNet` and can feed the observation directly into `RNN_Policy`.
2. Generate "bottleneck data": simulate many trajectories in the RL environment with `RNN_Policy`, recording the feature-extractor outputs, the `RNN_Policy` hidden states, and the actions taken. This data is used next to train the quantized bottleneck neural networks (`QBNs`).
3. Learn `QBNs`, which are essentially autoencoders (AEs) with a quantized latent layer, to quantize (discretize):
   - the observations of the environmental feature extractor:
     - a CNN if the agent observes video of the environment.
     - an MLP if the agent gets non-image state observations.

     This is called `b_f` in the paper and `OX` in the mnn code.
   - the hidden state of `RNN_Policy`. This is called `b_h` in the paper and `BHX` in the mnn code.

   This is done by training each QBN separately on the bottleneck data until its reconstruction loss is quite low.
4. Insert the trained `OX` QBN after the feature extractor and the trained `BHX` QBN after the RNN unit in the `feature_extractor-rnn_policy` network to create what is now called the Moore machine network (`MMN`) policy.
5. Fine-tune the `MMN` policy by re-running the RL algorithm with the `MMN` policy as the starting point for RL interactions. Importantly, for training stability the `MMN` is fine-tuned to match the softmax action distribution of the original `RNN_Policy`, not its argmax action: optimize a categorical cross-entropy loss between the `RNN_Policy` and `MMN` output softmax layers.
6. Extract a classical Moore machine from the `MMN` policy:
   - Generate trajectories in the RL environment using rollout simulations of the `MMN` policy. For each rollout timestep, we extract a tuple `(h_{MMN, t-1}, f_{MMN, t}, h_{MMN, t}, a_{MMN, t})`:
     - `h_{MMN, t-1}`: the quantized hidden state of the RNN QBN at the previous timestep.
     - `f_{MMN, t}`: the quantized observation of the feature extractor QBN at the current timestep.
     - `h_{MMN, t}`: the quantized hidden state of the RNN QBN at the current timestep.
     - `a_{MMN, t}`: the action output by the `MMN` policy at the current timestep.
   - We now have most of the elements needed to form a Moore machine:
     - `h_{MMN, t-1}` -> the prior state of the Moore machine, `h_{MM, t-1}`.
     - `f_{MMN, t}` -> the input label `o_{MM, t}` of the transition from Moore machine state `h_{MM, t-1}` to Moore machine state `h_{MM, t}`.
     - `h_{MMN, t}` -> the current state of the Moore machine, `h_{MM, t}`.
     - `a_{MMN, t}` -> the output label of the current Moore machine state `h_{MM, t}`, `a_{MM, t}`.
   - What we are missing is a transition function `delta()` and an initial state of the Moore machine, `h_{MM, 0}`:
     - `delta()`: a Moore machine needs a transition function `delta(h_{MM, t-1}, o_{MM, t}) -> h_{MM, t}` that maps the current state and observed feature to the next state. The trajectories contain `p` distinct quantized states (`h_{MM}`) and `q` distinct quantized features (`o_{MM}`). These trajectories are converted into a transition table representing `delta`, which maps each observed state-observation pair `(h_{MM}, o_{MM})` to a new state `h_{MM}'`.
     - `h_{MM, 0}`: in practice, this is obtained by encoding the start state of `RNN_Policy` with `BHX`: `h_{MM, 0} = BHX(h_{MMN, 0})`.
7. Minimize the extracted Moore machine to get the smallest equivalent model. "In general, the number of states `p` will be larger than necessary in the sense that there is a much smaller, but equivalent, minimal machine." Thus, use age-old Moore machine minimization techniques to minimize the extracted machine. This problem is closely related to grammatical inference, so we can use my own wombats tool.
8. You're done. You now have a Moore machine that operates on the abstract, quantized data obtained from the `QBNs`. To use the Moore machine in an environment, start from its initial state `h_{MM, 0}` and:
   1. Use `OX` and the feature extractor to turn the first environmental observation `f_{env, 1}` into the Moore machine input `o_{MM, 1} = OX.encode(F_ExtractNet(f_{env, 1}))`.
   2. Use `delta` with `h_{MM, 0}` and `o_{MM, 1}` to move to the next state, `h_{MM, 1} = delta(h_{MM, 0}, o_{MM, 1})`. The action `a_{MM, 1}` is the output label of `h_{MM, 1}`.
   3. Take a step in the environment using `step(env, a_{MM, 1})` to produce a new observation `f_{env, 2}` and the environmental reward `r_1`.
   4. As in steps 1-3, repeat for `t = 2` onwards:
      - `o_{MM, t} = OX.encode(F_ExtractNet(f_{env, t}))`
      - `h_{MM, t} = delta(h_{MM, t-1}, o_{MM, t})`, and `a_{MM, t}` is the output label of `h_{MM, t}`
      - `f_{env, t+1}, r_t = step(env, a_{MM, t})`
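The fine-tuning step above trains the `MMN` to match the full softmax action distribution of the original `RNN_Policy` with a categorical cross-entropy loss. Here is a minimal NumPy sketch of that loss with soft targets and its gradient with respect to the MMN's logits; the function names and the batch/action sizes are hypothetical, and in this repo the loss would be expressed in TensorFlow rather than NumPy (in TensorFlow, `tf.nn.softmax_cross_entropy_with_logits` accepts such soft labels directly).

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last (action) axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distillation_loss(rnn_logits, mmn_logits):
    """Mean categorical cross-entropy H(p_RNN, p_MMN) over a batch of timesteps.

    The target is the RNN policy's full softmax distribution (soft labels),
    not a one-hot vector of its argmax action.
    """
    p_rnn = np.exp(log_softmax(rnn_logits))
    return float(-(p_rnn * log_softmax(mmn_logits)).sum(axis=-1).mean())

def distillation_grad(rnn_logits, mmn_logits):
    """Gradient of distillation_loss w.r.t. mmn_logits: (p_MMN - p_RNN) / batch_size."""
    p_rnn = np.exp(log_softmax(rnn_logits))
    p_mmn = np.exp(log_softmax(mmn_logits))
    return (p_mmn - p_rnn) / rnn_logits.shape[0]

rng = np.random.default_rng(0)
rnn_logits = rng.normal(size=(4, 6))  # 4 timesteps x 6 actions (hypothetical sizes)
mmn_logits = rng.normal(size=(4, 6))
loss = distillation_loss(rnn_logits, mmn_logits)
grad = distillation_grad(rnn_logits, mmn_logits)
```

When the two policies agree exactly, the gradient is zero and the loss reduces to the entropy of `p_RNN`, its smallest possible value for a fixed target.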
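The extraction step above turns rollout tuples `(h_{MMN, t-1}, f_{MMN, t}, h_{MMN, t}, a_{MMN, t})` into a transition table `delta` plus an action label for each state, and the machine is then run by moving with `delta` and emitting the new state's label. A minimal Python sketch, assuming quantized states and observations are stored as hashable tuples of -1/0/1 values; `extract_moore_machine`, `moore_step`, and the toy records are hypothetical. Conflicting records for the same key are resolved by majority vote, which is an assumption (a deterministic MMN should not produce conflicts).

```python
from collections import Counter, defaultdict

def extract_moore_machine(records):
    """Build Moore machine tables from (h_prev, o, h, a) rollout tuples.

    Returns delta[(h_prev, o)] -> h and output[h] -> a. If the same key maps
    to several values in the data, the most frequent one wins (assumption).
    """
    transition_votes = defaultdict(Counter)
    output_votes = defaultdict(Counter)
    for h_prev, o, h, a in records:
        transition_votes[(h_prev, o)][h] += 1
        output_votes[h][a] += 1
    delta = {key: votes.most_common(1)[0][0] for key, votes in transition_votes.items()}
    output = {h: votes.most_common(1)[0][0] for h, votes in output_votes.items()}
    return delta, output

def moore_step(delta, output, h_prev, o):
    """h_t = delta(h_{t-1}, o_t); the action a_t is the output label of h_t."""
    if (h_prev, o) not in delta:
        raise KeyError(f"transition from {h_prev} on input {o} was never observed")
    h = delta[(h_prev, o)]
    return h, output[h]

# Toy records with 1-neuron codes, purely for illustration (hypothetical data).
records = [
    ((0,), (1,), (1,), 2),
    ((1,), (1,), (1,), 2),
    ((1,), (-1,), (0,), 3),
    ((0,), (-1,), (0,), 3),
]
delta, output = extract_moore_machine(records)

h = (0,)  # h_{MM, 0}; in the real pipeline this would come from BHX
actions = []
for o in [(1,), (1,), (-1,)]:  # in practice: o = OX.encode(F_ExtractNet(f_env))
    h, a = moore_step(delta, output, h, o)
    actions.append(a)
print(actions)  # [2, 2, 3]
```

Raising on an unseen `(state, observation)` pair makes gaps in the extracted table explicit; a deployed controller would need some fallback policy for them.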
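For the minimization step, this repo uses wombats. As a stand-in (not the wombats API), here is a sketch of classical partition refinement (Moore's algorithm) over extracted tables: group states by action label, then repeatedly split any group whose members move to different groups on some observation, until nothing changes. Because the extracted `delta` only covers state-observation pairs seen in rollouts, this sketch treats a missing transition as its own "undefined" target, so states merge only if they agree on which transitions exist; that is a conservative assumption, and smarter handling of unseen transitions can yield smaller machines. `minimize_moore`, `quotient`, and the example machine are hypothetical.

```python
def minimize_moore(states, inputs, delta, output):
    """Partition refinement (Moore's algorithm) on a possibly partial Moore machine.

    Returns block[s], an integer id shared by all states found equivalent.
    """
    # Initial partition: states with the same action label share a block.
    label_ids = {}
    block = {s: label_ids.setdefault(output.get(s), len(label_ids)) for s in states}
    while True:
        # A state's signature: its current block plus the block reached on each
        # input (None where the transition was never observed).
        signature = {
            s: (block[s],) + tuple(
                block[delta[(s, o)]] if (s, o) in delta else None for o in inputs
            )
            for s in states
        }
        ids = {}
        new_block = {s: ids.setdefault(signature[s], len(ids)) for s in states}
        if len(ids) == len(set(block.values())):  # no block was split: stable
            return new_block
        block = new_block

def quotient(block, delta, output):
    """Collapse each block of equivalent states into a single state."""
    min_delta = {(block[s], o): block[t] for (s, o), t in delta.items()}
    min_output = {block[s]: a for s, a in output.items()}
    return min_delta, min_output

states = ["s0", "s1", "s2", "s3"]
inputs = ["x", "y"]
output = {"s0": 0, "s1": 0, "s2": 1, "s3": 1}
delta = {
    ("s0", "x"): "s2", ("s1", "x"): "s3", ("s2", "x"): "s0", ("s3", "x"): "s1",
    ("s0", "y"): "s0", ("s1", "y"): "s1", ("s2", "y"): "s2", ("s3", "y"): "s3",
}
block = minimize_moore(states, inputs, delta, output)
min_delta, min_output = quotient(block, delta, output)
print(len(set(block.values())))  # 2: {s0, s1} and {s2, s3} merge
```

Each round only splits blocks, so the loop stops after at most as many rounds as there are states.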
Common ways to launch the container with `./docker_scripts/run_docker.sh`:

- Run with a GPU-enabled image and start a Jupyter notebook server with default network settings: `./docker_scripts/run_docker.sh --device=gpu`
- Run with a CPU-only image and start a Jupyter notebook server with default network settings: `./docker_scripts/run_docker.sh --device=cpu`
- Run with a GPU-enabled image, serving the Jupyter notebook over a desired host port (port 8008 in this example) and TensorBoard on port 6996. You might do this if other services on your host machine are already running on `localhost:8888` and/or `localhost:6006`: `./docker_scripts/run_docker.sh --device=gpu --jupyterport=8008 --tensorboardport=6996`
- Run with a GPU-enabled image and drop into the terminal: `./docker_scripts/run_docker.sh --device=gpu bash`
- Run a bash command interactively in a CPU-only image: `./docker_scripts/run_docker.sh --device=cpu $OPTIONAL_BASH_COMMAND_FOR_INTERACTIVE_MODE`
- Run a bash command interactively in a GPU-enabled image: `./docker_scripts/run_docker.sh --device=gpu $OPTIONAL_BASH_COMMAND_FOR_INTERACTIVE_MODE`
To access the Jupyter notebook, make sure you can access port 8008 on the host machine, then change the port in the generated Jupyter URL `http://localhost:8888/?token=TOKEN_STRING` to the new, desired port number, `http://localhost:8008/?token=TOKEN_STRING`, and paste this URL into the host machine's browser.
To access TensorBoard, make sure you can access port 6996 on the host machine, then change the port in the generated TensorBoard URL (e.g. `http://0.0.0.0:6006/` for TensorBoard 1.15.0) to the new, desired port number, `http://localhost:6996`, and paste this URL into the host machine's browser.
This repo houses a docker container with `jupyter` and `tensorboard` services running. If you have an NVIDIA GPU, check here to see if your GPU supports CUDA. If so, you can use the GPU-only instructions below.
Follow steps one (and two, if you have a CUDA-enabled GPU) from this guide from TensorFlow to prepare your computer for the TensorFlow docker base container images. Don't actually install the TensorFlow container; that will happen automatically later.
Follow the *nix docker post-installation guide.
Now that you have docker configured, you need to clone this repo. Pick your favorite directory on your computer (mine is `$HOME/Downloads` ofc) and run:
git clone --recurse-submodules https://github.com/nicholasRenninger/NeuralMooreMachine_Experiments
cd NeuralMooreMachine_Experiments
The container builder uses `make`:
- If you have a CUDA-enabled GPU and thus you followed step 2 of the docker install section above, then run:
make docker-gpu
- If you don't have a CUDA-enabled GPU and thus you didn't follow step 2 of the docker install section above, then run:
make docker-cpu