Implementation of Real-Time Markov Decision Process (RTMDP) and Real-Time Actor-Critic (RTAC) as presented in the following paper:
https://arxiv.org/abs/1911.04448
Additionally, a variation of the LunarLander environment with a variable physics step size is included for experiments.
You need to either create a new conda environment or update an existing one. After creating the environment, activate it:
conda env create -f environment.yml
conda env update -f environment.yml --prune
conda activate real-time-rl
Our experiments can be replicated by calling main from the command line. Alternatively, the interface of the ActorCritic agents can be used directly. For the possible command line arguments, take a look here.
First, create an environment to be solved.
import gym
env = gym.make('CartPole-v1')
env2 = CustomLunarLander(step_size=0.1)  # LunarLander variant with a variable physics step size
Optionally, one can use wrappers to modify the environment. The RTMDP wrapper converts a given environment into its real-time version. The PreviousActionWrapper adds the last action to the state space without introducing the action feed-through mechanism.
real_time_env = RTMDP(env)
extended_env = PreviousActionWrapper(env)
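To make the feed-through mechanism concrete, here is a minimal, simplified sketch of what a real-time wrapper does. It is not the repository's RTMDP implementation; it assumes the classic Gym API (reset returning only the observation, step returning a 4-tuple) and appends a discrete action as a scalar rather than one-hot encoding it:

import gym
import numpy as np

class NaiveRealTimeWrapper(gym.Wrapper):
    """Hypothetical illustration of the real-time MDP idea, not the repo's RTMDP."""

    def __init__(self, env, initial_action=0):
        super().__init__(env)
        self.initial_action = initial_action
        self.pending_action = initial_action

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        self.pending_action = self.initial_action
        # The agent observes the action that is currently being executed.
        return np.append(obs, self.pending_action)

    def step(self, action):
        # The action chosen one step earlier is applied now; the newly chosen
        # action only takes effect on the next step (one-step action delay).
        obs, reward, done, info = self.env.step(self.pending_action)
        self.pending_action = action
        return np.append(obs, self.pending_action), reward, done, info

The actual RTMDP wrapper also adjusts the observation and action spaces accordingly; the sketch omits this for brevity.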
The two agents provided are SAC and RTAC. Both share the same basic interface, although some optional arguments differ between them. To evaluate the agent's performance, a separate evaluation environment has to be provided: evaluation happens at fixed time steps during training, at which point the training environment might be in the middle of an episode.
eval_env = gym.make('CartPole-v1')  # separate instance used only for evaluation
sac = SAC(env, eval_env=eval_env, seed=0)
sac.train()
avg_reward = sac.evaluate()
rtac = RTAC(env, eval_env=eval_env, seed=0)
rtac.train()
avg_reward = rtac.evaluate()
It is also possible to track data during the training process:
rtac = RTAC(env, eval_env=eval_env)
performance_list = rtac.train(track_stats=True)
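The tracked statistics can then be post-processed as desired. As a hedged example, assuming performance_list is a sequence of evaluation returns collected at regular intervals (the exact format is defined by the implementation in the source code), they could be plotted with matplotlib:

import matplotlib.pyplot as plt

# Hypothetical post-processing: plot the tracked evaluation performance over training.
plt.plot(performance_list)
plt.xlabel('evaluation index')
plt.ylabel('average return')
plt.title('RTAC training performance')
plt.show()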
The neural network used by the agents can be specified using a keyword argument dictionary:
network_kwargs = {'num_layers': 3, 'hidden_size': 128, 'normalized': True}
rtac = RTAC(env, network_kwargs=network_kwargs)
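For intuition only, the keyword arguments above could correspond to a small multi-layer perceptron along the following lines. This is a hypothetical sketch (in particular, interpreting 'normalized' as layer normalization and using PyTorch are assumptions), not the network code used by the agents:

import torch.nn as nn

def build_mlp(input_dim, output_dim, num_layers=3, hidden_size=128, normalized=True):
    """Hypothetical builder mirroring the network_kwargs shown above."""
    layers = []
    dim = input_dim
    for _ in range(num_layers):
        layers.append(nn.Linear(dim, hidden_size))
        if normalized:
            layers.append(nn.LayerNorm(hidden_size))  # assumed meaning of 'normalized'
        layers.append(nn.ReLU())
        dim = hidden_size
    layers.append(nn.Linear(dim, output_dim))
    return nn.Sequential(*layers)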
For our experiments, we used the default settings of the SAC and RTAC agents as given in the original paper. For the exact values, see the implementations in the source code.
We ran the experiments on consumer desktop computers. The exact specifications of our three machines are listed below:
- Intel(R) Core(TM) i5-8400 CPU @ 2.80GHz, 32GB RAM
- Intel(R) Core(TM) i5-9600K CPU @ 3.70GHz, 32GB RAM
- Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz, 16GB RAM
We did not record exact runtimes of our experiments, since RTAC and SAC perform essentially the same computation and therefore take roughly the same time. On average, an experiment on CartPole took 30-40 minutes for all five seeds. The experiments on LunarLander took much longer, about 3-4 hours for all five seeds. To speed up the computation we ran experiments on multiple CPU cores simultaneously; this is already reflected in the runtimes above, which makes it difficult to determine the runtime of a single experiment. In total, we ran 11 experiments on CartPole and 14 experiments on LunarLander, resulting in a total runtime of multiple days.
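As an illustration of how such runs can be parallelized across CPU cores, here is a minimal sketch using Python's multiprocessing. The training call mirrors the interface shown above; the environment name and number of seeds are chosen purely for the example, and the import of RTAC is omitted because it depends on the repository layout:

import multiprocessing as mp
import gym
# from ... import RTAC  # import path depends on the repository layout

def run_single_seed(seed):
    # One independent training run; mirrors the RTAC usage shown above.
    env = gym.make('CartPole-v1')
    eval_env = gym.make('CartPole-v1')
    agent = RTAC(env, eval_env=eval_env, seed=seed)
    agent.train()
    return agent.evaluate()

if __name__ == '__main__':
    # Run all five seeds on separate CPU cores.
    with mp.Pool(processes=5) as pool:
        results = pool.map(run_single_seed, range(5))
    print(results)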