Feature/buffer np (#169)
* Add first PPO numpy buffer implementation

* Add distribution cfg to agent

* No need for tensordict

* Add SAC numpy

* Improve sample_next_obs

* Add DV1 with numpy buffer

* Too many reshapes

* Add Sequential and EnvIndependent np buffers

* Fewer reshapes

* Faster indexing + from_numpy parameter

* Dreamer-V2 numpy

* Fix buffer add

* Better indexing

* Fix indexes to sample

* Fix metrics when they are nan

* Fix reshape when bootstrapping + fix normalization

* Guard timer metrics

* np.intp for indexing

* Change dtype after creating the tensor

* Fix buf[key] after __getstate__ is called upon checkpoint

* Securely close fd on __getstate__()

* Add MemmapArray

* Add __len__ function

* Fix len

* Better array setter and __del__ now controls ownership

* Do not transfer ownership upon array setter

* Add properties

* Feature/episode buffer np (#121)

* feat: added episode buffer numpy

* fix: memmap episode buffer numpy

* fix: checkpoint when memmap=True EpisodeBufferNumpy

* fix: memmap episode buffer np

* tests: added tests for episode buffer np

* feat: update episode buffer, added MemmapArray

* Fix not use self._obs_keys

* Sample only if n > 0

* Fix shapes

* feat: added possibility to specify sequence length in sample() + added possibility to add data only to some env

* tests: update episode buffer numpy tests

* tests: added replay buffer np tests

* tests: added sequential replay buffer np tests

* fix: env independent replay buffer name

* fix: replay buffer + add tests

* Safely release buffer on Windows

* Safely deletes memmaps

* Del buffer

* Safer array setter

* Add Memmap.from_array

* Fix ReplayBuffer __setitem__

* fix: sac_np sample

* tests: update tests

* tests: update

* fix: sequential replay buffer sample clone

* Add tests + Fix MemmapArray on Windows

* Add tests to run only on Linux

* Fix tests

* Fix skip test on Windows

* Dreamer-V2 with EpisodeBuffer np

* Add user warning if file exists when creating a new MemmapArray

* feat: added dreamer v3 np

* Add docstrings + Fix array setter if shapes differ

* Fix tests

* Add docstring

* Docstrings

* fix: sample of env independent buffer

* Fix locked tensordict

* Add configs

* feat: update np algorithms with new specifications

* fix: mypy

* PokemonRed env from https://github.com/PWhiddy/PokemonRedExperiments/blob/master/baselines/red_gym_env.py

* Update dreamer_v3 with main

* Update dreamer_v2 with main

* Update dreamer_v1 with main

* Update ppo with main

* Update sac with main

* Amend numpy to torch dtype and back dicts

* feat: added np callback

* fix: np callback

* feat: add support functions in np checkpoint callback

* feat: added droq np

* feat: added ppo recurrent np

* feat: added sac-ae np

* Update dreamer algos with main

* feat: added p2e dv1 np

* feat: added p2e dv2 np

* feat: add p2e dv3 np

* feat: added ppo decoupled np

* feat: add sac decoupled

* np.tanh instead of torch.tanh

* feat: from tensordict to buffers np

* from td to np

* exclude mlflow from tests

* No more tensordict

* Updated howto

* Fix tests

* .cpu().numpy() just one time

* Removed old cfgs

* Convert all when hydra instantiating

* convert all on instantiate

* [skip-ci] Removed pokemon files

* fix: git merge related errors

* Fix get absolute path

* Amend dreamer-v3 pokemon config

---------

Co-authored-by: michele-milesi <74559684+michele-milesi@users.noreply.github.com>
Co-authored-by: Michele Milesi <michele.milesi@studio.unibo.it>
3 people authored Dec 19, 2023
1 parent bde9365 commit 6e5b31d
Showing 42 changed files with 3,372 additions and 1,828 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -170,4 +170,5 @@ pytest_*
.pypirc
mlruns
mlartifacts
- examples/models
+ examples/models
+ session_*
8 changes: 4 additions & 4 deletions README.md
@@ -358,15 +358,15 @@ For each algorithm, losses are kept in a separate module, so that their implemen

## :card_index_dividers: Buffer

- For the buffer implementation, we choose to use a wrapper around a [TensorDict](https://pytorch.org/rl/tensordict/reference/generated/tensordict.TensorDict.html).
+ For the buffer implementation, we choose to use a wrapper around a dictionary of Numpy arrays.

- TensorDict comes in handy since we can easily add custom fields to the buffer as if we are working with dictionaries, but we can also easily perform operations on them as if we are working with tensors.
+ To enable a simple way to work with numpy memory-mapped arrays, we implemented the `sheeprl.utils.memmap.MemmapArray`, a container that handles the memory-mapped arrays.
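The following is a minimal sketch of what such a memory-mapped container might look like. It is hypothetical and only illustrates the general idea (a NumPy `np.memmap` wrapped with dict-style access); the class name `SimpleMemmapArray` is invented here and is not the actual `sheeprl.utils.memmap.MemmapArray` implementation.

```python
import os
import tempfile

import numpy as np


class SimpleMemmapArray:
    """Minimal sketch of a memory-mapped array container (hypothetical,
    not the actual sheeprl.utils.memmap.MemmapArray implementation)."""

    def __init__(self, shape, dtype=np.float32, filename=None):
        if filename is None:
            # Create a backing file; close our fd since np.memmap
            # reopens the file itself.
            fd, filename = tempfile.mkstemp(suffix=".memmap")
            os.close(fd)
        self._filename = filename
        # mode="w+" creates (or overwrites) the on-disk backing file.
        self._array = np.memmap(filename, dtype=dtype, mode="w+", shape=shape)

    @property
    def array(self) -> np.memmap:
        return self._array

    def __getitem__(self, idx):
        return self._array[idx]

    def __setitem__(self, idx, value):
        self._array[idx] = value

    def __len__(self) -> int:
        return self._array.shape[0]
```

Writes through `__setitem__` land in the on-disk file, so the same data can be shared across processes or survive a checkpoint without holding everything in RAM.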

- This flexibility makes it very simple to implement, with the classes `ReplayBuffer`, `SequentialReplayBuffer`, `EpisodeBuffer`, and `AsyncReplayBuffer`, all the buffers needed for on-policy and off-policy algorithms.
+ This flexibility makes it very simple to implement, with the classes `ReplayBuffer`, `SequentialReplayBuffer`, `EpisodeBuffer`, and `EnvIndependentReplayBuffer`, all the buffers needed for on-policy and off-policy algorithms.

### :mag: Technical details

- The tensor's shape in the TensorDict is `(T, B, *)`, where `T` is the number of timesteps, `B` is the number of parallel environments, and `*` is the shape of the data.
+ The shape of the Numpy arrays in the dictionary is `(T, B, *)`, where `T` is the number of timesteps, `B` is the number of parallel environments, and `*` is the shape of the data.
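A small illustration of that `(T, B, *)` layout (the field names and sizes below are made up for the example, not taken from the repository):

```python
import numpy as np

# Hypothetical buffer contents: 16 timesteps, 4 parallel environments,
# observations of shape (3,), scalar rewards and done flags.
T, B = 16, 4
buffer = {
    "observations": np.zeros((T, B, 3), dtype=np.float32),
    "rewards": np.zeros((T, B, 1), dtype=np.float32),
    "dones": np.zeros((T, B, 1), dtype=np.uint8),
}

# One environment step writes a single time row across all B environments.
buffer["observations"][0] = np.random.rand(B, 3).astype(np.float32)
```

Keeping time as the leading axis means a step from all vectorized environments is one contiguous row write, and slicing `buffer[key][t0:t1]` yields a sequence across every environment at once.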

For the `ReplayBuffer` to be used as a RolloutBuffer, the proper `buffer_size` must be specified. For example, for PPO, the `buffer_size` must be `[T, B]`, where `T` is the number of timesteps and `B` is the number of parallel environments.
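As a sketch of why `[T, B]` is the natural rollout size for PPO, the filled `(T, B, *)` arrays can be flattened and shuffled into minibatches. This is a generic illustration of the pattern, not the sheeprl API; all names and sizes here are invented for the example:

```python
import numpy as np

# Hypothetical rollout: buffer_size [T, B] = [128, 4], observation dim 3.
T, B, obs_dim, batch_size = 128, 4, 3, 64
rollout = {"obs": np.random.rand(T, B, obs_dim).astype(np.float32)}

# Merge the time and environment axes, then shuffle and slice into
# minibatches, as PPO does after collecting a full rollout.
flat = {k: v.reshape(T * B, *v.shape[2:]) for k, v in rollout.items()}
perm = np.random.permutation(T * B)
minibatches = [
    {k: v[perm[i : i + batch_size]] for k, v in flat.items()}
    for i in range(0, T * B, batch_size)
]
```

With `T * B = 512` samples and a batch size of 64, this yields 8 minibatches per epoch, each decorrelated across both time and environments.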

