Merge branch 'main' of https://github.com/Eclectic-Sheep/sheeprl into feature/a2c
belerico committed Dec 19, 2023
2 parents fd1f334 + 6e5b31d commit f1327fc
Showing 54 changed files with 3,461 additions and 1,888 deletions.
.gitignore (2 additions, 1 deletion)
@@ -170,4 +170,5 @@ pytest_*
 .pypirc
 mlruns
 mlartifacts
-examples/models
+examples/models
+session_*
README.md (4 additions, 4 deletions)
@@ -358,15 +358,15 @@ For each algorithm, losses are kept in a separate module, so that their implemen

## :card_index_dividers: Buffer

-For the buffer implementation, we choose to use a wrapper around a [TensorDict](https://pytorch.org/rl/tensordict/reference/generated/tensordict.TensorDict.html).
+For the buffer implementation, we chose to use a wrapper around a dictionary of NumPy arrays.

-TensorDict comes in handy since we can easily add custom fields to the buffer as if we are working with dictionaries, but we can also easily perform operations on them as if we are working with tensors.
+To enable a simple way to work with NumPy memory-mapped arrays, we implemented `sheeprl.utils.memmap.MemmapArray`, a container that handles the memory-mapped arrays.
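The memory-mapped storage that the added line mentions can be illustrated with NumPy's own primitive, which `MemmapArray` wraps with extra bookkeeping. This is a minimal sketch using `numpy.lib.format.open_memmap`, not sheeprl's actual `MemmapArray` API; the file path is hypothetical.

```python
import os
import tempfile

import numpy as np

# Hypothetical path for the on-disk array.
path = os.path.join(tempfile.mkdtemp(), "obs.npy")

# Create a memory-mapped .npy file: the data lives on disk, not in RAM,
# so large buffers do not exhaust process memory.
arr = np.lib.format.open_memmap(path, mode="w+", dtype=np.float32, shape=(128, 4))
arr[0] = 1.0   # writes go through to the file
arr.flush()

# Reopen read-only, e.g. from another process or a later run.
ro = np.lib.format.open_memmap(path, mode="r")
print(ro.shape)    # (128, 4)
print(ro[0, 0])    # 1.0
```

Because the array is backed by a file, several workers can share the same buffer without serializing it between processes.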

-This flexibility makes it very simple to implement, with the classes `ReplayBuffer`, `SequentialReplayBuffer`, `EpisodeBuffer`, and `AsyncReplayBuffer`, all the buffers needed for on-policy and off-policy algorithms.
+This flexibility makes it very simple to implement, with the classes `ReplayBuffer`, `SequentialReplayBuffer`, `EpisodeBuffer`, and `EnvIndependentReplayBuffer`, all the buffers needed for on-policy and off-policy algorithms.

### :mag: Technical details

-The tensor's shape in the TensorDict is `(T, B, *)`, where `T` is the number of timesteps, `B` is the number of parallel environments, and `*` is the shape of the data.
+The shape of the NumPy arrays in the dictionary is `(T, B, *)`, where `T` is the number of timesteps, `B` is the number of parallel environments, and `*` is the shape of the data.
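The `(T, B, *)` layout described in the added line can be sketched with a plain dictionary of NumPy arrays. This is an illustration of the data layout only, not sheeprl's `ReplayBuffer` class; the key names and shapes are hypothetical.

```python
import numpy as np

T, B = 64, 8                 # timesteps, parallel environments
obs_shape = (3, 84, 84)      # the "*" part: per-step data shape

# A buffer is a plain dictionary of NumPy arrays, all laid out as (T, B, *).
buffer = {
    "observations": np.zeros((T, B, *obs_shape), dtype=np.float32),
    "rewards": np.zeros((T, B, 1), dtype=np.float32),
    "dones": np.zeros((T, B, 1), dtype=np.bool_),
}

# Time-major indexing: buffer["observations"][t] is the batch of
# observations collected from all B environments at timestep t.
step_obs = buffer["observations"][10]
print(step_obs.shape)   # (8, 3, 84, 84)
```

Keeping time as the leading axis makes it cheap to slice whole timesteps, which sequential and episode buffers rely on.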

For the `ReplayBuffer` to be used as a RolloutBuffer, the proper `buffer_size` must be specified. For example, for PPO, the `buffer_size` must be `[T, B]`, where `T` is the number of timesteps and `B` is the number of parallel environments.
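How a `[T, B]`-sized rollout buffer is consumed in an on-policy update like PPO can be sketched as follows. This is a hand-rolled illustration under the stated shape convention, not sheeprl's actual `ReplayBuffer` API; the keys and minibatch size are hypothetical.

```python
import numpy as np

T, B = 128, 4   # e.g. PPO: buffer_size = [T, B]
rollout = {
    "actions": np.zeros((T, B, 1), dtype=np.int64),
    "values": np.zeros((T, B, 1), dtype=np.float32),
}

# Once T steps have been collected from all B environments, flatten
# (T, B, *) -> (T*B, *) and draw shuffled minibatches for the update.
flat = {k: v.reshape(T * B, *v.shape[2:]) for k, v in rollout.items()}
idx = np.random.permutation(T * B)
minibatch = {k: v[idx[:256]] for k, v in flat.items()}
print(minibatch["actions"].shape)   # (256, 1)
```

The same `(T, B, *)` arrays thus serve both roles: sequential storage during collection and a flat sample pool during optimization.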

