Commit

fixed spelling
MarcoMeter committed Sep 10, 2024
1 parent e72f25c commit 4a46f71
Showing 2 changed files with 3 additions and 3 deletions.
cleanrl/ppo_trxl/ppo_trxl.py (4 changes: 2 additions & 2 deletions)
```diff
@@ -183,7 +183,7 @@ def forward(self, values, keys, query, mask):
         if mask is not None:
             energy = energy.masked_fill(mask.unsqueeze(1).unsqueeze(1) == 0, float("-1e20"))  # -inf causes NaN

-        # Normalize energy values and apply softmax to retreive the attention scores
+        # Normalize energy values and apply softmax to retrieve the attention scores
         attention = torch.softmax(
             energy / (self.embed_dim ** (1 / 2)), dim=3
         )  # attention shape: (N, heads, query_len, key_len)
```
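For readers skimming the corrected comment, here is a minimal runnable sketch of the masked, scaled softmax attention this snippet implements. Only the `masked_fill`/`softmax` pattern comes from the snippet; the toy sizes, the padding mask, and the random `energy` tensor are illustrative assumptions.

```python
import torch

N, heads, query_len, key_len = 2, 4, 8, 8
embed_dim = 64

energy = torch.randn(N, heads, query_len, key_len)  # raw attention logits (illustrative)
mask = torch.ones(N, key_len)                       # 1 = attend, 0 = padding
mask[:, -2:] = 0                                    # pretend the last 2 keys are padding

# A large negative fill value instead of -inf: softmax over a fully masked
# row of -inf yields 0/0 = NaN, while -1e20 stays finite.
energy = energy.masked_fill(mask.unsqueeze(1).unsqueeze(1) == 0, float("-1e20"))

# Divide by sqrt(embed_dim) and normalize over the key dimension.
attention = torch.softmax(energy / (embed_dim ** (1 / 2)), dim=3)
assert attention.shape == (N, heads, query_len, key_len)
assert attention[..., -2:].max().item() < 1e-6  # masked keys receive ~zero weight
```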
```diff
@@ -387,7 +387,7 @@ def reconstruct_observation(self):
     max_episode_steps = envs.envs[0].max_episode_steps
     if max_episode_steps <= 0:
         max_episode_steps = 1024  # Memory Gym envs have max_episode_steps set to -1
-    # Set transformer memory length to max episode steps if greather than max episode steps
+    # Set transformer memory length to max episode steps if greater than max episode steps
     args.trxl_memory_length = min(args.trxl_memory_length, max_episode_steps)

     agent = Agent(args, observation_space, action_space_shape, max_episode_steps).to(device)
```
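As a quick illustration of the clamp in this hunk (the concrete numbers below are hypothetical, not taken from the script's defaults):

```python
# Memory Gym environments report max_episode_steps as -1, which the script
# replaces with 1024 before capping the TrXL memory window.
trxl_memory_length = 2048   # hypothetical configured value
max_episode_steps = -1      # as reported by a Memory Gym env

if max_episode_steps <= 0:
    max_episode_steps = 1024
trxl_memory_length = min(trxl_memory_length, max_episode_steps)

print(trxl_memory_length)  # 1024 -- memory longer than an episode is never used
```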
docs/rl-algorithms/ppo-trxl.md (2 changes: 1 addition & 1 deletion)
```diff
@@ -74,7 +74,7 @@ Most details are derived from [ppo.py](/rl-algorithms/ppo#ppopy). These are addi

 1. The policy and value function share parameters.
 2. Multi-head attention is implemented so that all heads share parameters.
-3. Abolute positional encoding is used as default. Learned positional encodings are supported.
+3. Absolute positional encoding is used as default. Learned positional encodings are supported.
 4. Previously computed hidden states of the TrXL layers are cached and re-used for up to `trxl_memory_length`. Only 1 hidden state is computed anew.
 5. TrXL layers adhere to pre-layer normalization.
 6. Support for multi-discrete action spaces.
```
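Since item 3 of the changed list concerns absolute positional encoding, here is a minimal sketch of the standard sinusoidal variant; this illustrates the general technique only and is not a copy of the implementation in `ppo_trxl.py`, whose details may differ.

```python
import torch

def sinusoidal_positional_encoding(max_len: int, dim: int) -> torch.Tensor:
    """Classic absolute (sinusoidal) positional encoding; dim must be even."""
    position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)  # (max_len, 1)
    div_term = torch.exp(
        torch.arange(0, dim, 2, dtype=torch.float32) * (-torch.log(torch.tensor(10000.0)) / dim)
    )                                             # (dim / 2,)
    pe = torch.zeros(max_len, dim)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return pe                                     # (max_len, dim), added to token embeddings

pe = sinusoidal_positional_encoding(max_len=512, dim=64)
print(pe.shape)  # torch.Size([512, 64])
```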
