-
In the skrl examples, the timesteps are defined by the above equation, but why? I thought the number of parallel environments would be involved as well.
Thank you in advance.
-
Hi @berttggg, as indicated in the skrl documentation, the training configuration is mapped as far as possible from the rl_games configuration. In rl_games, the training duration is basically defined by the following loop:

    while True:
        epoch_num += 1
        for n in range(self.horizon_length):
            ...  # collect data (rollouts)
        ...  # train the agent
        if epoch_num >= self.max_epochs:
            break

Reference:
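For a rough sense of the mapping, here is a minimal sketch; the horizon_length and max_epochs values are made up for illustration:

    # Assumed example values (not from any real config)
    horizon_length = 16   # rollout steps collected per epoch in rl_games
    max_epochs = 1000     # training epochs in rl_games

    # skrl's trainer is driven by a single `timesteps` setting; matching the
    # loop above, each epoch contributes horizon_length timesteps:
    timesteps = horizon_length * max_epochs
    print(timesteps)  # 16000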
-
Thank you so much for the explanation. But I am confused. In the rl_games documentation, the data size is governed by both horizon_length and num_actors. In the skrl documentation, num_actors (num_envs) seems to be left out?
Hi @berttggg,
rl_games counts (for statistics) the timesteps of each parallel environment (num_actors) independently, while skrl counts them for the whole task (where calling env.step(...) executes a timestep for all parallel environments in one go).
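To make the two counting conventions concrete, here is a small sketch with assumed values:

    # Assumed example values (not from any real config)
    num_envs = 4096       # parallel environments (num_actors in rl_games)
    horizon_length = 16
    max_epochs = 1000

    # skrl: one timestep == one env.step(...) call that advances ALL envs at once
    skrl_timesteps = horizon_length * max_epochs      # 16000

    # rl_games statistics: each parallel environment's steps are counted separately
    rl_games_env_steps = skrl_timesteps * num_envs    # 65,536,000

    # Both run the same number of env.step(...) calls; only the bookkeeping differs.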