Zero score on Freeway #23
Comments
Strengthening the relevance of @emailweixu's reproducibility issue: here are my performance results on Freeway, 4 seeds. All 4 seeds obtained a score of 0 by the end of training; however, 1 seed did manage to reach 21.5 reward at some points during training. I used the provided train.sh script (so 4 GPUs), with the following modifications to fit my setup: "--object_store_memory 100000000000" and "--num_cpus 80", which should not impact performance. This is related to issue #21, which points out another reproducibility issue; see issue #21 for potential reasons. Best,
@rPortelas Actually, I have reasons to believe that a zero score on Freeway is expected. If you play Freeway yourself, you can see that it requires consistent exploration in one direction (UP) for many steps in order to get any reward. However, in the current implementation of EfficientZero, the behavior policy is a stochastic policy based on the MCTS result. And at the beginning of training, the policy from MCTS is close to uniform, given how EfficientZero is initialized (i.e., zero initialization of the last layer of the prediction nets), which makes it very hard to consistently go UP. Other algorithms such as CURL or SPR use a greedy policy (coupled with a noisy net) and are more likely to have consistent exploration behavior.
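For a rough sense of scale (my own back-of-the-envelope illustration, not a number from either paper), assuming Freeway's minimal action set of 3 actions and a behavior policy that stays close to uniform early in training:

```python
# Back-of-the-envelope sketch: under a near-uniform policy over Freeway's 3 minimal
# actions (NOOP, UP, DOWN), the probability of picking UP at k consecutive decision
# points decays as (1/3)**k, so a full crossing that needs many consistent UP moves
# is essentially never sampled by chance.
for k in (10, 20, 50):
    print(f"P(UP for {k} consecutive steps) = {(1 / 3) ** k:.3g}")
```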
@emailweixu It is true that Freeway is challenging in terms of exploration; however, in both the EfficientZero paper and the original MuZero paper (see Table S1 in the appendix), non-zero performance is reported. So we should be able to reproduce it.
@rPortelas I know both EfficientZero and MuZero reported reasonable performance on Freeway. The original MuZero is not open-sourced, so I cannot re-run the experiments and cannot know for sure. But since it trained on many more frames (20B frames), it is more likely to obtain reward through random exploration. Furthermore, the original MuZero paper didn't describe how the weights of the models are initialized; it is possible that a non-zero initialization of the last prediction layer can get some reward (non-zero initialization makes the initial policy not uniformly random). In fact, I did try non-zero initialization with EfficientZero (changing init_zero from True to False): it did get some reward during training, but the final performance is still much lower than the reported number. Zero initialization, however, is explicitly described by EfficientZero in Appendix A.1.
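As a minimal illustration of why zero initialization yields a uniform initial policy (my own sketch, not the repo's actual network code):

```python
import torch
import torch.nn as nn

# Sketch: if the last layer of the policy head is zero-initialized (as init_zero=True
# does, per Appendix A.1 of the EfficientZero paper), every action logit is 0 for any
# hidden state, so the initial policy prior fed to MCTS is exactly uniform.
num_actions = 3  # Freeway's minimal action set
policy_head = nn.Linear(64, num_actions)
nn.init.zeros_(policy_head.weight)
nn.init.zeros_(policy_head.bias)

hidden = torch.randn(1, 64)            # arbitrary hidden state
logits = policy_head(hidden)           # all zeros
print(torch.softmax(logits, dim=-1))   # tensor([[0.3333, 0.3333, 0.3333]])
```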
Thanks for the discussion!
@rPortelas did you try the "raw" version you mentioned in #21 on Freeway?
I tried to run the code for Atari Freeway using the following command with the default settings in the code:
I tried two seeds, 0 and 1. Based on the tensorboard curves, the algorithm seems to receive no reward at all during training. Both workers.ori_reward and Train_statistics.target_value_prefix_mean are constantly zero from beginning to end.
From train_test_log, seed 0 got positive reward (~7.5) at step 0, but then no reward at all after that. Seed 1 also got ~7.5 reward at step 0; after that, half of the evaluations got 0 and the other half got 21.34.
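For reference, the logged scalars can be inspected directly from the event files with a short script (a minimal sketch using TensorBoard's EventAccumulator; the log directory below is a placeholder, and the tag names are the ones quoted above):

```python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Placeholder path: point this at the run's TensorBoard event directory.
logdir = "path/to/run/event/dir"
acc = EventAccumulator(logdir)
acc.Reload()

for tag in ("workers.ori_reward", "Train_statistics.target_value_prefix_mean"):
    values = [event.value for event in acc.Scalars(tag)]
    print(tag, "min:", min(values), "max:", max(values))
```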
I wonder whether I did something wrong.
Thanks
Wei