[RLlib] No Preprocessors (part 2). #18468

sven1977 · 2021-09-09T12:56:15Z

This PR prepares for letting the trajectory view API address individual observation components, for example to enable frame-stacking for individual components of a complex (Tuple or Dict) observation space. In addition, the upcoming soft-deprecation of RLlib's Preprocessor API should make observation handling more transparent to users and allow batched, model-based preprocessing of observations. Observations will arrive in the model exactly as the env returns them.

This is the second part of the following (already merged) PR:
#18367

New config key "_disable_preprocessor_api" (default=False). Set it to True to disable all preprocessing of your env's observations: all obs data from the env then arrives in the model as-is. No default Preprocessors, such as TupleFlattening/DictFlattening, will be used, not even the NoPreprocessor class.
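To illustrate the difference, here is a minimal sketch assuming a plain Python dict config. Only the `_disable_preprocessor_api` key comes from this PR; `flatten_like_preprocessor` and `obs_seen_by_model` are hypothetical stand-ins that only approximate RLlib's flattening preprocessors (which also apply per-component preprocessors, e.g. one-hot encoding for Discrete components).

```python
import numpy as np

# Only this key is from the PR; the rest of a real trainer config is omitted.
config = {
    "_disable_preprocessor_api": True,  # default: False
}


def flatten_like_preprocessor(obs):
    """Hypothetical stand-in for a flattening preprocessor: concatenates all
    leaves of a nested dict/tuple obs into one 1D float32 vector. Unlike the
    real preprocessors, it does NOT one-hot encode discrete components."""
    if isinstance(obs, dict):
        parts = [flatten_like_preprocessor(obs[k]) for k in sorted(obs)]
    elif isinstance(obs, (tuple, list)):
        parts = [flatten_like_preprocessor(o) for o in obs]
    else:
        return np.asarray(obs, dtype=np.float32).reshape(-1)
    return np.concatenate(parts)


def obs_seen_by_model(obs, config):
    # With the preprocessor API disabled, the env's obs is passed through as-is.
    if config.get("_disable_preprocessor_api", False):
        return obs
    return flatten_like_preprocessor(obs)


obs = {"a": {"aa": np.zeros(2), "ab": np.int8(3)}, "b": np.ones(5)}
print(obs_seen_by_model(obs, config) is obs)  # True
print(obs_seen_by_model(obs, {}).shape)       # (8,)
```

With the key set, the model has to handle the nested structure itself; without it, it receives one flat vector.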
The main changes are in the SimpleListCollector, because observations may now be nested dicts/tuples of data. Such data is flattened with dm_tree when it arrives in the collector and stored per column, e.g.:

obs: {a={aa=np.array((2,), float), ab=np.array((), int8)}, b=np.array((5,), float)}
stored in collector as: [np.array((2,), float), np.array((), int8), np.array((5,), float)]
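The per-column storage described above can be sketched as follows. `flatten` and `unflatten_as` are small hypothetical stand-ins for dm_tree's `tree.flatten` / `tree.unflatten_as` (like dm_tree, they visit dict keys in sorted order), and `ColumnCollector` is a hypothetical class, not RLlib's actual SimpleListCollector.

```python
import numpy as np


def flatten(struct):
    """Stand-in for tree.flatten: dict keys in sorted order, tuples/lists in order."""
    if isinstance(struct, dict):
        return [leaf for k in sorted(struct) for leaf in flatten(struct[k])]
    if isinstance(struct, (tuple, list)):
        return [leaf for s in struct for leaf in flatten(s)]
    return [struct]


def unflatten_as(structure, flat):
    """Stand-in for tree.unflatten_as: rebuild `structure` from a flat list."""
    it = iter(flat)

    def build(s):
        if isinstance(s, dict):
            return {k: build(s[k]) for k in sorted(s)}
        if isinstance(s, (tuple, list)):
            return type(s)(build(x) for x in s)
        return next(it)

    return build(structure)


class ColumnCollector:
    """Hypothetical collector: keeps one list (column) per flattened obs component."""

    def __init__(self, first_obs):
        self.structure = first_obs
        self.columns = [[leaf] for leaf in flatten(first_obs)]

    def add(self, obs):
        for column, leaf in zip(self.columns, flatten(obs)):
            column.append(leaf)

    def build(self):
        # Stack each column along a new batch axis, then restore the nesting.
        return unflatten_as(self.structure, [np.stack(c) for c in self.columns])


obs = {"a": {"aa": np.zeros(2), "ab": np.int8(1)}, "b": np.zeros(5)}
collector = ColumnCollector(obs)
collector.add(obs)
batch = collector.build()
print(len(collector.columns))  # 3
print(batch["a"]["aa"].shape, batch["a"]["ab"].shape, batch["b"].shape)  # (2, 2) (2,) (2, 5)
```

Storing flat columns keeps the collector independent of how deeply the obs space is nested; the structure is only needed again when the batch is built.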

A new example script (tested on every PR) demonstrates the preprocessor_pref=None functionality using a RandomEnv with a deeply nested observation space.

Why are these changes needed?

Related issue number

Checks

I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

yangysc · 2021-09-10T04:40:09Z

If preprocessor_pref is None, two more places need to be adjusted for torch as well.

policy.py
If we call the compute_single_action function, the obs is no longer a TensorType:

ray/rllib/policy/policy.py

Lines 255 to 263 in dd096c8

    
out = self.compute_actions(
        [obs],
        state_batch,
        prev_action_batch=prev_action_batch,
        prev_reward_batch=prev_reward_batch,
        info_batch=info_batch,
        episodes=episodes,
        explore=explore,
        timestep=timestep)

Since obs is now a dict, we cannot use [obs] to add the batch dimension.
We could change it to:

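# Assumes `import numpy as np` and `import tree` (dm_tree) at module level.
out = self.compute_actions(
        # Add a batch dim to every leaf; np.asarray also covers Python scalars
        # (e.g. from Discrete components), which do not support s[None].
        tree.map_structure(lambda s: np.asarray(s)[None], obs),
        state_batch,
        prev_action_batch=prev_action_batch,
        prev_reward_batch=prev_reward_batch,
        info_batch=info_batch,
        episodes=episodes,
        explore=explore,
        timestep=timestep)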
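Why `[obs]` is not enough can be seen with numpy alone. In this sketch, `map_structure` is a hypothetical minimal stand-in for `tree.map_structure` (dicts, tuples and lists only): wrapping a dict obs in a list yields a list holding one dict, whereas mapping over the leaves yields the same dict structure with a leading batch axis of size 1 on every component.

```python
import numpy as np


def map_structure(fn, struct):
    """Minimal stand-in for tree.map_structure (dicts, tuples, lists only)."""
    if isinstance(struct, dict):
        return {k: map_structure(fn, v) for k, v in struct.items()}
    if isinstance(struct, (tuple, list)):
        return type(struct)(map_structure(fn, s) for s in struct)
    return fn(struct)


obs = {"a": np.zeros(2), "b": (np.ones(5), 3)}

# Wrapping in a list: a list holding one dict, not a batched dict.
wrapped = [obs]

# Adding the batch axis per leaf; np.asarray also handles the Python int 3.
batched = map_structure(lambda s: np.asarray(s)[None], obs)

print(type(wrapped), type(batched))  # <class 'list'> <class 'dict'>
print(batched["a"].shape, batched["b"][0].shape, batched["b"][1].shape)  # (1, 2) (1, 5) (1,)
```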

The compute_actions function in torch_policy.py should be adjusted as well, in case obs_batch is a dict:

ray/rllib/policy/torch_policy.py

Line 264 in dd096c8

SampleBatch.CUR_OBS: np.asarray(obs_batch),
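The problem with `np.asarray(obs_batch)` for a dict batch can be reproduced with numpy alone: numpy does not treat a dict as a sequence, so it wraps the whole dict in a 0-d object array instead of producing a numeric array. A structure-aware conversion keeps the dict and converts each leaf separately; `to_numpy_batch` below is a hypothetical helper for illustration, not RLlib code.

```python
import numpy as np

obs_batch = {"a": np.zeros((4, 2)), "b": np.ones((4, 5))}

# numpy wraps the entire dict into a single 0-d object array.
bad = np.asarray(obs_batch)
print(bad.shape, bad.dtype)  # () object


def to_numpy_batch(struct):
    """Hypothetical helper: convert every leaf to an ndarray, keep the nesting.
    Lists are treated as leaves here (i.e. as already-batched data)."""
    if isinstance(struct, dict):
        return {k: to_numpy_batch(v) for k, v in struct.items()}
    if isinstance(struct, tuple):
        return tuple(to_numpy_batch(s) for s in struct)
    return np.asarray(struct)


good = to_numpy_batch(obs_batch)
print(good["a"].shape, good["b"].shape)  # (4, 2) (4, 5)
```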

yangysc · 2021-09-10T08:41:16Z

Another possible problem is the _multi_gpu_parallel_grad_calc in torch_policy.py.

It seems that for dict observations without any preprocessor, the input sample_batches do not have the get_interceptor function that transforms np.arrays into torch.Tensors.

The get_interceptor function disappears after the following line.

ray/rllib/policy/torch_policy.py

Line 938 in ae689ec

lock = threading.Lock()
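For readers unfamiliar with the mechanism mentioned above: a get-interceptor is a function applied to a value whenever it is read from the batch (here, converting numpy data to framework tensors). The minimal sketch below only illustrates why such an interceptor has to be structure-aware for dict observations, mapping over the nested leaves rather than converting the dict as a whole. `InterceptedBatch` is hypothetical (not RLlib's SampleBatch), and a cast to float32 numpy arrays stands in for the torch conversion.

```python
import numpy as np


def map_structure(fn, struct):
    """Minimal stand-in for tree.map_structure (dicts and tuples only)."""
    if isinstance(struct, dict):
        return {k: map_structure(fn, v) for k, v in struct.items()}
    if isinstance(struct, tuple):
        return tuple(map_structure(fn, s) for s in struct)
    return fn(struct)


class InterceptedBatch(dict):
    """Hypothetical dict that applies an interceptor function on every read."""

    def __init__(self, *args, interceptor=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.interceptor = interceptor

    def __getitem__(self, key):
        value = super().__getitem__(key)
        return self.interceptor(value) if self.interceptor else value


def convert_leaf(x):
    # Stand-in for a numpy -> torch.Tensor conversion.
    return np.asarray(x, dtype=np.float32)


batch = InterceptedBatch(
    {"obs": {"a": np.zeros((4, 2)), "b": np.ones((4, 5))}},
    # Structure-aware: convert each nested leaf, keep the dict structure.
    interceptor=lambda v: map_structure(convert_leaf, v),
)
obs = batch["obs"]
print(type(obs).__name__, obs["a"].dtype)  # dict float32
```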

…deprecate_preprocessors_part_2

sven1977 · 2021-09-22T17:20:41Z

Thanks for your comments @yangysc ! These were invaluable. I added another test case covering single action calculations (and training) for all frameworks with the no-preprocessor setting and a complex observation space. Seems to be solid enough now to give this a go :)

yangysc · 2021-09-23T01:43:59Z

> Thanks for your comments @yangysc ! These were invaluable. I added another test case covering single action calculations (and training) for all frameworks with the no-preprocessor setting and a complex observation space. Seems to be solid enough now to give this a go :)

Very glad to help. Also, thanks for your excellent work.

…deprecate_preprocessors_part_2 # Conflicts: # rllib/policy/policy.py

wip.

81122a9

sven1977 requested a review from michaelzhiluo September 9, 2021 12:57

sven1977 assigned michaelzhiluo Sep 9, 2021

sven1977 added the tests-ok label ("The tagger certifies test failures are unrelated and assumes personal liability.") Sep 9, 2021

gjoliver mentioned this pull request Sep 9, 2021

[RLlib] SAC crashes on the env having the dict observation space #18418

Closed

andras-kth mentioned this pull request Sep 10, 2021

How to disable preprocessing for a policy? #8600

Closed

michaelzhiluo approved these changes Sep 22, 2021


sven1977 added 4 commits September 22, 2021 15:18

Merge branch 'master' of https://github.com/ray-project/ray into poc_…

145905c

…deprecate_preprocessors_part_2

wip.

77dc84f

wip.

bc8384b

wip.

b1a9d4a

sven1977 added 4 commits September 23, 2021 10:05

Merge branch 'master' of https://github.com/ray-project/ray into poc_…

d28665b

…deprecate_preprocessors_part_2 # Conflicts: # rllib/policy/policy.py

fixes.

a988d2a

wip.

0d7d22b

fix.

2d649ec

sven1977 merged commit 61a1274 into ray-project:master Sep 23, 2021

sven1977 deleted the poc_deprecate_preprocessors_part_2 branch June 2, 2023 20:16
