
Add dict obs support for PPO #559

Merged · 8 commits · Nov 27, 2024

Conversation

Andrew-Luo1 (Contributor)

Implementation of @kevinzakka's specification for dictionary observations. Dict-valued observations are useful in many contexts: privileged critic inputs and policy inputs that mix pixels and state information, to name two.

Behaviour

  1. In the case of ndarray-valued observations, this PR makes no changes.
  2. When you call ppo/train.py with a dict-valued observation and observation normalisation enabled, normalisation is applied only to obs['state'].
  3. When you call any other training agent with a dict-valued observation, it raises a NotImplementedError. Supporting other agents would be future work.
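Behaviour 2 can be sketched as follows. This is a hypothetical helper, not the PR's actual code; the running-statistics machinery is elided and `mean`/`std` are assumed to be precomputed:

```python
import jax.numpy as jnp

def normalize_obs(obs, mean, std):
    """Sketch of behaviour 2: ndarray observations are normalised directly;
    for dict observations only the 'state' entry is normalised."""
    if isinstance(obs, jnp.ndarray):
        return (obs - mean) / std
    # Pixel entries pass through untouched.
    return {**obs, 'state': (obs['state'] - mean) / std}
```

Since pixel entries pass through unchanged, only obs['state'] needs running statistics.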

Usage
An upcoming PR on pixel-based PPO training provides an example of how to train with dictionary observations. Essentially:

  1. Ensure that env.step and env.reset return the observation in a dictionary form; see the example below.
  2. Implement your environment's observation_size -> Union[int, Mapping[str, Tuple[int, ...]]] property.
  3. Change the called network, the apply method, and the dummy-observation generation in make_policy_network and make_value_network.
  4. Implement your desired network to process the dictionary observation.

Example observation

obs = {
  'state': jnp.concatenate([data.qpos, data.qvel]),
  'pixels/rgb': rgb / 255.0,
  'pixels/depth': depth / self._max_depth,
}
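Steps 1 and 2 above can be sketched with a toy environment. The class and its observation contents are hypothetical; only the observation plumbing is shown:

```python
import jax
import jax.numpy as jnp

class DictObsEnv:
    """Toy sketch: reset (and step) return a dict observation, and the
    observation_size property maps each key to its shape."""

    def reset(self, rng):
        del rng  # deterministic toy observation
        return {
            'state': jnp.zeros(10),
            'pixels/rgb': jnp.zeros((32, 32, 3)),
        }

    @property
    def observation_size(self):
        # Derive sizes from a throwaway reset observation.
        obs = self.reset(jax.random.PRNGKey(0))
        if isinstance(obs, jnp.ndarray):
            return obs.shape[-1]
        return jax.tree_util.tree_map(lambda x: x.shape, obs)
```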

Side-effects
AutoResetWrapper and ppo/losses.py needed to be slightly modified to support obs-dict training. These modifications do not appear to affect performance or training (not shown), tested on the Franka Pick-up Cube task.

| Obs Type | Commit f43727 | PR |
| --- | --- | --- |
| State | 212118 SPS | 213242 SPS |
| Dict | N/A | 212729 SPS |

@Andrew-Luo1 mentioned this pull request Nov 26, 2024
@btaba (Collaborator) left a comment:

Thanks Andrew!

brax/envs/wrappers/training.py (resolved thread)
@@ -136,8 +137,13 @@ def compute_ppo_loss(

baseline = value_apply(normalizer_params, params.value, data.observation)

if dict_obs:
Collaborator:

Can this just be:

terminal_obs = jax.tree_util.tree_map(lambda x: x[-1], data.next_observation)

and get rid of dict_obs and the if statement?
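The suggestion works because jax.tree_util.tree_map applies the function to every leaf, so a single expression covers both ndarray and dict observations (minimal sketch):

```python
import jax
import jax.numpy as jnp

def terminal_observation(next_observation):
    # tree_map treats a bare jnp.ndarray as a single leaf, so the same
    # expression slices the last timestep from arrays and dicts alike,
    # removing the need for a dict_obs flag.
    return jax.tree_util.tree_map(lambda x: x[-1], next_observation)
```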

@@ -251,7 +256,8 @@ def train(
reward_scaling=reward_scaling,
gae_lambda=gae_lambda,
clipping_epsilon=clipping_epsilon,
normalize_advantage=normalize_advantage)
normalize_advantage=normalize_advantage,
dict_obs=not ndarray_obs)
Collaborator:

let's try to get rid of dict_obs

@@ -231,12 +231,17 @@ def train(
key_envs = jnp.reshape(key_envs,
(local_devices_to_use, -1) + key_envs.shape[1:])
env_state = reset_fn(key_envs)
ndarray_obs = isinstance(env_state.obs, jnp.ndarray) # Check whether observations are in dictionary form.
if not ndarray_obs and normalize_observations:
assert "state" in env.observation_size, "Observation normalisation only supported for states."
Collaborator:

It's ok to just have a KeyError below rather than a one-off input validation here. We should either do better input validation, or fail loudly below.


obs_shape = env_state.obs.shape[-1] if ndarray_obs else env.observation_size
Collaborator:

Let's not rely on env.observation_size, I believe in many cases, that calls an env.step

Contributor (author):

Switched to using jax.tree_util.tree_map(lambda x: x.shape, env_state.obs).
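The replacement reads shapes off the already-materialised observation pytree instead of querying env.observation_size, which may trigger an env reset or step. The batch dimensions below are illustrative, not taken from the PR:

```python
import jax
import jax.numpy as jnp

# A stand-in for env_state.obs after vectorised reset (illustrative shapes).
env_state_obs = {
    'state': jnp.zeros((2, 4, 10)),
    'pixels/rgb': jnp.zeros((2, 4, 32, 32, 3)),
}

# Per-leaf shapes, computed without any extra env interaction.
obs_shapes = jax.tree_util.tree_map(lambda x: x.shape, env_state_obs)
```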

training_state.normalizer_params,
data.observation,
pmap_axis_name=_PMAP_AXIS_NAME)
if normalize_observations:
Collaborator:

why do we need this if statement, and the one that was added below it?

Contributor (author):

For line 392, this handles the case when obs doesn't have the ['state'] key. Not sure if it makes sense for us to cover this case.

For the block under 331, there's nothing to update the normaliser with if 'state' isn't in the obs dict.

Collaborator:

I think these if statements should be removed if the purpose is to validate whether "state" exists or not.

Let's fail with KeyError, as discussed previously, we don't want to fail silently in general
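The fail-loudly version is simply an unconditional lookup (hypothetical helper name, not the merged code):

```python
import jax.numpy as jnp

def normalizer_input(obs):
    # Unconditionally index 'state' for dict observations: a missing key
    # raises KeyError instead of silently skipping the normaliser update.
    return obs if isinstance(obs, jnp.ndarray) else obs['state']
```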

@@ -79,7 +79,7 @@ class NetworkFactory(Protocol[NetworkType]):

def __call__(
self,
observation_size: int,
observation_size: Union[int, Mapping[str, Tuple[int, ...]]],
Collaborator:

should this be Union[Tuple[int, ...], int] in the Mapping?

Contributor (author):

Yes, there seems to be no reason to force <=1D observation sizes to be tuples.

Contributor (author):

In the case of a dict, observation_size will always be Mapping[str, Tuple[int, ...]] the way I calculate it using tree_map, but in general having non-tuple sizes makes sense.

@btaba (Collaborator) commented Nov 26, 2024:

@Andrew-Luo1 how do AutoResetWrapper and losses need to be modified? We should probably apply those updates here as well?

@btaba self-assigned this Nov 26, 2024
@@ -79,7 +79,7 @@ class NetworkFactory(Protocol[NetworkType]):

def __call__(
self,
observation_size: int,
observation_size: Union[int, Mapping[str, Union[Tuple[int, ...], int]]],
Collaborator:

May as well make this ObservationSize like the other types above

ndarray_obs = isinstance(env_state.obs, jnp.ndarray) # Check whether observations are in dictionary form.

obs_shape = env_state.obs.shape[-1] if ndarray_obs \
else jax.tree_util.tree_map(lambda x: x.shape[2:], env_state.obs) # Discard batch axes.
Collaborator:

Why do we have [2:] for the dict, but [-1] for the jax.Array version? Presumably for pixel inputs?

Can they both be [2:] and then remove the if statement?

Can the comment be more explicit about why the first two dims are removed (one for num_envs, and one for num_devices, is that right?)
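The shape arithmetic under discussion, with the two leading batch axes assumed to be (num_devices, num_envs):

```python
import jax
import jax.numpy as jnp

# Leaves of env_state.obs are batched as (num_devices, num_envs, *obs_shape),
# so x.shape[2:] discards the two leading batch axes from every leaf.
batched_obs = {
    'state': jnp.zeros((2, 4, 10)),
    'pixels/rgb': jnp.zeros((2, 4, 32, 32, 3)),
}
per_env_shape = jax.tree_util.tree_map(lambda x: x.shape[2:], batched_obs)
```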

brax/training/agents/ppo/train.py (resolved thread)

@@ -28,6 +28,7 @@ def __init__(self, **kwargs):
self._dt = 0.02
self._reset_count = 0
self._step_count = 0
self._dict_obs = kwargs.get('dict_obs', False)
Collaborator:

nit: use_dict_obs

_dict_obs makes it seem like you're storing dictionary observations to self

brax/envs/base.py (resolved thread)
@@ -15,14 +15,15 @@
"""Network definitions."""

import dataclasses
from typing import Any, Callable, Sequence, Tuple
from typing import Any, Callable, Sequence, Tuple, Mapping
Collaborator:

nit: keep these imports in order, here and elsewhere in the PR

import warnings

from brax.training import types
from brax.training.spectral_norm import SNDense
from flax import linen
import jax
import jax.numpy as jnp
from jax.tree_util import tree_flatten
Collaborator:

nit: generally we try to avoid importing members of modules directly

preprocess_observations_fn: types.PreprocessObservationFn = types.identity_observation_preprocessor,
hidden_layer_sizes: Sequence[int] = (256, 256),
activation: ActivationFn = linen.relu,
kernel_init: Initializer = jax.nn.initializers.lecun_uniform(),
layer_norm: bool = False) -> FeedForwardNetwork:
"""Creates a policy network."""
"""Creates a policy network. Only processes state in the case of dict obs."""
Collaborator:

nit: this comment is better suited where it's applied rather than in the top-level function docstring. So on L107

Similar comment for value_network below

brax/training/agents/ppo/train.py (resolved thread)
@@ -82,48 +83,55 @@ def __call__(self, data: jnp.ndarray):
hidden = self.activation(hidden)
return hidden

def canolicalize_obs_size(obs_size: types.ObservationSize) -> int:
Collaborator:

nit: "canonical" can be quite an overloaded term; let's call this get_obs_state_size
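The renamed helper might look like this. A sketch only, not the merged code; the ObservationSize alias mirrors the type in the diff above:

```python
from collections.abc import Mapping
from typing import Tuple, Union

import numpy as np

ObservationSize = Union[int, Mapping[str, Union[Tuple[int, ...], int]]]

def get_obs_state_size(obs_size: ObservationSize) -> int:
    """Flat size of the 'state' observation. Raises KeyError when a dict
    observation has no 'state' entry, failing loudly rather than silently."""
    size = obs_size['state'] if isinstance(obs_size, Mapping) else obs_size
    return int(np.prod(size)) if isinstance(size, tuple) else int(size)
```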

@btaba (Collaborator) left a comment:

Looks good, just a couple more nits


from brax import base
from brax.generalized import pipeline as g_pipeline
from brax.io import image
from brax.mjx import pipeline as m_pipeline
from brax.positional import pipeline as p_pipeline
from brax.spring import pipeline as s_pipeline
from brax.training.types import ObservationSize
@btaba (Collaborator), Nov 27, 2024:

We try to keep brax.training pretty standalone from other parts of the lib, can we remove this import?

rng = jax.random.PRNGKey(0)
reset_state = self.unwrapped.reset(rng)
return reset_state.obs.shape[-1]
obs = reset_state.obs
if isinstance(obs, jax.Array) and len(obs.shape) == 1:
Collaborator:

why do we need the len(obs.shape) == 1 ? just do what we had on the left hand side

Contributor (author):

Added clarifying comment. Does it make sense to have observation_size return a tuple for multi-dimensional obs?

Collaborator:

Please fix

obs = reset_state.obs
if isinstance(obs, jax.Array) and len(obs.shape) == 1:
return obs.shape[-1]
else:
Collaborator:

nit: remove the else and just return

@@ -100,30 +103,36 @@ def make_policy_network(
layer_norm=layer_norm)

def apply(processor_params, policy_params, obs):
obs = obs if isinstance(obs, jnp.ndarray) \
Collaborator:

nit: avoid backslash, use (...)


@btaba merged commit e615f42 into google:main Nov 27, 2024
3 checks passed