Merge pull request #36 from allenai/allenact_0.5.0
2022 Leaderboards and Embodied CLIP Model
Lucaweihs authored Mar 25, 2022
2 parents 64d8b57 + 9b6ee01 commit 9b58a1f
Showing 23 changed files with 163 additions and 88 deletions.
36 changes: 22 additions & 14 deletions README.md
@@ -229,8 +229,11 @@ be used for evaluation.

We are tracking challenge participant entries using the [AI2 Leaderboard](https://leaderboard.allenai.org/). The team with the best submission made to either of the below leaderboards by May 31st (midnight, [anywhere on earth](https://time.is/Anywhere_on_Earth)) will be announced at the [CVPR'21 Embodied-AI Workshop](https://embodied-ai.org/) and invited to produce a video describing their approach.

**Submission leaderboard links will be announced soon (late Feb 2022). Please check back here.** Our 2021
leaderboard links can be found [here](https://leaderboard.allenai.org/ithor_rearrangement_1phase) and [here](https://leaderboard.allenai.org/ithor_rearrangement_2phase). Note
In particular, our 2022 leaderboard links can be found at
* [**2022 1-phase leaderboard**](https://leaderboard.allenai.org/ithor_rearrangement_1phase_2022) and
* [**2022 2-phase leaderboard**](https://leaderboard.allenai.org/ithor_rearrangement_2phase_2022).

Our older (2021) leaderboards are also available indefinitely ([previous 2021 1-phase leaderboard](https://leaderboard.allenai.org/ithor_rearrangement_1phase), [previous 2021 2-phase leaderboard](https://leaderboard.allenai.org/ithor_rearrangement_2phase)). Note
that our 2021 challenge uses a different dataset and older version of AI2-THOR and so results will not be
directly comparable.

@@ -531,18 +534,23 @@ allenact -o rearrange_out -b . baseline_configs/two_phase/two_phase_rgb_resnet_p

### 💪 Pretrained Models

We currently provide the following pretrained models (see [our paper](https://arxiv.org/abs/2103.16544) for details
on these models):

| Model | % Fixed Strict (Test, on 2021 dataset) | Pretrained Model |
|------------|:--------------------------------------:|:----------:|
| [1-Phase ResNet18+ANM IL](baseline_configs/one_phase/one_phase_rgb_resnet_frozen_map_dagger.py) | 8.9% | [(link)](https://prior-model-weights.s3.us-east-2.amazonaws.com/embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBResNetFrozenMapDagger_40proc__stage_00__steps_000040060240.pt) |
| [1-Phase ResNet18 IL](baseline_configs/one_phase/one_phase_rgb_resnet_dagger.py) | 6.3% | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBResNetDagger_40proc__stage_00__steps_000050058550.pt) |
| [1-Phase ResNet18 PPO](baseline_configs/one_phase/one_phase_rgb_resnet_ppo.py) | 5.3% | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBResNetPPO__stage_00__steps_000060068000.pt) |
| [1-Phase Simple IL](baseline_configs/one_phase/one_phase_rgb_dagger.py) | 4.8% | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBDagger_40proc__stage_00__steps_000065070800.pt) |
| [1-Phase Simple PPO](baseline_configs/one_phase/one_phase_rgb_ppo.py) | 4.6% | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBPPO__stage_00__steps_000010010730.pt) |
| [2-Phase ResNet18+ANM IL+PPO](baseline_configs/two_phase_rgb_resnet_frozen_map_ppowalkthrough_ilunshuffle.py) | 1.44% | [(link)](https://prior-model-weights.s3.us-east-2.amazonaws.com/embodied-ai/rearrangement/two-phase/exp_TwoPhaseRGBResNetFrozenMapPPOWalkthroughILUnshuffle_40proc-longtf__stage_00__steps_000075000985.pt) |
| [2-Phase ResNet18 IL+PPO](baseline_configs/two_phase/two_phase_rgb_resnet_ppowalkthrough_ilunshuffle.py) | 0.66% | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/two-phase/exp_TwoPhaseRGBResNetPPOWalkthroughILUnshuffle_40proc-longtf__stage_00__steps_000020028800.pt) |
In the below table we provide a collection of pretrained models from:

1. [Our CVPR'21 paper introducing this challenge](https://arxiv.org/abs/2103.16544), and
2. [Our CVPR'22 paper which showed that using CLIP visual encodings can dramatically improve model performance across embodied tasks](https://arxiv.org/abs/2111.09888).

We have only evaluated a subset of these models on our 2022 dataset.

| Model | % Fixed Strict (2022 dataset, test) | % Fixed Strict (2021 dataset, test) | Pretrained Model |
|------------|:-----------------------------------:|:-----------------------------------:|:----------:|
| [1-Phase Embodied CLIP ResNet50 IL](baseline_configs/one_phase/one_phase_rgb_clipresnet50_dagger.py) | **19.1%** | **17.3%** | [(link)](https://prior-model-weights.s3.us-east-2.amazonaws.com/embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBClipResNet50Dagger_40proc__stage_00__steps_000065083050.pt) |
| [1-Phase ResNet18+ANM IL](baseline_configs/one_phase/one_phase_rgb_resnet_frozen_map_dagger.py) | - | 8.9% | [(link)](https://prior-model-weights.s3.us-east-2.amazonaws.com/embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBResNetFrozenMapDagger_40proc__stage_00__steps_000040060240.pt) |
| [1-Phase ResNet18 IL](baseline_configs/one_phase/one_phase_rgb_resnet_dagger.py) | - | 6.3% | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBResNetDagger_40proc__stage_00__steps_000050058550.pt) |
| [1-Phase ResNet18 PPO](baseline_configs/one_phase/one_phase_rgb_resnet_ppo.py) | - | 5.3% | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBResNetPPO__stage_00__steps_000060068000.pt) |
| [1-Phase Simple IL](baseline_configs/one_phase/one_phase_rgb_dagger.py) | - | 4.8% | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBDagger_40proc__stage_00__steps_000065070800.pt) |
| [1-Phase Simple PPO](baseline_configs/one_phase/one_phase_rgb_ppo.py) | - | 4.6% | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBPPO__stage_00__steps_000010010730.pt) |
| [2-Phase ResNet18+ANM IL+PPO](baseline_configs/two_phase_rgb_resnet_frozen_map_ppowalkthrough_ilunshuffle.py) | **0.53%** | **1.44%** | [(link)](https://prior-model-weights.s3.us-east-2.amazonaws.com/embodied-ai/rearrangement/two-phase/exp_TwoPhaseRGBResNetFrozenMapPPOWalkthroughILUnshuffle_40proc-longtf__stage_00__steps_000075000985.pt) |
| [2-Phase ResNet18 IL+PPO](baseline_configs/two_phase/two_phase_rgb_resnet_ppowalkthrough_ilunshuffle.py) | - | 0.66% | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/two-phase/exp_TwoPhaseRGBResNetPPOWalkthroughILUnshuffle_40proc-longtf__stage_00__steps_000020028800.pt) |
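
The table links come in two shapes: direct S3 object URLs and S3 console URLs that carry the object key in a `prefix` query parameter. As a minimal sketch (the helper names `checkpoint_filename` and `local_checkpoint_path` are hypothetical and not part of this repository), the following derives the checkpoint file name from either shape and the path it would have once placed in the `pretrained_model_ckpts` directory. It does not download anything.

```python
from pathlib import PurePosixPath
from urllib.parse import parse_qs, urlparse


def checkpoint_filename(link: str) -> str:
    """Return the `.pt` file name referenced by a table link (hypothetical helper)."""
    parsed = urlparse(link)
    # S3 console links store the object key in the `prefix` query parameter;
    # direct links store it in the URL path.
    prefix = parse_qs(parsed.query).get("prefix")
    key = prefix[0] if prefix else parsed.path
    return PurePosixPath(key).name


def local_checkpoint_path(link: str, ckpt_dir: str = "pretrained_model_ckpts") -> str:
    """Path the checkpoint would have inside `ckpt_dir` (relative to the repo root)."""
    return str(PurePosixPath(ckpt_dir) / checkpoint_filename(link))


direct = (
    "https://prior-model-weights.s3.us-east-2.amazonaws.com/embodied-ai/rearrangement/"
    "one-phase/exp_OnePhaseRGBClipResNet50Dagger_40proc__stage_00__steps_000065083050.pt"
)
console = (
    "https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/"
    "rearrangement/one-phase/exp_OnePhaseRGBResNetDagger_40proc__stage_00__steps_000050058550.pt"
)
print(local_checkpoint_path(direct))
print(local_checkpoint_path(console))
```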

These models can be downloaded from the above links and should be placed into the `pretrained_model_ckpts` directory.
You can then, for example, run inference for the _1-Phase ResNet18 IL_ model using AllenAct by running:
54 changes: 38 additions & 16 deletions baseline_configs/one_phase/one_phase_rgb_base.py
@@ -4,7 +4,11 @@
from allenact.base_abstractions.sensor import SensorSuite, Sensor

try:
from allenact.embodiedai.sensors.vision_sensors import DepthSensor
from allenact.embodiedai.sensors.vision_sensors import (
DepthSensor,
IMAGENET_RGB_MEANS,
IMAGENET_RGB_STDS,
)
except ImportError:
raise ImportError("Please update to allenact>=0.4.0.")

@@ -17,20 +21,38 @@


class OnePhaseRGBBaseExperimentConfig(RearrangeBaseExperimentConfig, ABC):
SENSORS = [
RGBRearrangeSensor(
height=RearrangeBaseExperimentConfig.SCREEN_SIZE,
width=RearrangeBaseExperimentConfig.SCREEN_SIZE,
use_resnet_normalization=True,
uuid=RearrangeBaseExperimentConfig.EGOCENTRIC_RGB_UUID,
),
UnshuffledRGBRearrangeSensor(
height=RearrangeBaseExperimentConfig.SCREEN_SIZE,
width=RearrangeBaseExperimentConfig.SCREEN_SIZE,
use_resnet_normalization=True,
uuid=RearrangeBaseExperimentConfig.UNSHUFFLED_RGB_UUID,
),
]
@classmethod
def sensors(cls) -> Sequence[Sensor]:
cnn_type, pretraining_type = cls.CNN_PREPROCESSOR_TYPE_AND_PRETRAINING
if pretraining_type.strip().lower() == "clip":
from allenact_plugins.clip_plugin.clip_preprocessors import (
ClipResNetPreprocessor,
)

mean = ClipResNetPreprocessor.CLIP_RGB_MEANS
stdev = ClipResNetPreprocessor.CLIP_RGB_STDS
else:
mean = IMAGENET_RGB_MEANS
stdev = IMAGENET_RGB_STDS

return [
RGBRearrangeSensor(
height=RearrangeBaseExperimentConfig.SCREEN_SIZE,
width=RearrangeBaseExperimentConfig.SCREEN_SIZE,
use_resnet_normalization=True,
uuid=RearrangeBaseExperimentConfig.EGOCENTRIC_RGB_UUID,
mean=mean,
stdev=stdev,
),
UnshuffledRGBRearrangeSensor(
height=RearrangeBaseExperimentConfig.SCREEN_SIZE,
width=RearrangeBaseExperimentConfig.SCREEN_SIZE,
use_resnet_normalization=True,
uuid=RearrangeBaseExperimentConfig.UNSHUFFLED_RGB_UUID,
mean=mean,
stdev=stdev,
),
]
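
The new `sensors()` method picks RGB normalization statistics from the pretraining type. As shown in this diff, it tuple-unpacks `CNN_PREPROCESSOR_TYPE_AND_PRETRAINING` unconditionally, which would raise a `TypeError` for configs that set that attribute to `None`. The sketch below is a standalone illustration of the same selection with a `None` guard. The helper name `select_rgb_normalization` is hypothetical, the constants are the commonly published ImageNet and CLIP per-channel statistics rather than values imported from `allenact`, and falling back to ImageNet statistics for `None` is an assumption.

```python
from typing import Optional, Tuple

# Commonly published per-channel RGB statistics (assumed to match the
# constants that the diff imports from allenact and the CLIP plugin).
IMAGENET_RGB_MEANS = (0.485, 0.456, 0.406)
IMAGENET_RGB_STDS = (0.229, 0.224, 0.225)
CLIP_RGB_MEANS = (0.48145466, 0.4578275, 0.40821073)
CLIP_RGB_STDS = (0.26862954, 0.26130258, 0.27577711)

Stats = Tuple[Tuple[float, ...], Tuple[float, ...]]


def select_rgb_normalization(preprocessor_spec: Optional[Tuple[str, str]]) -> Stats:
    """Return (mean, stdev) for a `(cnn_type, pretraining_type)` spec.

    Hypothetical helper: `None` (no CNN preprocessor) falls back to the
    ImageNet statistics instead of failing on tuple unpacking.
    """
    if preprocessor_spec is not None:
        _cnn_type, pretraining_type = preprocessor_spec
        if pretraining_type.strip().lower() == "clip":
            return CLIP_RGB_MEANS, CLIP_RGB_STDS
    return IMAGENET_RGB_MEANS, IMAGENET_RGB_STDS


print(select_rgb_normalization(("RN50", "clip")))
print(select_rgb_normalization(None))
```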

@classmethod
def make_sampler_fn(
@@ -47,7 +69,7 @@ def make_sampler_fn(
**kwargs,
) -> RearrangeTaskSampler:
"""Return a RearrangeTaskSampler."""
sensors = cls.SENSORS if sensors is None else sensors
sensors = cls.sensors() if sensors is None else sensors
if "mp_ctx" in kwargs:
del kwargs["mp_ctx"]
assert not cls.RANDOMIZE_START_ROTATION_DURING_TRAINING
12 changes: 12 additions & 0 deletions baseline_configs/one_phase/one_phase_rgb_clipresnet50_dagger.py
@@ -0,0 +1,12 @@
from baseline_configs.one_phase.one_phase_rgb_il_base import (
OnePhaseRGBILBaseExperimentConfig,
)


class OnePhaseRGBClipResNet50DaggerExperimentConfig(OnePhaseRGBILBaseExperimentConfig):
CNN_PREPROCESSOR_TYPE_AND_PRETRAINING = ("RN50", "clip")
IL_PIPELINE_TYPE = "40proc"

@classmethod
def tag(cls) -> str:
return f"OnePhaseRGBClipResNet50Dagger_{cls.IL_PIPELINE_TYPE}"
2 changes: 1 addition & 1 deletion baseline_configs/one_phase/one_phase_rgb_dagger.py
@@ -4,7 +4,7 @@


class OnePhaseRGBDaggerExperimentConfig(OnePhaseRGBILBaseExperimentConfig):
USE_RESNET_CNN = False
CNN_PREPROCESSOR_TYPE_AND_PRETRAINING = None
IL_PIPELINE_TYPE = "40proc"

@classmethod
14 changes: 8 additions & 6 deletions baseline_configs/one_phase/one_phase_rgb_il_base.py
@@ -3,7 +3,7 @@
import torch

from allenact.algorithms.onpolicy_sync.losses.imitation import Imitation
from allenact.base_abstractions.sensor import ExpertActionSensor
from allenact.base_abstractions.sensor import ExpertActionSensor, Sensor
from allenact.utils.experiment_utils import PipelineStage
from allenact.utils.misc_utils import all_unique
from baseline_configs.one_phase.one_phase_rgb_base import (
@@ -85,13 +85,15 @@ def il_training_params(label: str, training_steps: int):


class OnePhaseRGBILBaseExperimentConfig(OnePhaseRGBBaseExperimentConfig):
SENSORS = [
*OnePhaseRGBBaseExperimentConfig.SENSORS,
ExpertActionSensor(len(RearrangeBaseExperimentConfig.actions())),
]

IL_PIPELINE_TYPE: Optional[str] = None

@classmethod
def sensors(cls) -> Sequence[Sensor]:
return [
*super(OnePhaseRGBILBaseExperimentConfig, cls).sensors(),
ExpertActionSensor(len(RearrangeBaseExperimentConfig.actions())),
]
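
Replacing the `SENSORS` class attribute with a `sensors()` classmethod means the list is built at call time, so it can depend on attributes that subclasses override. A class-level list is evaluated once, when the parent class body runs. The toy classes below are purely illustrative stand-ins (strings instead of real `Sensor` objects, hypothetical class and attribute names) for the inheritance pattern used in this change.

```python
from typing import List, Optional, Tuple


class BaseConfig:
    # Stand-in for CNN_PREPROCESSOR_TYPE_AND_PRETRAINING.
    PREPROCESSOR: Optional[Tuple[str, str]] = None

    @classmethod
    def sensors(cls) -> List[str]:
        # Evaluated on each call, so `cls.PREPROCESSOR` is the subclass's value.
        kind = "raw" if cls.PREPROCESSOR is None else cls.PREPROCESSOR[1]
        return [f"rgb[{kind}]", f"unshuffled_rgb[{kind}]"]


class ILConfig(BaseConfig):
    @classmethod
    def sensors(cls) -> List[str]:
        # Extend the parent's sensors; `cls` is still the concrete subclass.
        return [*super().sensors(), "expert_action"]


class ClipILConfig(ILConfig):
    PREPROCESSOR = ("RN50", "clip")


print(ILConfig.sensors())
print(ClipILConfig.sensors())
```

With a class-level list, `ClipILConfig` would have inherited sensors built from the base class's settings, which is the situation the classmethod avoids.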

@classmethod
def _training_pipeline_info(cls, **kwargs) -> Dict[str, Any]:
"""Define how the model trains."""
2 changes: 1 addition & 1 deletion baseline_configs/one_phase/one_phase_rgb_ppo.py
@@ -9,7 +9,7 @@


class OnePhaseRGBPPOExperimentConfig(OnePhaseRGBBaseExperimentConfig):
USE_RESNET_CNN = False
CNN_PREPROCESSOR_TYPE_AND_PRETRAINING = None

@classmethod
def tag(cls) -> str:
2 changes: 1 addition & 1 deletion baseline_configs/one_phase/one_phase_rgb_resnet_dagger.py
@@ -4,7 +4,7 @@


class OnePhaseRGBResNetDaggerExperimentConfig(OnePhaseRGBILBaseExperimentConfig):
USE_RESNET_CNN = True
CNN_PREPROCESSOR_TYPE_AND_PRETRAINING = ("RN18", "imagenet")
IL_PIPELINE_TYPE = "40proc"

@classmethod
Original file line number Diff line number Diff line change
@@ -1,10 +1,11 @@
import os
from typing import Sequence

import gym
import torch
from torch import nn

from allenact.base_abstractions.sensor import SensorSuite
from allenact.base_abstractions.sensor import SensorSuite, Sensor
from allenact.embodiedai.mapping.mapping_models.active_neural_slam import (
ActiveNeuralSLAM,
)
@@ -27,7 +28,9 @@
class OnePhaseRGBResNetFrozenMapDaggerExperimentConfig(
OnePhaseRGBILBaseExperimentConfig
):
USE_RESNET_CNN = False # Not necessary as we're handling things in the model
CNN_PREPROCESSOR_TYPE_AND_PRETRAINING = (
None # Not necessary as we're handling things in the model
)
IL_PIPELINE_TYPE = "40proc"

ORDERED_OBJECT_TYPES = list(sorted(PICKUPABLE_OBJECTS + OPENABLE_OBJECTS))
@@ -43,10 +46,11 @@ class OnePhaseRGBResNetFrozenMapDaggerExperimentConfig(
resolution_in_cm=5,
)

SENSORS = OnePhaseRGBILBaseExperimentConfig.SENSORS + [
RelativePositionChangeTHORSensor(),
MAP_RANGE_SENSOR,
]
@classmethod
def sensors(cls) -> Sequence[Sensor]:
return list(
super(OnePhaseRGBResNetFrozenMapDaggerExperimentConfig, cls).sensors()
) + [RelativePositionChangeTHORSensor(), cls.MAP_RANGE_SENSOR,]

@classmethod
def tag(cls) -> str:
@@ -63,7 +67,7 @@ def create_model(cls, **kwargs) -> nn.Module:
)

observation_space = (
SensorSuite(cls.SENSORS).observation_spaces
SensorSuite(cls.sensors()).observation_spaces
if kwargs.get("sensor_preprocessor_graph") is None
else kwargs["sensor_preprocessor_graph"].observation_spaces
)
2 changes: 1 addition & 1 deletion baseline_configs/one_phase/one_phase_rgb_resnet_ppo.py
@@ -2,7 +2,7 @@


class OnePhaseRGBResNetPPOExperimentConfig(OnePhaseRGBPPOExperimentConfig):
USE_RESNET_CNN = True
CNN_PREPROCESSOR_TYPE_AND_PRETRAINING = ("RN18", "imagenet")

@classmethod
def tag(cls) -> str:
67 changes: 48 additions & 19 deletions baseline_configs/rearrange_base.py
@@ -1,7 +1,7 @@
import copy
import platform
from abc import abstractmethod
from typing import Optional, List, Sequence, Dict, Any
from typing import Optional, List, Sequence, Dict, Any, Tuple

import ai2thor.platform
import gym.spaces
@@ -64,7 +64,7 @@ class RearrangeBaseExperimentConfig(ExperimentConfig):
# Training parameters
TRAINING_STEPS = int(75e6)
SAVE_INTERVAL = int(1e6)
USE_RESNET_CNN = False
CNN_PREPROCESSOR_TYPE_AND_PRETRAINING: Optional[Tuple[str, str]] = None

# Sensor info
SENSORS: Optional[Sequence[Sensor]] = None
@@ -93,6 +93,10 @@ class RearrangeBaseExperimentConfig(ExperimentConfig):
)
)

@classmethod
def sensors(cls) -> Sequence[Sensor]:
return cls.SENSORS

@classmethod
def actions(cls):
other_move_actions = (
@@ -119,24 +123,50 @@ def actions(cls):
@classmethod
def resnet_preprocessor_graph(cls, mode: str) -> SensorPreprocessorGraph:
def create_resnet_builder(in_uuid: str, out_uuid: str):
return ResNetPreprocessor(
input_height=cls.THOR_CONTROLLER_KWARGS["height"],
input_width=cls.THOR_CONTROLLER_KWARGS["width"],
output_width=7,
output_height=7,
output_dims=512,
pool=False,
torchvision_resnet_model=torchvision.models.resnet18,
input_uuids=[in_uuid],
output_uuid=out_uuid,
)
cnn_type, pretraining_type = cls.CNN_PREPROCESSOR_TYPE_AND_PRETRAINING
if pretraining_type == "imagenet":
assert cnn_type in [
"RN18",
"RN50",
], "Only allow using RN18/RN50 with `imagenet` pretrained weights."
return ResNetPreprocessor(
input_height=cls.THOR_CONTROLLER_KWARGS["height"],
input_width=cls.THOR_CONTROLLER_KWARGS["width"],
output_width=7,
output_height=7,
output_dims=512 if "18" in cnn_type else 2048,
pool=False,
torchvision_resnet_model=getattr(
torchvision.models, f"resnet{cnn_type.replace('RN', '')}"
),
input_uuids=[in_uuid],
output_uuid=out_uuid,
)
elif pretraining_type == "clip":
from allenact_plugins.clip_plugin.clip_preprocessors import (
ClipResNetPreprocessor,
)
import clip

# Let's make sure we download the clip model now
# so we don't download it on every spawned process
clip.load(cnn_type, "cpu")

return ClipResNetPreprocessor(
rgb_input_uuid=in_uuid,
clip_model_type=cnn_type,
pool=False,
output_uuid=out_uuid,
)
else:
raise NotImplementedError
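
In the `imagenet` branch above, the `"RN18"`/`"RN50"` tag determines both the torchvision constructor name and the channel count of the final convolutional feature map. The following standalone sketch isolates that mapping so it can be checked without importing torchvision. The function name `imagenet_resnet_spec` is hypothetical, and it raises `ValueError` where the diff uses an `assert`.

```python
from typing import Tuple


def imagenet_resnet_spec(cnn_type: str) -> Tuple[str, int]:
    """Map an "RN18"/"RN50" tag to (torchvision function name, output channels).

    Hypothetical helper mirroring the expressions in the diff.
    """
    if cnn_type not in ("RN18", "RN50"):
        raise ValueError("Only RN18/RN50 are allowed with `imagenet` pretrained weights.")
    # "RN18" -> "resnet18", "RN50" -> "resnet50"; both exist in torchvision.models.
    model_name = f"resnet{cnn_type.replace('RN', '')}"
    # ResNet-18 ends with 512 feature channels, ResNet-50 with 2048.
    output_dims = 512 if "18" in cnn_type else 2048
    return model_name, output_dims


print(imagenet_resnet_spec("RN18"))
print(imagenet_resnet_spec("RN50"))
```

In the actual config the name is resolved with `getattr(torchvision.models, model_name)`, as the diff shows.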

img_uuids = [cls.EGOCENTRIC_RGB_UUID, cls.UNSHUFFLED_RGB_UUID]
return SensorPreprocessorGraph(
source_observation_spaces=SensorSuite(
[
sensor
for sensor in cls.SENSORS
for sensor in cls.sensors()
if (mode == "train" or not isinstance(sensor, ExpertActionSensor))
]
).observation_spaces,
@@ -194,7 +224,7 @@ def machine_params(cls, mode="train", **kwargs) -> MachineParams:
devices=devices,
sampler_devices=sampler_devices,
sensor_preprocessor_graph=cls.resnet_preprocessor_graph(mode=mode)
if cls.USE_RESNET_CNN
if cls.CNN_PREPROCESSOR_TYPE_AND_PRETRAINING is not None
else None,
)

@@ -255,7 +285,6 @@ def stagewise_task_sampler_args(
thor_platform: Optional[ai2thor.platform.BaseLinuxPlatform] = None
if platform.system() == "Linux":
try:
raise IOError
x_displays = get_open_x_displays(throw_error_if_empty=True)

if devices is not None and len(
@@ -289,7 +318,7 @@ def stagewise_task_sampler_args(
},
}

sensors = kwargs.get("sensors", copy.deepcopy(cls.SENSORS))
sensors = kwargs.get("sensors", copy.deepcopy(cls.sensors()))
kwargs["sensors"] = sensors

sem_sensor = next(
@@ -452,10 +481,10 @@ def training_pipeline(cls, **kwargs) -> TrainingPipeline:

@classmethod
def create_model(cls, **kwargs) -> nn.Module:
if not cls.USE_RESNET_CNN:
if cls.CNN_PREPROCESSOR_TYPE_AND_PRETRAINING is None:
return RearrangeActorCriticSimpleConvRNN(
action_space=gym.spaces.Discrete(len(cls.actions())),
observation_space=SensorSuite(cls.SENSORS).observation_spaces,
observation_space=SensorSuite(cls.sensors()).observation_spaces,
rgb_uuid=cls.EGOCENTRIC_RGB_UUID,
unshuffled_rgb_uuid=cls.UNSHUFFLED_RGB_UUID,
)