Merge pull request #31 from allenai/2022-challenge-v0

2022 Challenge
allenai · Feb 15, 2022 · 79068f0
2 parents bac3ba9 + 7245778

Showing 23 changed files with 356 additions and 142 deletions.
diff --git a/README.md b/README.md
@@ -1,5 +1,5 @@
 <h1 align="left">
-    AI2-THOR Rearrangement Challenge
+    2022 AI2-THOR Rearrangement Challenge
 </h1>
 
 <p align="left">
@@ -29,8 +29,8 @@
 
 <img src="https://ai2thor.allenai.org/static/4844ccdba50de95a4feff30cf2978ce5/3ba25/rearrangement-cover1.png" />
 
-Welcome to the 2021 AI2-THOR Rearrangement Challenge hosted at the
-[CVPR'21 Embodied-AI Workshop](https://embodied-ai.org/).
+Welcome to the 2022 AI2-THOR Rearrangement Challenge hosted at the
+[CVPR'22 Embodied-AI Workshop](https://embodied-ai.org/).
 The goal of this challenge is to build a model/agent that move objects in a room
 to restore them to a given initial configuration. Please follow the instructions below
 to get started.
@@ -50,6 +50,7 @@ with open("README.md", "r") as f:
 -->
 <div class="toc">
 <ul>
+<li><a href="#-whats-new-in-the-2022-challenge">🔥🆕🔥 What's New in the 2022 Challenge?</a></li>
 <li><a href="#-installation">💻 Installation</a></li>
 <li><a href="#-rearrangement-task-description">📝 Rearrangement Task Description</a></li>
 <li><a href="#-challenge-tracks-and-datasets">🛤️ Challenge Tracks and Datasets</a><ul>
@@ -85,6 +86,19 @@ with open("README.md", "r") as f:
 </ul>
 </div>
 
+## 🔥🆕🔥 What's New in the 2022 Challenge?
+
+Our 2022 AI2-THOR Rearrangement Challenge has several upgrades distinguishing it from the 2021 version:
+1. **New AI2-THOR version.** We've upgraded the version of AI2-THOR we're using from 2.7.2 to 4.1.0. This brings:
+    * Performance improvements.
+    * The ability to use the recently announced headless rendering feature (see
+      [here](https://ai2thor.allenai.org/ithor/documentation/#headless-setup)), which makes it much easier to run
+      AI2-THOR on shared servers where you may not have the admin privileges needed to start an X-server.
+2. **New dataset.** We've released a new rearrangement dataset to match the new AI2-THOR version. This new dataset
+    has a more uniform balance of easy/hard episodes.
+3. **Misc. improvements.** We've fixed a number of minor bugs and performance issues from the 2021 challenge,
+    improving consistency.
+
 ## 💻 Installation
 
 To begin, clone this repository locally
@@ -157,13 +171,11 @@ a local `./src` directory. By explicitly specifying the `PIP_SRC` variable we ca
 
 **Python 3.6+ 🐍.** Each of the actions supports `typing` within <span class="chillMono">Python</span>.
 
-**AI2-THOR 2.7.2 🧞.** To ensure reproducible results, we're restricting all users to use the exact same version of <span class="chillMono">AI2-THOR</span>.
+**AI2-THOR 4.1.0 🧞.** To ensure reproducible results, we're restricting all users to use the exact same version of <span class="chillMono">AI2-THOR</span>.
 
 **AllenAct 🏋💪.** We ues the <span class="chillMono">AllenAct</span> reinforcement learning framework 
     for generating baseline models, baseline training pipelines, and for several of their helpful abstractions/utilities.
 
-**SciPy 🧑‍🔬.** We utilize <span class="chillMono">SciPy</span> for evaluation. It helps calculate the IoU between 3D bounding boxes.
-
 ## 📝 Rearrangement Task Description
 
 <img src="https://ai2thor.allenai.org/static/0f682c0103df1060810ad214c4668718/06655/rearrange-cover2.jpg" alt="Object Rearrangement Example" width="100%">
@@ -192,34 +204,35 @@ For this 2021 challenge we have two distinct tracks:
 
 ### 📊 Datasets
 
-For this challenge we have four distinct dataset splits: `"train"`, `"train_unseen"`, `"val"`, and `"test"`.
-The `train` and `train_unseen` splits use floor plans 1-20, 200-220, 300-320, and 400-420 within AI2-THOR,
+For this challenge we have three dataset splits: `"train"`, `"val"`, and `"test"`.
+The `"train"` split uses floor plans 1-20, 201-220, 301-320, and 401-420 within AI2-THOR,
 the `"val"` split uses floor plans 21-25, 221-225, 321-325, and 421-425, and finally the `"test"` split uses
 scenes 26-30, 226-230, 326-330, and 426-430. These dataset splits are stored as the compressed [pickle](https://docs.python.org/3/library/pickle.html)-serialized files
 `data/*.pkl.gz`. While you are freely (and encouraged) to enhance the training set as you see fit, you should
 never train your agent within any of the test scenes.
 
 For evaluation, your model will need to be evaluated on each of the above splits and the results
-submitted to our leaderboard link (see section below). As the `"train"` and `"train_unseen"` sets
-are quite large, we do not expect you to evaluate on their entirety. Instead we select ~1000 datapoints
-from each of these sets for use in evaluation. For convenience, we provide the `data/combined.pkl.gz`
+submitted to our leaderboard link (see section below). As the `"train"` set is
+quite large, we do not expect you to evaluate on its entirety. Instead we select 800 of its datapoints
+for use in evaluation. For convenience, we provide the `data/combined.pkl.gz`
-file which contains the `"train"`, `"train_unseen"`, `"val"`, and `"test"` datapoints that should
+file which contains the `"train"`, `"val"`, and `"test"` datapoints that should
 be used for evaluation.
 
 | Split        | # Total Episodes | # Episodes for Eval | Path |
-| ------------ |:-----:|-----|-----|
-| train        | 4000 | 1200 | `data/train.pkl.gz`|
-| train_unseen | 3800 | 1140 | `data/train_unseen.pkl.gz`|
-| val          | 1000 | 1000 | `data/val.pkl.gz` | 
-| test         | 1000 | 1000 | `data/test.pkl.gz` |
-| combined     | 4340 | 4340 | `data/combined.pkl.gz` |
+| ------------ |:----------------:|---------------------|-----|
+| train        |       4000       | 800                 | `data/train.pkl.gz`|
+| val          |       1000       | 1000                | `data/val.pkl.gz` | 
+| test         |       1000       | 1000                | `data/test.pkl.gz` |
+| combined     |       2800       | 2800                | `data/combined.pkl.gz` |
 
 ## 🛤️ Submitting to the Leaderboard
 
 We are tracking challenge participant entries using the [AI2 Leaderboard](https://leaderboard.allenai.org/). The team with the best submission made to either of the below leaderboards by May 31st (midnight, [anywhere on earth](https://time.is/Anywhere_on_Earth)) will be announced at the [CVPR'21 Embodied-AI Workshop](https://embodied-ai.org/) and invited to produce a video describing their approach.
 
-Submissions can be made to the 1-phase leaderboard [here](https://leaderboard.allenai.org/ithor_rearrangement_1phase)
-and submissions to the 2-phase leaderboard can be made [here](https://leaderboard.allenai.org/ithor_rearrangement_2phase).
+**Submission leaderboard links will be announced soon (late Feb 2022). Please check back here.** Our 2021
+leaderboards can be found [here](https://leaderboard.allenai.org/ithor_rearrangement_1phase) (1-phase) and [here](https://leaderboard.allenai.org/ithor_rearrangement_2phase) (2-phase). Note
+that the 2021 challenge used a different dataset and an older version of AI2-THOR, so results will not be
+directly comparable.
 
 Submissions should include your agent's trajectories for all tasks contained within the [combined.pkl.gz](data/combined.pkl.gz)
 dataset, this "combined" dataset includes tasks for the train, train_unseen, validation, and test sets. For an example
@@ -521,15 +534,15 @@ allenact -o rearrange_out -b . baseline_configs/two_phase/two_phase_rgb_resnet_p
 We currently provide the following pretrained models (see [our paper](https://arxiv.org/abs/2103.16544) for details
 on these models):
 
-| Model | % Fixed Strict (Test) | Pretrained Model |
-|------------|:----------:|:----------:|
-| [1-Phase ResNet18+ANM IL](baseline_configs/one_phase/one_phase_rgb_resnet_frozen_map_dagger.py) | 8.9% | [(link)](https://prior-model-weights.s3.us-east-2.amazonaws.com/embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBResNetFrozenMapDagger_40proc__stage_00__steps_000040060240.pt) |
-| [1-Phase ResNet18 IL](baseline_configs/one_phase/one_phase_rgb_resnet_dagger.py) | 6.3% | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBResNetDagger_40proc__stage_00__steps_000050058550.pt) |
-| [1-Phase ResNet18 PPO](baseline_configs/one_phase/one_phase_rgb_resnet_ppo.py) | 5.3%| [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBResNetPPO__stage_00__steps_000060068000.pt) |
-| [1-Phase Simple IL](baseline_configs/one_phase/one_phase_rgb_dagger.py) | 4.8% | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBDagger_40proc__stage_00__steps_000065070800.pt) |
-| [1-Phase Simple PPO](baseline_configs/one_phase/one_phase_rgb_ppo.py) | 4.6% | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBPPO__stage_00__steps_000010010730.pt) |
-| [2-Phase ResNet18+ANM IL+PPO](baseline_configs/two_phase_rgb_resnet_frozen_map_ppowalkthrough_ilunshuffle.py) | 1.44% | [(link)](https://prior-model-weights.s3.us-east-2.amazonaws.com/embodied-ai/rearrangement/two-phase/exp_TwoPhaseRGBResNetFrozenMapPPOWalkthroughILUnshuffle_40proc-longtf__stage_00__steps_000075000985.pt) |
-| [2-Phase ResNet18 IL+PPO](baseline_configs/two_phase/two_phase_rgb_resnet_ppowalkthrough_ilunshuffle.py) | 0.66% | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/two-phase/exp_TwoPhaseRGBResNetPPOWalkthroughILUnshuffle_40proc-longtf__stage_00__steps_000020028800.pt) |
+| Model | % Fixed Strict (Test, on 2021 dataset) | Pretrained Model |
+|------------|:--------------------------------------:|:----------:|
+| [1-Phase ResNet18+ANM IL](baseline_configs/one_phase/one_phase_rgb_resnet_frozen_map_dagger.py) |                  8.9%                  | [(link)](https://prior-model-weights.s3.us-east-2.amazonaws.com/embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBResNetFrozenMapDagger_40proc__stage_00__steps_000040060240.pt) |
+| [1-Phase ResNet18 IL](baseline_configs/one_phase/one_phase_rgb_resnet_dagger.py) |                  6.3%                  | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBResNetDagger_40proc__stage_00__steps_000050058550.pt) |
+| [1-Phase ResNet18 PPO](baseline_configs/one_phase/one_phase_rgb_resnet_ppo.py) |                  5.3%                  | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBResNetPPO__stage_00__steps_000060068000.pt) |
+| [1-Phase Simple IL](baseline_configs/one_phase/one_phase_rgb_dagger.py) |                  4.8%                  | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBDagger_40proc__stage_00__steps_000065070800.pt) |
+| [1-Phase Simple PPO](baseline_configs/one_phase/one_phase_rgb_ppo.py) |                  4.6%                  | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBPPO__stage_00__steps_000010010730.pt) |
+| [2-Phase ResNet18+ANM IL+PPO](baseline_configs/two_phase_rgb_resnet_frozen_map_ppowalkthrough_ilunshuffle.py) |                 1.44%                  | [(link)](https://prior-model-weights.s3.us-east-2.amazonaws.com/embodied-ai/rearrangement/two-phase/exp_TwoPhaseRGBResNetFrozenMapPPOWalkthroughILUnshuffle_40proc-longtf__stage_00__steps_000075000985.pt) |
+| [2-Phase ResNet18 IL+PPO](baseline_configs/two_phase/two_phase_rgb_resnet_ppowalkthrough_ilunshuffle.py) |                 0.66%                  | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/two-phase/exp_TwoPhaseRGBResNetPPOWalkthroughILUnshuffle_40proc-longtf__stage_00__steps_000020028800.pt) |
 
 These models can be downloaded at from the above links and should be placed into the `pretrained_model_ckpts` directory.
 You can then, for example, run inference for the _1-Phase ResNet18 IL_ model using AllenAct by running:
@@ -541,12 +554,12 @@ allenact baseline_configs/one_phase/one_phase_rgb_resnet_dagger.py \
 --eval
 ```
 this will evaluate this model across all datapoints in the `data/combined.pkl.gz` dataset
-which contains data from the `train`, `train_unseen`, `val`, and `test` sets so that
+which contains data from the `train`, `val`, and `test` sets so that
 evaluation doesn't have to be run on each set separately.
 
 # 📄 Citation
 
-If you use this work, please cite [our paper](https://arxiv.org/abs/2103.16544) (to appear in CVPR'21):
+If you use this work, please cite [our CVPR'21 paper](https://arxiv.org/abs/2103.16544):
 
 ```bibtex
 @InProceedings{RoomR,

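The episode counts in the updated dataset table follow from the scene ranges and the per-scene caps. The sketch below reproduces that arithmetic under stated assumptions: the standard iTHOR floor-plan numbering (kitchens 1-30, living rooms 201-230, bedrooms 301-330, bathrooms 401-430), 50 episodes per scene (as asserted in `create_combined_dataset.py`), and a train cap of 10 episodes per scene (the `combine(10)` call). `scenes_for_split` and `episode_counts` are hypothetical helpers, not functions from the repository's `datagen_utils`.

```python
from typing import Dict, List

# Assumed floor-plan offsets for the four iTHOR room types
# (kitchens, living rooms, bedrooms, bathrooms); each type has plans offset+1 .. offset+30.
ROOM_OFFSETS = (0, 200, 300, 400)

# Inclusive per-room-type index ranges for each split, as listed in the README.
SPLIT_RANGES = {"train": (1, 20), "val": (21, 25), "test": (26, 30)}

EPISODES_PER_SCENE = 50   # mirrors `assert len(data[scene]) == 50`
TRAIN_CAP_PER_SCENE = 10  # mirrors `combine(10)`


def scenes_for_split(split: str) -> List[str]:
    """Hypothetical helper: FloorPlan names used by `split`."""
    lo, hi = SPLIT_RANGES[split]
    return [f"FloorPlan{offset + i}" for offset in ROOM_OFFSETS for i in range(lo, hi + 1)]


def episode_counts() -> Dict[str, Dict[str, int]]:
    """Scene count, total episodes, and evaluation episodes for each split."""
    counts: Dict[str, Dict[str, int]] = {}
    for split in SPLIT_RANGES:
        n_scenes = len(scenes_for_split(split))
        per_scene_eval = TRAIN_CAP_PER_SCENE if split == "train" else EPISODES_PER_SCENE
        counts[split] = {
            "scenes": n_scenes,
            "total": n_scenes * EPISODES_PER_SCENE,
            "eval": n_scenes * per_scene_eval,
        }
    # The combined file holds only the evaluation subset of each split.
    counts["combined"] = {"eval": sum(c["eval"] for c in counts.values())}
    return counts


if __name__ == "__main__":
    for split, c in episode_counts().items():
        print(split, c)
```

Under these assumptions it reports 80 train scenes (4000 episodes, 800 for evaluation), 20 scenes and 1000 episodes for each of val and test, and 2800 combined evaluation episodes, which agrees with the table.
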
diff --git a/baseline_configs/rearrange_base.py b/baseline_configs/rearrange_base.py
@@ -3,6 +3,7 @@
 from abc import abstractmethod
 from typing import Optional, List, Sequence, Dict, Any
 
+import ai2thor.platform
 import gym.spaces
 import stringcase
 import torch
@@ -204,11 +205,14 @@ def stagewise_task_sampler_args(
         process_ind: int,
         total_processes: int,
         allowed_rearrange_inds_subset: Optional[Sequence[int]] = None,
+        allowed_scenes: Optional[Sequence[str]] = None,
         devices: Optional[List[int]] = None,
         seeds: Optional[List[int]] = None,
         deterministic_cudnn: bool = False,
     ):
-        if stage == "combined":
+        if allowed_scenes is not None:
+            scenes = allowed_scenes
+        elif stage == "combined":
             # Split scenes more evenly as the train scenes will have more episodes
             train_scenes = datagen_utils.get_scenes("train")
             other_scenes = datagen_utils.get_scenes("val") + datagen_utils.get_scenes(
@@ -247,27 +251,41 @@ def stagewise_task_sampler_args(
             else torch.device("cpu")
         )
         x_display: Optional[str] = None
+        gpu_device: Optional[int] = None
+        thor_platform: Optional[ai2thor.platform.BaseLinuxPlatform] = None
         if platform.system() == "Linux":
-            x_displays = get_open_x_displays(throw_error_if_empty=True)
-
-            if devices is not None and len(
-                [d for d in devices if d != torch.device("cpu")]
-            ) > len(x_displays):
-                get_logger().warning(
-                    f"More GPU devices found than X-displays (devices: `{x_displays}`, x_displays: `{x_displays}`)."
-                    f" This is not necessarily a bad thing but may mean that you're not using GPU memory as"
-                    f" efficiently as possible. Consider following the instructions here:"
-                    f" https://allenact.org/installation/installation-framework/#installation-of-ithor-ithor-plugin"
-                    f" describing how to start an X-display on every GPU."
-                )
-            x_display = x_displays[process_ind % len(x_displays)]
+            try:
+                x_displays = get_open_x_displays(throw_error_if_empty=True)
+
+                if devices is not None and len(
+                    [d for d in devices if d != torch.device("cpu")]
+                ) > len(x_displays):
+                    get_logger().warning(
+                        f"More GPU devices found than X-displays (devices: `{devices}`, x_displays: `{x_displays}`)."
+                        f" This is not necessarily a bad thing but may mean that you're not using GPU memory as"
+                        f" efficiently as possible. Consider following the instructions here:"
+                        f" https://allenact.org/installation/installation-framework/#installation-of-ithor-ithor-plugin"
+                        f" describing how to start an X-display on every GPU."
+                    )
+                x_display = x_displays[process_ind % len(x_displays)]
+            except IOError:
+                # Could not find an open `x_display`, use CloudRendering instead.
+                assert devices is not None and all(
+                    [d != torch.device("cpu") and d >= 0 for d in devices]
+                ), "Cannot use CPU devices when there are no open x-displays as CloudRendering requires specifying a GPU."
+                gpu_device = device
+                thor_platform = ai2thor.platform.CloudRendering
 
         kwargs = {
             "stage": stage,
             "allowed_scenes": allowed_scenes,
             "scene_to_allowed_rearrange_inds": scene_to_allowed_rearrange_inds,
             "seed": seed,
             "x_display": x_display,
+            "thor_controller_kwargs": {
+                "gpu_device": gpu_device,
+                "platform": thor_platform,
+            },
         }
 
         sensors = kwargs.get("sensors", copy.deepcopy(cls.SENSORS))

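The Linux branch above tries to find an open X display and, if none is found, falls back to AI2-THOR's CloudRendering platform, which needs an explicit GPU. One way to make that decision unit-testable is to isolate it in a pure function. This is a hypothetical refactor, not code from the repository: `choose_render_backend` and its return dict are illustrative names, the string `"CloudRendering"` stands in for `ai2thor.platform.CloudRendering`, and the list of open displays is passed in rather than discovered, so the sketch runs without AI2-THOR installed.

```python
from typing import Dict, Optional, Sequence


def choose_render_backend(
    open_x_displays: Sequence[str],
    devices: Optional[Sequence[int]],
    process_ind: int,
) -> Dict[str, Optional[object]]:
    """Hypothetical pure version of the X-display / CloudRendering choice.

    `open_x_displays` replaces the call to `get_open_x_displays`, and the string
    "CloudRendering" stands in for `ai2thor.platform.CloudRendering`.
    """
    # GPU id assigned to this process (None means no GPU was given).
    device = devices[process_ind % len(devices)] if devices else None

    if open_x_displays:
        # Spread worker processes round-robin over the open X displays.
        return {
            "x_display": open_x_displays[process_ind % len(open_x_displays)],
            "gpu_device": None,
            "platform": None,
        }

    # No X display: CloudRendering needs a concrete, non-negative GPU id.
    if not devices or any(d < 0 for d in devices):
        raise ValueError("CloudRendering requires a non-negative GPU id for every process.")
    return {"x_display": None, "gpu_device": device, "platform": "CloudRendering"}


if __name__ == "__main__":
    print(choose_render_backend([":0.0", ":0.1"], [0, 1], process_ind=3))
    print(choose_render_backend([], [0, 1], process_ind=3))
```

Raising a `ValueError` instead of using `assert` keeps the check active when Python runs with `-O`; that is a choice made in this sketch, not in the repository.
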
diff --git a/data/combined.pkl.gz b/data/2021/combined.pkl.gz
diff --git a/data/test.pkl.gz b/data/2021/test.pkl.gz
diff --git a/data/train.pkl.gz b/data/2021/train.pkl.gz
diff --git a/data/train_unseen.pkl.gz b/data/2021/train_unseen.pkl.gz
diff --git a/data/val.pkl.gz b/data/2021/val.pkl.gz
diff --git a/data/2022/combined.pkl.gz b/data/2022/combined.pkl.gz
diff --git a/data/2022/test.pkl.gz b/data/2022/test.pkl.gz
diff --git a/data/2022/train.pkl.gz b/data/2022/train.pkl.gz
diff --git a/data/2022/val.pkl.gz b/data/2022/val.pkl.gz
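The datasets above are stored as gzip-compressed pickles (`*.pkl.gz`). For readers who want to inspect them without `compress_pickle`, the sketch below writes and reads the same kind of file using only the standard library. It assumes, as we understand `compress_pickle`'s gzip mode to behave, that each file is a plain gzip stream wrapping a standard pickle. The toy data mirrors the shape built in `create_combined_dataset.py` (scene name mapped to a list of task dicts with `scene`, `index`, and `stage` keys), but its values are made up. Protocol 4 is used because it is the highest pickle protocol Python 3.6 can read (protocol 5 was added in Python 3.8), which matches the comment next to `pickler_kwargs` in that script. `dump_pkl_gz` and `load_pkl_gz` are hypothetical helper names.

```python
import gzip
import os
import pickle
import tempfile
from typing import Any, Dict, List


def dump_pkl_gz(obj: Any, path: str, protocol: int = 4) -> None:
    """Write `obj` as a gzip-compressed pickle (protocol 4 is readable by Python 3.6+)."""
    with gzip.open(path, "wb") as f:
        pickle.dump(obj, f, protocol=protocol)


def load_pkl_gz(path: str) -> Any:
    """Read back a gzip-compressed pickle written by `dump_pkl_gz`."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Toy data with the same shape as the combined dataset: scene -> list of task dicts.
    toy: Dict[str, List[Dict[str, Any]]] = {
        "FloorPlan21": [{"scene": "FloorPlan21", "index": i, "stage": "val"} for i in range(3)]
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy.pkl.gz")
        dump_pkl_gz(toy, path)
        loaded = load_pkl_gz(path)
        print({scene: len(tasks) for scene, tasks in loaded.items()})
```
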
diff --git a/datagen/create_combined_dataset.py b/datagen/create_combined_dataset.py
@@ -1,15 +1,15 @@
 import json
 import os
-import pickle
 from collections import defaultdict
 
 import compress_pickle
+from allenact.utils.misc_utils import partition_sequence
 
 from rearrange.constants import STARTER_DATA_DIR
 
 
-def combine():
-    stages = ("train", "train_unseen", "val", "test")
+def combine(task_limit_for_train: int = 10000):
+    stages = ("train", "val", "test")
 
     all_data = defaultdict(lambda: [])
     for stage in stages:
@@ -20,17 +20,24 @@ def combine():
             raise RuntimeError(f"No data at path {data_path}")
 
         data = compress_pickle.load(path=data_path)
-        max_per_scene = 15 if "train" in stage else 10000
+        max_per_scene = task_limit_for_train if "train" in stage else 10000
         count = 0
         for scene in data:
-            for ind, task_spec_dict in enumerate(data[scene][:max_per_scene]):
-                count += 1
+            assert len(data[scene]) == 50
 
+            for index, task_spec_dict in enumerate(data[scene]):
                 task_spec_dict["scene"] = scene
-                task_spec_dict["index"] = ind
+                task_spec_dict["index"] = index
                 task_spec_dict["stage"] = stage
 
-                all_data[scene].append(task_spec_dict)
+            pieces_per_part = max_per_scene // 5  # 5 hardnesses
+            parts = partition_sequence(data[scene], 5)
+            all_together = sum(
+                [part[:pieces_per_part] for part in parts], []
+            )
+
+            count += len(all_together)
+            all_data[scene].extend(all_together)
 
         print(count)
     all_data = dict(all_data)
@@ -40,9 +47,9 @@ def combine():
     compress_pickle.dump(
         obj=all_data,
         path=os.path.join(STARTER_DATA_DIR, f"combined.pkl.gz"),
-        protocol=pickle.HIGHEST_PROTOCOL,
+        pickler_kwargs={"protocol": 4,},  # Backwards compatible with python 3.6
     )
 
 
 if __name__ == "__main__":
-    combine()
+    combine(10)
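
The new `combine` logic no longer keeps the first `max_per_scene` episodes of each scene. Instead it splits each scene's 50 episodes into five contiguous parts (commented as five hardness levels) and keeps the first `max_per_scene // 5` episodes of each part. The standard-library sketch below illustrates that selection. `partition_contiguous` is a hypothetical stand-in that we assume behaves like AllenAct's `partition_sequence` (contiguous chunks whose sizes differ by at most one), and the sketch assumes each scene's episodes are ordered by difficulty, which the `# 5 hardnesses` comment suggests but the diff does not show.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partition_contiguous(seq: Sequence[T], parts: int) -> List[Sequence[T]]:
    """Split `seq` into `parts` contiguous chunks whose sizes differ by at most one."""
    if not 0 < parts <= len(seq):
        raise ValueError("need 0 < parts <= len(seq)")
    quotient, remainder = divmod(len(seq), parts)
    chunks: List[Sequence[T]] = []
    start = 0
    for i in range(parts):
        # The first `remainder` chunks get one extra element.
        size = quotient + (1 if i < remainder else 0)
        chunks.append(seq[start : start + size])
        start += size
    return chunks


def select_balanced(episodes: Sequence[T], max_per_scene: int, n_levels: int = 5) -> List[T]:
    """Keep the first `max_per_scene // n_levels` episodes from each difficulty chunk."""
    per_level = max_per_scene // n_levels
    selected: List[T] = []
    for chunk in partition_contiguous(episodes, n_levels):
        selected.extend(chunk[:per_level])
    return selected


if __name__ == "__main__":
    episode_ids = list(range(50))  # stand-ins for one scene's 50 task specs
    print(select_balanced(episode_ids, max_per_scene=10))
    # → [0, 1, 10, 11, 20, 21, 30, 31, 40, 41]
    # With the non-train cap of 10000, every episode is kept.
    print(len(select_balanced(episode_ids, max_per_scene=10000)))
    # → 50
```

With a cap of 10 this keeps 2 episodes from each of the 5 chunks, so 80 train scenes contribute 800 episodes. Collecting with `extend` rather than `sum(parts, [])` avoids repeated list copies, although with 50 episodes per scene the difference is negligible.
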
diff --git a/datagen/datagen_constants.py b/datagen/datagen_constants.py
@@ -3,6 +3,9 @@
     "Bread",
     "Cloth",
     "HandTowel",
+    "HandTowelHolder",
+    "Towel",
+    "TowelHolder",
     "KeyChain",
     "Lettuce",
     "Pillow",