Merged procthor code #39

Open · wants to merge 16 commits into main
6 changes: 6 additions & 0 deletions .gitattributes
@@ -0,0 +1,6 @@
data/2022procthor/mini_val_consolidated.pkl.gz filter=lfs diff=lfs merge=lfs -text
data/2022procthor/split_mini_val filter=lfs diff=lfs merge=lfs -text
data/2022procthor/split_mini_val/** filter=lfs diff=lfs merge=lfs -text
data/2022procthor/split_train filter=lfs diff=lfs merge=lfs -text
data/2022procthor/split_train/** filter=lfs diff=lfs merge=lfs -text
data/2022procthor/train_consolidated.pkl.gz filter=lfs diff=lfs merge=lfs -text
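
For reference, entries like the ones above are what `git lfs track` writes into `.gitattributes`; a rough sketch of commands that would produce equivalent rules (the wildcard pattern in the first line is an assumption, the committed file lists that path explicitly):

```bash
# Sketch: each `git lfs track` call appends a matching filter rule to .gitattributes.
git lfs track "data/2022procthor/*.pkl.gz"
git lfs track "data/2022procthor/split_train/**"
git lfs track "data/2022procthor/split_mini_val/**"
```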
Contributor

Does this mean we have a dependency on git-lfs? Was this the issue we were talking about yesterday regarding how the prior library handles things?

Author

In this case, I was actually not planning on using prior to distribute the datasets, but rather following the current design of directly hosting the data in the repository (with the modification of using git-lfs to keep future changes to the data from piling up in the history). If I'm right, we just need to clone the repo and all the ProcTHOR data is available, as with the iTHOR data.
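
A minimal sketch of the workflow this assumes (git-lfs must already be installed on the user's machine, otherwise the clone contains only small LFS pointer files; the repository URL is omitted here):

```bash
# One-time setup: register the LFS filters with git.
git lfs install
# Cloning then fetches the real .pkl.gz datasets, not just pointer files.
git clone <repository-url>
# For an existing clone that only has pointer files:
git lfs pull
```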

Contributor

Gotcha. I'd prefer we didn't introduce git-lfs as a dependency here, as it's yet another thing to download and install (and getting this repository working in a new environment is already quite a lot for people). In the prior package, I do some things behind the scenes to download a git-lfs binary onto the user's machine in the background if they don't have it, which is why the git-lfs dependency isn't as much of an issue there.

Author

I see. I guess I hadn't realized that I was able to directly clone the repository precisely because I had already installed git-lfs when I pushed the datasets. I'm not entirely confident about how to properly use prior to distribute the data, but I expect the example for procthor-10k will do.
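
For reference, a minimal sketch of that procthor-10k pattern (the dataset name below is procthor-10k's own; the rearrangement data would be published under a different, yet-to-be-chosen name):

```python
# Sketch of the prior-based distribution pattern used by procthor-10k.
import prior

dataset = prior.load_dataset("procthor-10k")  # downloads and caches the dataset
print(dataset)  # splits: train / val / test
house = dataset["train"][0]  # a single house specification
```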

Author (@jordis-ai2, Jul 21, 2022)

In order to keep things more consistent, wouldn't it make more sense to just keep everything in this repo, i.e. without git-lfs, given its downsides? Even if we use prior, we will still have to explain how to install the data into a reachable path (e.g. via additional instructions in the README).

Let me know if you're happy with that solution, and I'll add the dataset files to the repository. If prior is actually the preferred choice, then I would create a repo with all the datasets (including the 2021 and 2022 iTHOR ones) and install all the data via an invoke command calling prior.load_dataset, if that sounds reasonable.
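
For concreteness, a sketch of such an invoke task (the task name and dataset name below are placeholders, not decisions):

```python
# tasks.py -- hypothetical sketch; names below are placeholders.
from invoke import task


@task
def install_procthor_data(ctx):
    import prior

    # prior handles the download (including any git-lfs machinery) behind the scenes.
    dataset = prior.load_dataset("rearrangement-episodes")  # placeholder dataset name
    print(dataset)  # materializing the episodes under data/ would follow here
```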

Author

I went ahead and prepared an installer for the ProcTHOR dataset. If the design seems fine, we could port the regular iTHOR ones in a similar way.

3 changes: 3 additions & 0 deletions .gitignore
@@ -156,3 +156,6 @@ dmypy.json

# Cython debug symbols
cython_debug/

# PyCharm settings
.idea/
75 changes: 71 additions & 4 deletions README.md
@@ -78,6 +78,7 @@ with open("README.md", "r") as f:
</li>
<li><a href="#-training-baseline-models-with-allenact">🏋 Training Baseline Models with AllenAct</a><ul>
<li><a href="#-pretrained-models">💪 Pretrained Models</a></li>
<li><a href="#-procthor-pre-training">🏘 ProcTHOR pre-training</a></li>
</ul>
</li>
</ul>
@@ -173,7 +174,7 @@ a local `./src` directory. By explicitly specifying the `PIP_SRC` variable we ca

**AI2-THOR 4.2.0 🧞.** To ensure reproducible results, we're restricting all users to use the exact same version of <span class="chillMono">AI2-THOR</span>.

- **AllenAct 🏋💪.** We ues the <span class="chillMono">AllenAct</span> reinforcement learning framework
+ **AllenAct 🏋💪.** We use the <span class="chillMono">AllenAct</span> reinforcement learning framework
for generating baseline models, baseline training pipelines, and for several of their helpful abstractions/utilities.

## 📝 Rearrangement Task Description
@@ -532,18 +533,22 @@ A similar model can be trained for the 2-phase challenge by running
allenact -o rearrange_out -b . baseline_configs/two_phase/two_phase_rgb_resnet_ppowalkthrough_ilunshuffle.py
```

For ProcTHOR pre-training, please [check below](#-procthor-pre-training).

### 💪 Pretrained Models

In the table below we provide a collection of pretrained models from:

- 1. [Our CVPR'21 paper introducing this challenge](https://arxiv.org/abs/2103.16544), and
- 2. [Our CVPR'22 paper which showed that using CLIP visual encodings can dramatically improve model performance acros embodied tasks](https://arxiv.org/abs/2111.09888).
+ 1. [Our CVPR'21 paper introducing this challenge](https://arxiv.org/abs/2103.16544),
+ 2. [Our CVPR'22 paper which showed that using CLIP visual encodings can dramatically improve model performance across embodied tasks](https://arxiv.org/abs/2111.09888), and
+ 3. [ProcTHOR pre-training with fine-tuning](https://arxiv.org/abs/2206.06994).

We have only evaluated a subset of these models on our 2022 dataset.

| Model | % Fixed Strict (2022 dataset, test) | % Fixed Strict (2021 dataset, test) | Pretrained Model |
|------------|:-----------------------------------:|:-----------------------------------:|:----------:|
- | [1-Phase Embodied CLIP ResNet50 IL](baseline_configs/one_phase/one_phase_rgb_clipresnet50_dagger.py) | **19.1%** | **17.3%** | [(link)](https://prior-model-weights.s3.us-east-2.amazonaws.com/embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBClipResNet50Dagger_40proc__stage_00__steps_000065083050.pt) |
+ | [1-Phase Embodied CLIP ResNet50 IL (ProcTHOR pretraining)](baseline_configs/one_phase/procthor/ithor/ithor_one_phase_rgb_fine_tune.py) | **24.5%** | - | [(link)](https://prior-model-weights.s3.us-east-2.amazonaws.com/embodied-ai/rearrangement/one-phase/exp_iThorOnePhaseRGBClipResNet50FineTune_procthor180Msteps_ithor_splits_ithor_fine_tune_64_to_128_rollout_3Msteps_6Msteps__stage_02__steps_000016018675.pt) |
+ | [1-Phase Embodied CLIP ResNet50 IL](baseline_configs/one_phase/one_phase_rgb_clipresnet50_dagger.py) | 19.1% | **17.3%** | [(link)](https://prior-model-weights.s3.us-east-2.amazonaws.com/embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBClipResNet50Dagger_40proc__stage_00__steps_000065083050.pt) |
| [1-Phase ResNet18+ANM IL](baseline_configs/one_phase/one_phase_rgb_resnet_frozen_map_dagger.py) | - | 8.9% | [(link)](https://prior-model-weights.s3.us-east-2.amazonaws.com/embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBResNetFrozenMapDagger_40proc__stage_00__steps_000040060240.pt) |
| [1-Phase ResNet18 IL](baseline_configs/one_phase/one_phase_rgb_resnet_dagger.py) | - | 6.3% | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBResNetDagger_40proc__stage_00__steps_000050058550.pt) |
| [1-Phase ResNet18 PPO](baseline_configs/one_phase/one_phase_rgb_resnet_ppo.py) | - | 5.3% | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBResNetPPO__stage_00__steps_000060068000.pt) |
@@ -565,6 +570,68 @@ this will evaluate this model across all datapoints in the `data/combined.pkl.gz`
which contains data from the `train`, `val`, and `test` sets so that
evaluation doesn't have to be run on each set separately.

### 🏘 ProcTHOR pre-training

We include commands that can be used to generate a ProcTHOR-pretrained agent and
then fine-tune it with the 2022 rearrangement dataset. Please note that this only covers the 1-phase modality,
for which we also provide a
[pre-trained and fine-tuned checkpoint](https://prior-model-weights.s3.us-east-2.amazonaws.com/embodied-ai/rearrangement/one-phase/exp_iThorOnePhaseRGBClipResNet50FineTune_procthor180Msteps_ithor_splits_ithor_fine_tune_64_to_128_rollout_3Msteps_6Msteps__stage_02__steps_000016018675.pt).
We also provide scripts to generate new ProcTHOR datasets in case you want to try new episodes.

#### Pre-train model in ProcTHOR (single machine)
The following will take about 10-14 days on an 8-GPU machine with 56 CPU cores:
```bash
allenact -b baseline_configs/one_phase/procthor one_phase_rgb_clip_dagger \
-s 12345 --config_kwargs '{"distributed_nodes":1}'
```
We **strongly** recommend using a larger number of GPUs and compute nodes for this step.
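For reference, a sketch of what a two-node launch could look like (the `--machine_id` and `--distributed_ip_and_port` flags follow AllenAct's distributed interface; verify them against your installed AllenAct version, and note that `NODE0_IP` and the port are placeholders):
```bash
# Hypothetical 2-node sketch; on node 0 (reachable at NODE0_IP):
allenact -b baseline_configs/one_phase/procthor one_phase_rgb_clip_dagger \
    -s 12345 --config_kwargs '{"distributed_nodes":2}' \
    --machine_id 0 --distributed_ip_and_port NODE0_IP:6060
# On node 1:
allenact -b baseline_configs/one_phase/procthor one_phase_rgb_clip_dagger \
    -s 12345 --config_kwargs '{"distributed_nodes":2}' \
    --machine_id 1 --distributed_ip_and_port NODE0_IP:6060
```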

#### ProcTHOR mini-valid
Run ProcTHOR mini-valid on checkpoints under a `CKPT_DIR` directory:
```bash
inv make-valid-houses-file
allenact -b baseline_configs/one_phase/procthor/eval eval_minivalid_procthor \
-s 12345 --eval --approx_ckpt_step_interval 5e6 -c CKPT_DIR
```

#### Fine-tune model in iTHOR (single machine)

The following will take about two days on an 8-GPU machine with 56 CPU cores.
Assuming the chosen checkpoint from ProcTHOR pre-training has path `CKPT_PATH`:
```bash
allenact -b baseline_configs/one_phase/procthor/ithor ithor_one_phase_rgb_fine_tune \
-s 12345 -c CKPT_PATH --restart_pipeline
```

#### iTHOR mini validation
Run iTHOR mini-valid on checkpoints under `CKPT_DIR`:
```bash
inv make-ithor-mini-val
allenact -b baseline_configs/one_phase/procthor/eval eval_minivalid_ithor \
-s 12345 --eval --approx_ckpt_step_interval 5e6 -c CKPT_DIR
```

#### Generate new ProcTHOR training and mini-valid episodes
Our training and validation datasets are already provided under [data/2022procthor](data/2022procthor), but we also include the
scripts we used, in case you are interested in trying new episode distributions.

For training, we use a dataset composed of 50,000 episodes sampled from 2,500 houses with one or two rooms.
The following commands take several hours on an 8-GPU machine with 56 CPU cores:
```bash
python datagen/procthor_datagen/datagen_runner_train.py
inv make-procthor-mini-train
```

To create a ProcTHOR mini-valid dataset, the following commands take several hours on an 8-GPU machine
with 56 CPU cores:
```bash
python datagen/procthor_datagen/datagen_runner_valid.py
inv consolidate-procthor-val
inv make-procthor-mini-val
inv make-valid-houses-file
```


# 📄 Citation

If you use this work, please cite [our CVPR'21 paper](https://arxiv.org/abs/2103.16544):
Empty file.
Empty file.
222 changes: 222 additions & 0 deletions baseline_configs/one_phase/procthor/eval/eval_minivalid_ithor.py
@@ -0,0 +1,222 @@
from baseline_configs.one_phase.procthor.ithor.ithor_one_phase_rgb_fine_tune import (
OnePhaseRGBClipResNet50FineTuneExperimentConfig as BaseConfig,
)

import copy
import platform
from typing import Optional, List, Sequence

import ai2thor.platform
import torch

from allenact.base_abstractions.sensor import ExpertActionSensor
from allenact.utils.misc_utils import partition_sequence, md5_hash_str_as_int
from allenact.utils.system import get_logger
from allenact_plugins.ithor_plugin.ithor_sensors import (
BinnedPointCloudMapTHORSensor,
SemanticMapTHORSensor,
)
from allenact_plugins.ithor_plugin.ithor_util import get_open_x_displays


def get_scenes(stage: str) -> List[str]:
"""Returns a list of iTHOR scene names for each stage."""
assert stage in {
"train",
"train_unseen",
"val",
"valid",
"test",
"all",
"ithor_mini_val",
"debug",
}

if stage == "debug":
return ["FloorPlan1"]

# [1-20] for train, [21-25] for val, [26-30] for test
if stage in ["train", "train_unseen"]:
scene_nums = range(1, 21)
elif stage in ["val", "valid", "ithor_mini_val"]:
scene_nums = range(21, 26)
elif stage == "test":
scene_nums = range(26, 31)
elif stage == "all":
scene_nums = range(1, 31)
else:
raise NotImplementedError

kitchens = [f"FloorPlan{i}" for i in scene_nums]
living_rooms = [f"FloorPlan{200+i}" for i in scene_nums]
bedrooms = [f"FloorPlan{300+i}" for i in scene_nums]
bathrooms = [f"FloorPlan{400+i}" for i in scene_nums]
return kitchens + living_rooms + bedrooms + bathrooms


class EvalConfig(BaseConfig):
def stagewise_task_sampler_args(
self,
stage: str,
process_ind: int,
total_processes: int,
allowed_rearrange_inds_subset: Optional[Sequence[int]] = None,
        allowed_scenes: Optional[Sequence[str]] = None,
devices: Optional[List[int]] = None,
seeds: Optional[List[int]] = None,
deterministic_cudnn: bool = False,
):
if allowed_scenes is not None:
scenes = allowed_scenes
elif stage == "combined":
# Split scenes more evenly as the train scenes will have more episodes
train_scenes = get_scenes("train")
other_scenes = get_scenes("val") + get_scenes("test")
assert len(train_scenes) == 2 * len(other_scenes)
scenes = []
while len(train_scenes) != 0:
scenes.append(train_scenes.pop())
scenes.append(train_scenes.pop())
scenes.append(other_scenes.pop())
assert len(train_scenes) == len(other_scenes)
else:
scenes = get_scenes(stage)

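        # If training with more processes than scenes, replicate the scene list so it divides evenly.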
if total_processes > len(scenes):
assert stage == "train" and total_processes % len(scenes) == 0
scenes = scenes * (total_processes // len(scenes))

allowed_scenes = list(
sorted(partition_sequence(seq=scenes, parts=total_processes,)[process_ind])
)

scene_to_allowed_rearrange_inds = None
if allowed_rearrange_inds_subset is not None:
allowed_rearrange_inds_subset = tuple(allowed_rearrange_inds_subset)
assert stage in ["valid", "train_unseen"]
scene_to_allowed_rearrange_inds = {
scene: allowed_rearrange_inds_subset for scene in allowed_scenes
}
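        # Derive a deterministic seed from this process's assigned scenes.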
seed = md5_hash_str_as_int(str(allowed_scenes))

device = (
devices[process_ind % len(devices)]
if devices is not None and len(devices) > 0
else torch.device("cpu")
)
x_display: Optional[str] = None
gpu_device: Optional[int] = None
thor_platform: Optional[ai2thor.platform.BaseLinuxPlatform] = None
if platform.system() == "Linux":
try:
x_displays = get_open_x_displays(throw_error_if_empty=True)

if devices is not None and len(
[d for d in devices if d != torch.device("cpu")]
) > len(x_displays):
                get_logger().warning(
                    f"More GPU devices found than X-displays (devices: `{devices}`, x_displays: `{x_displays}`)."
f" This is not necessarily a bad thing but may mean that you're not using GPU memory as"
f" efficiently as possible. Consider following the instructions here:"
f" https://allenact.org/installation/installation-framework/#installation-of-ithor-ithor-plugin"
f" describing how to start an X-display on every GPU."
)
x_display = x_displays[process_ind % len(x_displays)]
except IOError:
# Could not find an open `x_display`, use CloudRendering instead.
assert all(
[d != torch.device("cpu") and d >= 0 for d in devices]
), "Cannot use CPU devices when there are no open x-displays as CloudRendering requires specifying a GPU."
gpu_device = device
thor_platform = ai2thor.platform.CloudRendering

kwargs = {
"stage": stage,
"allowed_scenes": allowed_scenes,
"scene_to_allowed_rearrange_inds": scene_to_allowed_rearrange_inds,
"seed": seed,
"x_display": x_display,
"thor_controller_kwargs": {
"gpu_device": gpu_device,
"platform": thor_platform,
},
}

sensors = kwargs.get("sensors", copy.deepcopy(self.sensors()))
kwargs["sensors"] = sensors

sem_sensor = next(
(s for s in kwargs["sensors"] if isinstance(s, SemanticMapTHORSensor)), None
)
binned_pc_sensor = next(
(
s
for s in kwargs["sensors"]
if isinstance(s, BinnedPointCloudMapTHORSensor)
),
None,
)

if sem_sensor is not None:
sem_sensor.device = torch.device(device)

if binned_pc_sensor is not None:
binned_pc_sensor.device = torch.device(device)

if stage != "train":
# Don't include several sensors during validation/testing
kwargs["sensors"] = [
s
for s in kwargs["sensors"]
if not isinstance(
s,
(
ExpertActionSensor,
SemanticMapTHORSensor,
BinnedPointCloudMapTHORSensor,
),
)
]
return kwargs

def test_task_sampler_args(
self,
process_ind: int,
total_processes: int,
devices=None,
seeds=None,
deterministic_cudnn: bool = False,
task_spec_in_metrics: bool = False,
):
        task_spec_in_metrics = False  # override the argument; task specs aren't recorded in eval metrics here

# Train_unseen
# stage = "train_unseen"
# allowed_rearrange_inds_subset = list(range(15))

# Val
stage = "ithor_mini_val"
allowed_rearrange_inds_subset = None

# Test
# stage = "test"
# allowed_rearrange_inds_subset = None

# Combined (Will run inference on all datasets)
# stage = "combined"
# allowed_rearrange_inds_subset = None

return dict(
force_cache_reset=True,
epochs=1,
task_spec_in_metrics=task_spec_in_metrics,
**self.stagewise_task_sampler_args(
stage=stage,
allowed_rearrange_inds_subset=allowed_rearrange_inds_subset,
process_ind=process_ind,
total_processes=total_processes,
devices=devices,
seeds=seeds,
deterministic_cudnn=deterministic_cudnn,
),
)