Merged procthor code #39

Open · wants to merge 16 commits into main
6 changes: 6 additions & 0 deletions .gitattributes
@@ -0,0 +1,6 @@
data/2022procthor/mini_val_consolidated.pkl.gz filter=lfs diff=lfs merge=lfs -text
data/2022procthor/split_mini_val filter=lfs diff=lfs merge=lfs -text
data/2022procthor/split_mini_val/** filter=lfs diff=lfs merge=lfs -text
data/2022procthor/split_train filter=lfs diff=lfs merge=lfs -text
data/2022procthor/split_train/** filter=lfs diff=lfs merge=lfs -text
data/2022procthor/train_consolidated.pkl.gz filter=lfs diff=lfs merge=lfs -text
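
For reference, entries like the ones above are what `git lfs track` writes into `.gitattributes`; a rough sketch of commands that would produce equivalent rules (the wildcard pattern in the first line is an assumption, the committed file lists that path explicitly):

```bash
# Sketch: each `git lfs track` call appends a matching filter rule to .gitattributes.
git lfs track "data/2022procthor/*.pkl.gz"
git lfs track "data/2022procthor/split_train/**"
git lfs track "data/2022procthor/split_mini_val/**"
```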
Contributor

Does this mean we have a dependency on git-lfs? Was this the issue we were talking about yesterday regarding how the prior library handles things?

Author

In this case, I was actually not planning on using prior to distribute the datasets, but rather following the current design of directly hosting the data in the repository (with the modification of using git-lfs to keep future changes to the data from piling up in the history). If I'm right, we just need to clone the repo and all the ProcTHOR data is available, as with the iTHOR data.
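
A minimal sketch of the workflow this assumes (git-lfs must already be installed on the user's machine, otherwise the clone contains only small LFS pointer files; the repository URL is omitted here):

```bash
# One-time setup: register the LFS filters with git.
git lfs install
# Cloning then fetches the real .pkl.gz datasets, not just pointer files.
git clone <repository-url>
# For an existing clone that only has pointer files:
git lfs pull
```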

Contributor

Gotcha. I'd prefer we didn't introduce git-lfs as a dependency here, as it's yet another thing to download and install (and getting this repository working in a new environment is already quite a lot for people). In the prior package, I do some things behind the scenes to download a git-lfs binary onto the user's machine in the background if they don't have it, which is why the git-lfs dependency isn't as much of an issue there.

Author

I see. I guess I hadn't realized that I was able to directly clone the repository precisely because I had already installed git-lfs when I pushed the datasets. I'm not entirely confident about how to properly use prior to distribute the data, but I expect the example for procthor-10k will do.
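
For reference, a minimal sketch of that procthor-10k pattern (the dataset name below is procthor-10k's own; the rearrangement data would be published under a different, yet-to-be-chosen name):

```python
# Sketch of the prior-based distribution pattern used by procthor-10k.
import prior

dataset = prior.load_dataset("procthor-10k")  # downloads and caches the dataset
print(dataset)  # splits: train / val / test
house = dataset["train"][0]  # a single house specification
```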

Author (@jordis-ai2, Jul 21, 2022)

In order to keep things more consistent, wouldn't it make more sense to just keep everything in this repo, i.e. without git-lfs, given its downsides? Even if we use prior, we will still have to explain how to install the data into a reachable path (e.g. via additional instructions in the README).

Let me know if you're happy with that solution, and I'll add the dataset files to the repository. If prior is actually the preferred choice, then I would create a repo with all the datasets (including the 2021 and 2022 iTHOR ones) and install all the data via an invoke command calling prior.load_dataset, if that sounds reasonable.
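
For concreteness, a sketch of such an invoke task (the task name and dataset name below are placeholders, not decisions):

```python
# tasks.py -- hypothetical sketch; names below are placeholders.
from invoke import task


@task
def install_procthor_data(ctx):
    import prior

    # prior handles the download (including any git-lfs machinery) behind the scenes.
    dataset = prior.load_dataset("rearrangement-episodes")  # placeholder dataset name
    print(dataset)  # materializing the episodes under data/ would follow here
```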

Author

I went ahead and prepared an installer for the ProcTHOR dataset. If the design seems fine, we could port the regular iTHOR ones in a similar way.

3 changes: 3 additions & 0 deletions .gitignore
@@ -156,3 +156,6 @@ dmypy.json

# Cython debug symbols
cython_debug/

# PyCharm settings
.idea/
75 changes: 71 additions & 4 deletions README.md
@@ -78,6 +78,7 @@ with open("README.md", "r") as f:
</li>
<li><a href="#-training-baseline-models-with-allenact">🏋 Training Baseline Models with AllenAct</a><ul>
<li><a href="#-pretrained-models">💪 Pretrained Models</a></li>
<li><a href="#-procthor-pre-training">🏘 ProcTHOR pre-training</a></li>
</ul>
</li>
</ul>
@@ -173,7 +174,7 @@ a local `./src` directory. By explicitly specifying the `PIP_SRC` variable we ca

**AI2-THOR 4.2.0 🧞.** To ensure reproducible results, we're restricting all users to use the exact same version of <span class="chillMono">AI2-THOR</span>.

- **AllenAct 🏋💪.** We ues the <span class="chillMono">AllenAct</span> reinforcement learning framework
+ **AllenAct 🏋💪.** We use the <span class="chillMono">AllenAct</span> reinforcement learning framework
for generating baseline models, baseline training pipelines, and for several of their helpful abstractions/utilities.

## 📝 Rearrangement Task Description
@@ -532,18 +533,22 @@ A similar model can be trained for the 2-phase challenge by running
allenact -o rearrange_out -b . baseline_configs/two_phase/two_phase_rgb_resnet_ppowalkthrough_ilunshuffle.py
```

For ProcTHOR pre-training, please [check below](#-procthor-pre-training).

### 💪 Pretrained Models

In the table below we provide a collection of pretrained models from:

- 1. [Our CVPR'21 paper introducing this challenge](https://arxiv.org/abs/2103.16544), and
- 2. [Our CVPR'22 paper which showed that using CLIP visual encodings can dramatically improve model performance acros embodied tasks](https://arxiv.org/abs/2111.09888).
+ 1. [Our CVPR'21 paper introducing this challenge](https://arxiv.org/abs/2103.16544),
+ 2. [Our CVPR'22 paper which showed that using CLIP visual encodings can dramatically improve model performance across embodied tasks](https://arxiv.org/abs/2111.09888), and
+ 3. [ProcTHOR pre-training with fine-tuning](https://arxiv.org/abs/2206.06994).

We have only evaluated a subset of these models on our 2022 dataset.

| Model | % Fixed Strict (2022 dataset, test) | % Fixed Strict (2021 dataset, test) | Pretrained Model |
|------------|:-----------------------------------:|:-----------------------------------:|:----------:|
- | [1-Phase Embodied CLIP ResNet50 IL](baseline_configs/one_phase/one_phase_rgb_clipresnet50_dagger.py) | **19.1%** | **17.3%** | [(link)](https://prior-model-weights.s3.us-east-2.amazonaws.com/embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBClipResNet50Dagger_40proc__stage_00__steps_000065083050.pt) |
+ | [1-Phase Embodied CLIP ResNet50 IL (ProcTHOR pretraining)](baseline_configs/one_phase/procthor/ithor/ithor_one_phase_rgb_fine_tune.py) | **24.5%** | - | [(link)](https://prior-model-weights.s3.us-east-2.amazonaws.com/embodied-ai/rearrangement/one-phase/exp_iThorOnePhaseRGBClipResNet50FineTune_procthor180Msteps_ithor_splits_ithor_fine_tune_64_to_128_rollout_3Msteps_6Msteps__stage_02__steps_000016018675.pt) |
+ | [1-Phase Embodied CLIP ResNet50 IL](baseline_configs/one_phase/one_phase_rgb_clipresnet50_dagger.py) | 19.1% | **17.3%** | [(link)](https://prior-model-weights.s3.us-east-2.amazonaws.com/embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBClipResNet50Dagger_40proc__stage_00__steps_000065083050.pt) |
| [1-Phase ResNet18+ANM IL](baseline_configs/one_phase/one_phase_rgb_resnet_frozen_map_dagger.py) | - | 8.9% | [(link)](https://prior-model-weights.s3.us-east-2.amazonaws.com/embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBResNetFrozenMapDagger_40proc__stage_00__steps_000040060240.pt) |
| [1-Phase ResNet18 IL](baseline_configs/one_phase/one_phase_rgb_resnet_dagger.py) | - | 6.3% | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBResNetDagger_40proc__stage_00__steps_000050058550.pt) |
| [1-Phase ResNet18 PPO](baseline_configs/one_phase/one_phase_rgb_resnet_ppo.py) | - | 5.3% | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBResNetPPO__stage_00__steps_000060068000.pt) |
@@ -565,6 +570,68 @@ this will evaluate this model across all datapoints in the `data/combined.pkl.gz`
which contains data from the `train`, `val`, and `test` sets so that
evaluation doesn't have to be run on each set separately.

### 🏘 ProcTHOR pre-training

We include commands that can be used to generate a ProcTHOR-pretrained agent and
then fine-tune it with the 2022 rearrangement dataset. Please note that this only covers the 1-phase modality,
for which we also provide a
[pre-trained and fine-tuned checkpoint](https://prior-model-weights.s3.us-east-2.amazonaws.com/embodied-ai/rearrangement/one-phase/exp_iThorOnePhaseRGBClipResNet50FineTune_procthor180Msteps_ithor_splits_ithor_fine_tune_64_to_128_rollout_3Msteps_6Msteps__stage_02__steps_000016018675.pt).
We also provide scripts to generate new ProcTHOR datasets in case you want to try new episodes.

#### Pre-train model in ProcTHOR (single machine)
The following will take about 10-14 days on an 8-GPU machine with 56 CPU cores:
```bash
allenact -b baseline_configs/one_phase/procthor one_phase_rgb_clip_dagger \
-s 12345 --config_kwargs '{"distributed_nodes":1}'
```
We **strongly** recommend using a larger number of GPUs and compute nodes for this step.
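For reference, a sketch of what a two-node launch could look like (the `--machine_id` and `--distributed_ip_and_port` flags follow AllenAct's distributed interface; verify them against your installed AllenAct version, and note that `NODE0_IP` and the port are placeholders):
```bash
# Hypothetical 2-node sketch; on node 0 (reachable at NODE0_IP):
allenact -b baseline_configs/one_phase/procthor one_phase_rgb_clip_dagger \
    -s 12345 --config_kwargs '{"distributed_nodes":2}' \
    --machine_id 0 --distributed_ip_and_port NODE0_IP:6060
# On node 1:
allenact -b baseline_configs/one_phase/procthor one_phase_rgb_clip_dagger \
    -s 12345 --config_kwargs '{"distributed_nodes":2}' \
    --machine_id 1 --distributed_ip_and_port NODE0_IP:6060
```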

#### ProcTHOR mini-valid
Run ProcTHOR mini-valid on checkpoints under a `CKPT_DIR` directory:
```bash
inv make-valid-houses-file
allenact -b baseline_configs/one_phase/procthor/eval eval_minivalid_procthor \
-s 12345 --eval --approx_ckpt_step_interval 5e6 -c CKPT_DIR
```

#### Fine-tune model in iTHOR (single machine)

The following will take about two days on an 8-GPU machine with 56 CPU cores.
Assuming the chosen checkpoint from ProcTHOR pre-training has path `CKPT_PATH`:
```bash
allenact -b baseline_configs/one_phase/procthor/ithor ithor_one_phase_rgb_fine_tune \
-s 12345 -c CKPT_PATH --restart_pipeline
```

#### iTHOR mini validation
Run iTHOR mini-valid on checkpoints under `CKPT_DIR`:
```bash
inv make-ithor-mini-val
allenact -b baseline_configs/one_phase/procthor/eval eval_minivalid_ithor \
-s 12345 --eval --approx_ckpt_step_interval 5e6 -c CKPT_DIR
```

#### Generate new ProcTHOR training and mini-valid episodes
Our training and validation datasets are already provided under [data/2022procthor](data/2022procthor), but we also include the
scripts we used, in case you are interested in trying new episode distributions.

For training, we use a dataset composed of 50,000 episodes sampled from 2,500 houses with one or two rooms.
The following commands take several hours on an 8-GPU machine with 56 CPU cores:
```bash
python datagen/procthor_datagen/datagen_runner_train.py
inv make-procthor-mini-train
```

To create a ProcTHOR mini-valid dataset, the following commands take several hours on an 8-GPU machine
with 56 CPU cores:
```bash
python datagen/procthor_datagen/datagen_runner_valid.py
inv consolidate-procthor-val
inv make-procthor-mini-val
inv make-valid-houses-file
```


# 📄 Citation

If you use this work, please cite [our CVPR'21 paper](https://arxiv.org/abs/2103.16544):
Empty file.
Empty file.
222 changes: 222 additions & 0 deletions baseline_configs/one_phase/procthor/eval/eval_minivalid_ithor.py
@@ -0,0 +1,222 @@
from baseline_configs.one_phase.procthor.ithor.ithor_one_phase_rgb_fine_tune import (
OnePhaseRGBClipResNet50FineTuneExperimentConfig as BaseConfig,
)

import copy
import platform
from typing import Optional, List, Sequence

import ai2thor.platform
import torch

from allenact.base_abstractions.sensor import ExpertActionSensor
from allenact.utils.misc_utils import partition_sequence, md5_hash_str_as_int
from allenact.utils.system import get_logger
from allenact_plugins.ithor_plugin.ithor_sensors import (
BinnedPointCloudMapTHORSensor,
SemanticMapTHORSensor,
)
from allenact_plugins.ithor_plugin.ithor_util import get_open_x_displays


def get_scenes(stage: str) -> List[str]:
"""Returns a list of iTHOR scene names for each stage."""
assert stage in {
"train",
"train_unseen",
"val",
"valid",
"test",
"all",
"ithor_mini_val",
"debug",
}

if stage == "debug":
return ["FloorPlan1"]

# [1-20] for train, [21-25] for val, [26-30] for test
if stage in ["train", "train_unseen"]:
scene_nums = range(1, 21)
elif stage in ["val", "valid", "ithor_mini_val"]:
scene_nums = range(21, 26)
elif stage == "test":
scene_nums = range(26, 31)
elif stage == "all":
scene_nums = range(1, 31)
else:
raise NotImplementedError

kitchens = [f"FloorPlan{i}" for i in scene_nums]
living_rooms = [f"FloorPlan{200+i}" for i in scene_nums]
bedrooms = [f"FloorPlan{300+i}" for i in scene_nums]
bathrooms = [f"FloorPlan{400+i}" for i in scene_nums]
return kitchens + living_rooms + bedrooms + bathrooms


class EvalConfig(BaseConfig):
def stagewise_task_sampler_args(
self,
stage: str,
process_ind: int,
total_processes: int,
allowed_rearrange_inds_subset: Optional[Sequence[int]] = None,
        allowed_scenes: Optional[Sequence[str]] = None,
devices: Optional[List[int]] = None,
seeds: Optional[List[int]] = None,
deterministic_cudnn: bool = False,
):
if allowed_scenes is not None:
scenes = allowed_scenes
elif stage == "combined":
# Split scenes more evenly as the train scenes will have more episodes
train_scenes = get_scenes("train")
other_scenes = get_scenes("val") + get_scenes("test")
assert len(train_scenes) == 2 * len(other_scenes)
scenes = []
while len(train_scenes) != 0:
scenes.append(train_scenes.pop())
scenes.append(train_scenes.pop())
scenes.append(other_scenes.pop())
assert len(train_scenes) == len(other_scenes)
else:
scenes = get_scenes(stage)

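        # If training with more processes than scenes, replicate the scene list so it divides evenly.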
if total_processes > len(scenes):
assert stage == "train" and total_processes % len(scenes) == 0
scenes = scenes * (total_processes // len(scenes))

allowed_scenes = list(
sorted(partition_sequence(seq=scenes, parts=total_processes,)[process_ind])
)

scene_to_allowed_rearrange_inds = None
if allowed_rearrange_inds_subset is not None:
allowed_rearrange_inds_subset = tuple(allowed_rearrange_inds_subset)
assert stage in ["valid", "train_unseen"]
scene_to_allowed_rearrange_inds = {
scene: allowed_rearrange_inds_subset for scene in allowed_scenes
}
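        # Derive a deterministic seed from this process's assigned scenes.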
seed = md5_hash_str_as_int(str(allowed_scenes))

device = (
devices[process_ind % len(devices)]
if devices is not None and len(devices) > 0
else torch.device("cpu")
)
x_display: Optional[str] = None
gpu_device: Optional[int] = None
thor_platform: Optional[ai2thor.platform.BaseLinuxPlatform] = None
if platform.system() == "Linux":
try:
x_displays = get_open_x_displays(throw_error_if_empty=True)

if devices is not None and len(
[d for d in devices if d != torch.device("cpu")]
) > len(x_displays):
                get_logger().warning(
                    f"More GPU devices found than X-displays (devices: `{devices}`, x_displays: `{x_displays}`)."
f" This is not necessarily a bad thing but may mean that you're not using GPU memory as"
f" efficiently as possible. Consider following the instructions here:"
f" https://allenact.org/installation/installation-framework/#installation-of-ithor-ithor-plugin"
f" describing how to start an X-display on every GPU."
)
x_display = x_displays[process_ind % len(x_displays)]
except IOError:
# Could not find an open `x_display`, use CloudRendering instead.
assert all(
[d != torch.device("cpu") and d >= 0 for d in devices]
), "Cannot use CPU devices when there are no open x-displays as CloudRendering requires specifying a GPU."
gpu_device = device
thor_platform = ai2thor.platform.CloudRendering

kwargs = {
"stage": stage,
"allowed_scenes": allowed_scenes,
"scene_to_allowed_rearrange_inds": scene_to_allowed_rearrange_inds,
"seed": seed,
"x_display": x_display,
"thor_controller_kwargs": {
"gpu_device": gpu_device,
"platform": thor_platform,
},
}

sensors = kwargs.get("sensors", copy.deepcopy(self.sensors()))
kwargs["sensors"] = sensors

sem_sensor = next(
(s for s in kwargs["sensors"] if isinstance(s, SemanticMapTHORSensor)), None
)
binned_pc_sensor = next(
(
s
for s in kwargs["sensors"]
if isinstance(s, BinnedPointCloudMapTHORSensor)
),
None,
)

if sem_sensor is not None:
sem_sensor.device = torch.device(device)

if binned_pc_sensor is not None:
binned_pc_sensor.device = torch.device(device)

if stage != "train":
# Don't include several sensors during validation/testing
kwargs["sensors"] = [
s
for s in kwargs["sensors"]
if not isinstance(
s,
(
ExpertActionSensor,
SemanticMapTHORSensor,
BinnedPointCloudMapTHORSensor,
),
)
]
return kwargs

def test_task_sampler_args(
self,
process_ind: int,
total_processes: int,
devices=None,
seeds=None,
deterministic_cudnn: bool = False,
task_spec_in_metrics: bool = False,
):
        task_spec_in_metrics = False  # override the argument; task specs aren't recorded in eval metrics here

# Train_unseen
# stage = "train_unseen"
# allowed_rearrange_inds_subset = list(range(15))

# Val
stage = "ithor_mini_val"
allowed_rearrange_inds_subset = None

# Test
# stage = "test"
# allowed_rearrange_inds_subset = None

# Combined (Will run inference on all datasets)
# stage = "combined"
# allowed_rearrange_inds_subset = None

return dict(
force_cache_reset=True,
epochs=1,
task_spec_in_metrics=task_spec_in_metrics,
**self.stagewise_task_sampler_args(
stage=stage,
allowed_rearrange_inds_subset=allowed_rearrange_inds_subset,
process_ind=process_ind,
total_processes=total_processes,
devices=devices,
seeds=seeds,
deterministic_cudnn=deterministic_cudnn,
),
)