Assuming you have pyenv and Poetry, clone the repository and run:
```bash
# Use Python 3.9.13 in the project
pyenv local 3.9.13

# Tell Poetry to use the pyenv-managed interpreter
poetry env use $(pyenv which python)

# Install dependencies
poetry install

# Activate the virtual environment
poetry shell

# Install pre-commit hooks
pre-commit install
```
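As a quick sanity check that the environment was created correctly, you can optionally run:

```bash
# Optional sanity checks for the new environment
poetry run python --version   # should report Python 3.9.13
pre-commit run --all-files    # run the installed hooks once over the whole repo
```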
Check out the CONTRIBUTING.md for more detailed information on getting started.
We've separated dependencies into optional groups so that you only install what you need.

- To run the Gradio demo, install the `demo` group:

```bash
poetry install --with demo
```
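Multiple groups can be combined in a single install; any group names besides `demo` are hypothetical here:

```bash
# Comma-separate groups to install several at once (`dev` is a hypothetical group name)
poetry install --with demo,dev
```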
The project is organised very similarly to the structure of the Lightning-Hydra-Template to facilitate reproducible research code.
- `scripts` — sh scripts to run experiments
- `configs` — configuration files using the Hydra framework
- `docker` — Dockerfiles to ease deployment
- `notebooks` — Jupyter notebooks for analysis and exploration
- `storage` — data for training/inference (you can use symlinks to point to other parts of the filesystem; see the sketch after this list)
- `tests` — pytest scripts to verify the code
- `src` — where the main code lives
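If your data lives elsewhere on disk, a symlink keeps the expected `storage` layout. A minimal sketch, with illustrative paths:

```bash
# Link an external data directory into storage/ (both paths are illustrative)
ln -s /mnt/big_disk/emma_data storage/datasets
```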
All checkpoints are available on HuggingFace.
These checkpoints include:
| Model name | Description |
| --- | --- |
| `emma_base_pretrain.ckpt` | The EMMA base pretrained checkpoint |
| `unified_emma_base_finetune_arena.ckpt` | The EMMA-unified variant fine-tuned on the DTC task |
| `modular_action_emma_base_finetune_arena.ckpt` | The EMMA-modular variant fine-tuned on the DTC task that performs action execution and visual grounding |
| `vinvl_finetune_arena.ckpt` | The fine-tuned VinVL checkpoint |
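One way to fetch a checkpoint is with the `huggingface-cli` from the `huggingface_hub` package; the repository ID below is a placeholder for the actual repo:

```bash
# <repo-id> is a placeholder for the actual HuggingFace repository;
# --local-dir is where the downloaded file will be saved
huggingface-cli download <repo-id> emma_base_pretrain.ckpt --local-dir storage/checkpoints
```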
The DBs are required for pre-training and fine-tuning and are available on HuggingFace.
We provide DBs for:
- Pretraining on image-based tasks (one DB per task)
- Fine-tuning on image-based tasks (one DB per task)
- Fine-tuning on the DTC tasks (one DB for action execution / visual grounding and one DB for the contextual routing task)
Make sure these are placed under the `storage/db` folder, or alternatively set the path to the DBs within each experiment config.
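For example, assuming the DBs were downloaded into your current directory (the Hydra key used in the override below is hypothetical; check the experiment configs for the actual name):

```bash
# Move the downloaded DBs into the default location
mkdir -p storage/db
mv ./*.db storage/db/

# Or point an experiment at another location via a Hydra command-line override
# (`dataset.db_dir` is a hypothetical key; check your experiment config)
python run.py experiment=pretrain.yaml dataset.db_dir=/path/to/dbs
```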
The image features for all image-based tasks and the DTC benchmark are also available on HuggingFace.
The image features were extracted using the pretrained VinVL checkpoint; for the DTC benchmark, we fine-tuned the checkpoint on the Alexa Arena data.
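As with the checkpoints, the features can be fetched as a HuggingFace dataset; the repository ID is again a placeholder and the target directory is an assumption:

```bash
# <features-repo> is a placeholder for the actual dataset repository;
# the storage/features target directory is an assumption
huggingface-cli download <features-repo> --repo-type dataset --local-dir storage/features
```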
First, make sure that you have downloaded the pretraining DB and the corresponding features, then run:

```bash
python run.py experiment=pretrain.yaml
```
For fine-tuning on the downstream image-based tasks, run the corresponding experiment:

```bash
# COCO captioning
python run.py experiment=coco_downstream.yaml

# VQA-v2
python run.py experiment=vqa_v2_downstream.yaml

# RefCOCO
python run.py experiment=refcoco_downstream.yaml

# NLVR2
python run.py experiment=nlvr2_downstream.yaml
```
When initializing from the pretrained model, which doesn't include the special tokens for the downstream CR and action prediction tasks, you will need to manually edit the vocabulary size in the model config. For initialization from the pretrained `emma-base`, set the `vocab_size` to 10252.
To fine-tune on the DTC task, run:

```bash
python run.py experiment=simbot_combined.yaml
```
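If you'd rather not edit the config file, Hydra can also override values from the command line; the exact key path below is an assumption:

```bash
# Hypothetical Hydra override; verify the real key path in the model config
python run.py experiment=simbot_combined.yaml model.vocab_size=10252
```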