We present lilGym, a new benchmark for language-conditioned reinforcement learning in visual environments. lilGym is based on 2,661 highly compositional, human-written natural language statements grounded in an interactive visual environment. We annotate all statements with executable Python programs representing their meaning to enable exact reward computation in every possible world state.
Each statement is paired with multiple start states and reward functions to form thousands of distinct Markov Decision Processes of varying difficulty.
We experiment on lilGym with different models and learning regimes. Our results and analysis show that while existing methods achieve non-trivial performance, lilGym poses a challenging open problem.
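To illustrate how an executable program makes exact reward computation possible, here is a minimal sketch. Everything in it is hypothetical: the world-state format (each tower as a bottom-to-top list of color strings), the program yellow_on_top, and the StatementTask class with its binary 0/1 reward are illustrative assumptions, not lilGym's actual data format or reward function. It also assumes that a task pairs a statement's program with a target truth value the agent should reach.

```python
from dataclasses import dataclass
from typing import Callable, List

# HYPOTHETICAL world-state format (not lilGym's real one): a list of towers,
# each tower a bottom-to-top list of block colors.
WorldState = List[List[str]]


def yellow_on_top(state: WorldState) -> bool:
    """Hypothetical program for the statement
    'There is a tower with a yellow block at the top.'"""
    # An empty tower has no top block, so it never satisfies the statement.
    return any(len(tower) > 0 and tower[-1] == "yellow" for tower in state)


@dataclass
class StatementTask:
    """Hypothetical pairing of a statement's program with a target truth value."""
    program: Callable[[WorldState], bool]  # executable meaning of the statement
    target: bool  # truth value the agent should bring about (assumption)

    def reward(self, state: WorldState) -> float:
        # Exact reward: execute the program on any state and compare with the target.
        # The binary 0/1 scheme is a simplification, not lilGym's reward function.
        return 1.0 if self.program(state) == self.target else 0.0


task = StatementTask(program=yellow_on_top, target=True)
print(task.reward([["black", "yellow"], ["blue"]]))  # 1.0
print(task.reward([["yellow", "black"]]))            # 0.0
```

Because the program can be executed on any state, the reward is defined everywhere, including states that never appear in the annotated data.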
TowerScratch (left), TowerFlipIt (right)
ScatterScratch (left), ScatterFlipIt (right)
The data and details can be found in: lilgym/data/.
A description can be found in the paper lilGym: Natural Language Visual Reasoning with Reinforcement Learning. The data is based on the Cornell Natural Language Visual Reasoning (NLVR) Corpus v1.0 (Suhr et al., 2017).
Notes:
- The codebase has been tested with Python 3.7/3.8, PyTorch 1.12.1+cu102, and CUDA 11.2.
- Work on compatibility with newer versions is ongoing.
- Create a conda environment:
conda create -n lilgym python=3.7
conda activate lilgym
- Install PyTorch:
pip install torch==1.12.1+cu102 torchvision==0.13.1+cu102 --extra-index-url https://download.pytorch.org/whl/cu102
Note:
- For using conda with Python 3.7 on Apple Silicon, you may check: link
- Clone the repo:
git clone https://github.com/lil-lab/lilgym.git
- Install the dependencies:
cd lilgym
pip install -r requirements.txt
Note: the environment has been updated to work with Gymnasium (formerly Gym).
To install the package from source:
cd lilgym
pip install .
The environments follow the standard Gym API.
The following is a short demo script:
import gymnasium as gym
env = gym.make("TowerScratch-v0", split="train", stop_forcing=False, disable_env_checker=True)
env.seed(1)
observation, info = env.reset()
for _ in range(100):
    # Sample a random action and apply it to the environment
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    # Start a new episode once the current one has ended
    if terminated or truncated:
        observation, info = env.reset()
Note: disable_env_checker is provided by Gymnasium (new Gym) and can be set to False if needed.
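Building on the demo above, the sketch below rolls out a uniformly random policy for several full episodes and records each episode's total reward, which gives a simple random baseline. It relies only on the standard Gymnasium interface (reset() returning (observation, info), step() returning a 5-tuple, and action_space.seed/sample); the function name run_random_episodes is hypothetical and not part of lilGym.

```python
from typing import Any, List


def run_random_episodes(env: Any, num_episodes: int, seed: int = 0) -> List[float]:
    """Hypothetical helper: roll out a uniformly random policy and
    return the total reward collected in each episode.

    Assumes the Gymnasium API: reset() -> (obs, info) and
    step(a) -> (obs, reward, terminated, truncated, info).
    """
    env.action_space.seed(seed)  # make the sampled actions reproducible
    returns: List[float] = []
    for _ in range(num_episodes):
        observation, info = env.reset()
        total, done = 0.0, False
        while not done:
            action = env.action_space.sample()
            observation, reward, terminated, truncated, info = env.step(action)
            total += float(reward)
            # An episode ends on either termination or truncation.
            done = terminated or truncated
        returns.append(total)
    return returns
```

With lilGym installed, it could be called as run_random_episodes(gym.make("TowerScratch-v0", split="dev", stop_forcing=False), 10); the dev split and stop_forcing=False mirror how inference is run. The loop assumes every episode eventually terminates or is truncated.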
Configurations
There are four configurations: TowerScratch, TowerFlipIt, ScatterScratch, and ScatterFlipIt. Examples:
env = gym.make("TowerFlipIt-v0", split="train", stop_forcing=False)
env = gym.make("ScatterScratch-v0", split="dev", stop_forcing=False)
env = gym.make("ScatterFlipIt-v0", split="test", stop_forcing=False)
Data splits
There are three data splits for each configuration: train, dev, and test.
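Together, the four configurations and three splits give twelve (environment ID, split) combinations. The sketch below enumerates them from lowercase layout and mode names. The helpers env_id and all_settings are hypothetical, and while 'tower' and 'scratch' match the get_data arguments used in this README, the spellings 'scatter' and 'flipit' are assumptions.

```python
from itertools import product
from typing import List, Tuple

LAYOUTS = ("tower", "scatter")   # 'scatter' spelling is an assumption
MODES = ("scratch", "flipit")    # 'flipit' spelling is an assumption
SPLITS = ("train", "dev", "test")


def env_id(layout: str, mode: str) -> str:
    """Hypothetical helper: ('tower', 'flipit') -> 'TowerFlipIt-v0'."""
    mode_name = {"scratch": "Scratch", "flipit": "FlipIt"}[mode]
    return f"{layout.capitalize()}{mode_name}-v0"


def all_settings() -> List[Tuple[str, str]]:
    """All (environment ID, split) pairs: 2 layouts x 2 modes x 3 splits."""
    return [(env_id(layout, mode), split)
            for layout, mode, split in product(LAYOUTS, MODES, SPLITS)]


for name, split in all_settings():
    print(name, split)
```

With lilGym installed, each pair could be passed to gym.make(name, split=split, stop_forcing=False).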
Stop forcing
stop_forcing specifies whether to apply stop forcing at training time. Inference is always done without stop forcing.
Data reading
There are two ways to load data:
- Using the argument split, as above
- Using the argument data. An example:
import gymnasium as gym
from lilgym.data.utils import get_data
# Load the TowerScratch train split explicitly and pass it through `data`
data = get_data('tower', 'scratch', 'train')
env = gym.make("TowerScratch-v0", data=data, stop_forcing=True)
More details about the environment can be found in: lilgym/envs/README.md.
The baselines, including training and inference code, will be released soon.
License: MIT
@inproceedings{wu-etal-2023-lilgym,
title = "lil{G}ym: Natural Language Visual Reasoning with Reinforcement Learning",
author = "Wu, Anne and
Brantley, Kiante and
Kojima, Noriyuki and
Artzi, Yoav",
booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
year = "2023",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.acl-long.512",
pages = "9214--9234",
}
This research was supported by ARO W911NF21-1-0106, NSF under grant No. 1750499, a gift from Open Philanthropy, and NSF under grant No. 2127309 to the Computing Research Association for the CIFellows Project. Results presented in this paper were obtained using CloudBank, which is supported by the National Science Foundation under award No. 1925001. We thank Alane Suhr, Ge Gao, Justin Chiu, Woojeong Kim, Jack Morris, Jacob Sharf and the Cornell NLP Group for support, comments and helpful discussions.
Anne Wu ([email protected])