ASSISTER: Assistive Navigation via Conditional Instruction Generation

This repository contains the dataset and code (available soon) accompanying our ECCV 2022 paper ASSISTER: Assistive Navigation via Conditional Instruction Generation.

[Figure: Example 1]

Simulation Dataset

We provide the [Training] and [Testing] datasets.

Overview

Our simulation dataset is collected using CARLA. While CARLA is typically used for developing autonomous driving policies, we modify the environment to collect instructional guidance together with a sidewalk pedestrian perspective, under various weather conditions and across multiple towns. Overall, the simulation dataset contains 399,126 training samples and 103,869 testing samples. Among the testing samples, 36,615 belong to the new town, new weather (ntnw) setting, 36,378 to the new town, same weather (ntsw) setting, and 30,876 to the same town, same weather (stsw) setting.

Data Format

We provide train.json, which contains all the information needed for training. Note that the JSON format is very similar to the COCO format but not identical. The main difference is that each entry in the list of images includes a "goal" key in addition to "id" and "file_name", and multiple "id" values may map to the same image "file_name". This is because, even for the same image, the instruction varies with the goal, so the images cannot be treated identically.

{    
  "info"         : [info],    
  "images"       : [image],    
  "annotations"  : [annotation],    
  "licenses"     : str,    
  "type"         : str,    
}     
    
info{    
  "year"         : int, 
  "version"      : str, 
  "description"  : str,    
}    
    
image{  
  "id"           : int,    
  "file_name"    : str,   
  "goal"         : [float, float],  
}    
    
annotation{  
  "id"           : int,  
  "image_id"     : int,  
  "caption"      : str,  
}  
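
For reference, here is a minimal loading sketch that pairs images, goals, and captions from train.json; the file path, variable names, and the resulting triple structure are illustrative choices, not something fixed by the dataset.

import json
from collections import defaultdict

# Load the training annotations (path is a placeholder).
with open("train.json") as f:
    data = json.load(f)

# Index image entries by "id"; several ids can share one "file_name",
# because each (image, goal) pair is a separate entry.
images_by_id = {img["id"]: img for img in data["images"]}

# Group captions by the image entry they annotate.
captions_by_image = defaultdict(list)
for ann in data["annotations"]:
    captions_by_image[ann["image_id"]].append(ann["caption"])

# Build (file_name, goal, caption) training triples.
triples = [
    (images_by_id[iid]["file_name"], images_by_id[iid]["goal"], cap)
    for iid, caps in captions_by_image.items()
    for cap in caps
]
print(len(triples))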

Demo

[Figure: Demo 1]

Environment Setup

To run the code, you will need to install py-bottom-up-attention and Oscar.

Installation

# Install py-bottom-up-attention
cd py-bottom-up-attention
pip install -r requirements.txt
pip install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

# Install detectron2
python setup.py build develop

# Install Oscar
cd ../Oscar/coco_caption
./get_stanford_models.sh
cd ..
python setup.py build develop
pip install -r requirements.txt
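
After installation, a quick sanity check (purely illustrative) is to confirm that both packages import from Python; the package names below are assumed from the build steps above.

# Sanity check that the builds above are importable (illustrative only).
import detectron2   # built inside py-bottom-up-attention
import oscar        # installed by `python setup.py build develop` in Oscar

print(detectron2.__version__)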

Note: We have made some modifications to the original repositories. Please use the copies bundled in our repository instead of cloning from upstream, to ensure compatibility with our custom changes and configurations.

Training and Inference

The training process includes two stages: extracting features from images and training an image captioning model.

Extract Features from Images

To extract features from images, please refer to the ASSISTER/py-bottom-up-attention/demo/extract_feature_sim.ipynb notebook. This notebook is designed to work with our simulation dataset.
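
As a rough illustration only: Oscar-style captioning pipelines typically consume per-image region features serialized to disk (for example, base64-encoded arrays in a TSV file). The sketch below shows that general pattern with hypothetical file names, shapes, and column layout; the actual feature format expected by run_captioning.py should be taken from the extraction notebook and the YAML configuration files.

# Hypothetical conversion of per-image region features (assumed saved as
# .npy files by the extraction step) into a base64-encoded TSV.  File
# names, shapes, and the column layout are illustrative assumptions, not
# the repository's canonical format.
import base64
import glob
import os

import numpy as np

with open("features.tsv", "w") as out:
    for path in sorted(glob.glob("extracted_features/*.npy")):
        image_id = os.path.splitext(os.path.basename(path))[0]
        feats = np.load(path).astype(np.float32)  # e.g. (num_boxes, feat_dim)
        encoded = base64.b64encode(feats.tobytes()).decode("utf-8")
        out.write(f"{image_id}\t{feats.shape[0]}\t{encoded}\n")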

Training a Conditional Instruction Generation Model

To train the conditional instruction generation model, you can use the following command:

python oscar/run_captioning.py \
    --model_name_or_path pretrained_language_model_path \
    --do_train \
    --do_lower_case \
    --evaluate_during_training \
    --add_od_labels \
    --learning_rate 0.00003 \
    --per_gpu_train_batch_size 16 \
    --num_train_epochs 500 \
    --save_steps 2000 \
    --output_dir your_output_directory \
    --train_yaml train_config_file_path \
    --data_dir dataset_path \
    --val_yaml val_config_file_path \
    --max_seq_length 90 \
    --max_gen_length 40

You can download the pretrained language models from Oscar Pre-trained Models.
Make sure to replace the following placeholders with your specific information:

  • pretrained_language_model_path: the path to your pre-trained language model.
  • your_output_directory: the directory where the model output and training results will be saved.
  • train_config_file_path: the path to your training configuration file.
  • dataset_path: the path to your dataset directory.
  • val_config_file_path: the path to your validation configuration file.

Adjust the other parameters (e.g., learning_rate, per_gpu_train_batch_size) as needed to suit your training requirements.

Inference

After successfully training and saving your model, you can perform inference and evaluate it on a testing dataset using the following command:

python oscar/run_captioning.py \
    --do_test \
    --test_yaml test_config_file_path \
    --per_gpu_eval_batch_size 16 \
    --num_beams 4 \
    --max_gen_length 30 \
    --eval_model_dir your_model_path

Contact

Please contact us at [email protected] if you have any remarks or questions.

License

Our work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

ToDos

  • Simulation dataset release
  • Training code release
  • Real-world dataset release