PyTorch implementation of the paper:
Modeling Explicit Concerning States for Reinforcement Learning in Visual Dialogue.
Zipeng Xu, Fandong Meng, Xiaojie Wang, Duo Zheng, Chenxu Lv and Jie Zhou.
In Proceedings of BMVC 2021.
(The Appendix is included in our arXiv version.)
This code is adapted from vmurahari3/visdial-diversity; we thank the authors for open-sourcing it.
Download preprocessed dialog data for VisDial v1.0:
sh scripts/
Download extracted features:
We use bottom-up image features with 10-100 proposals per image, as provided by Gi-Cheon Kang et al. We thank them for the release.
Please download the files and put them under data/image_features:
Bottom-up features of 10-100 proposals from images of the train split (32GB).
train_imgid2idx.pkl: image ID to bounding box index file for the train split.
Bottom-up features of 10-100 proposals from images of the val split (0.5GB).
val_imgid2idx.pkl: image ID to bounding box index file for the val split.
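Before starting a long training run, it can help to confirm that the index files are where the code expects them. The following is a minimal sketch, not part of this repository: the helper names `missing_feature_files` and `load_imgid2idx` are hypothetical, and it assumes each `*_imgid2idx.pkl` file is a pickled dict mapping an image ID to an index, which the file name suggests but this README does not confirm.

```python
import pickle
from pathlib import Path

# Index file names taken from the list above.
REQUIRED_INDEX_FILES = ("train_imgid2idx.pkl", "val_imgid2idx.pkl")


def missing_feature_files(feature_dir="data/image_features"):
    """Return the required index files that are absent from feature_dir."""
    root = Path(feature_dir)
    return [name for name in REQUIRED_INDEX_FILES if not (root / name).is_file()]


def load_imgid2idx(path):
    """Load one index file.

    Assumption: the file is a pickled dict of image ID -> index.
    Only unpickle files obtained from a source you trust.
    """
    with open(path, "rb") as f:
        mapping = pickle.load(f)
    if not isinstance(mapping, dict):
        raise TypeError(f"expected a dict in {path}, got {type(mapping).__name__}")
    return mapping


if __name__ == "__main__":
    missing = missing_feature_files()
    if missing:
        print("Missing under data/image_features:", ", ".join(missing))
    else:
        print("All index files found.")
```

The check covers only the two `.pkl` files named above; the feature files themselves are large, so verifying their presence and size separately is worthwhile.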
For Supervised Learning pre-training:
SL: Q-Bot
python -useGPU -trainMode sl-qbot -saveName SL_QBot
SL: A-Bot
python -useGPU -trainMode sl-abot -a_learningRate 4e-4 -lrDecayRate 0.75 -saveName SL_ABot
For Reinforcement Learning fine-tuning with ECS-based rewards:
python -dropout 0 -useGPU -useNDCG -trainMode rl-full-QAf -startFrom checkpoints/SL_ABOT.vd -qstartFrom checkpoints/SL_QBOT.vd -saveName RL-ECS
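The RL command reads two pre-trained checkpoints, so a missing or misnamed file only surfaces once the run starts. The sketch below is an illustration rather than part of this repository: the helper `missing_checkpoints` is hypothetical, and the two paths are copied verbatim from the `-startFrom` and `-qstartFrom` arguments above. It compares names case-sensitively, since `SL_ABOT.vd` and `SL_ABot.vd` are different files on most Linux filesystems.

```python
from pathlib import Path

# Paths copied from the -startFrom and -qstartFrom arguments above.
RL_START_CHECKPOINTS = {
    "-startFrom": "checkpoints/SL_ABOT.vd",
    "-qstartFrom": "checkpoints/SL_QBOT.vd",
}


def missing_checkpoints(root=".", checkpoints=RL_START_CHECKPOINTS):
    """Return {flag: path} for each checkpoint not found under root."""
    root = Path(root)
    missing = {}
    for flag, rel in checkpoints.items():
        path = root / rel
        parent = path.parent
        # Compare against the directory listing so the match is exact
        # (case-sensitive) even on case-insensitive filesystems.
        names = {p.name for p in parent.iterdir()} if parent.is_dir() else set()
        if path.name not in names:
            missing[flag] = rel
    return missing


if __name__ == "__main__":
    for flag, rel in missing_checkpoints().items():
        print(f"{flag}: {rel} not found")
```

If either path is reported missing, adjust the `-startFrom` and `-qstartFrom` arguments to point at the checkpoints the supervised runs actually produced.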
Will be released this week.
author = {Xu, Zipeng and Meng, Fandong and Wang, Xiaojie and Zheng, Duo and Lv, Chenxu and Zhou, Jie},
title = {Modeling Explicit Concerning States for Reinforcement Learning in Visual Dialogue},
booktitle = {Proceedings of the 32nd British Machine Vision Conference (BMVC)},
year = {2021}