Accepted to ECCV 2022
Data privacy is a central problem for embodied agents that can perceive the environment, communicate with humans, and act in the real world. While helping humans complete tasks, the agent may observe and process sensitive information of users, such as house environments, human activities, etc. In this work, we introduce privacy-preserving embodied agent learning for the task of Vision-and-Language Navigation (VLN), where an embodied agent navigates house environments by following natural language instructions. We view each house environment as a local client, which shares nothing other than local model updates with the cloud server and other clients, and propose a novel Federated Vision-and-Language Navigation (FedVLN) framework to protect data privacy during both training and pre-exploration. Specifically, we propose a decentralized federated training strategy that limits the data of each client to its local model training, and a federated pre-exploration method that performs partial model aggregation to improve model generalizability to unseen environments. Extensive results on the R2R and RxR datasets show that decentralized federated training achieves comparable results to centralized training while protecting seen-environment privacy, and federated pre-exploration significantly outperforms centralized pre-exploration while preserving unseen-environment privacy.
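Concretely, in the decentralized federated training described above, each house environment keeps its data local, trains its own copy of the agent, and uploads only model updates; the server merges these updates by weighted averaging, optionally scaled by a global learning rate. The following is a minimal, hypothetical sketch of that aggregation step (assuming PyTorch-style models and state dicts, with made-up helper names), not the repository's actual implementation:

import copy

def fedavg_aggregate(global_model, client_states, client_weights, global_lr=1.0):
    # client_states: state_dicts returned by clients after local training
    # client_weights: relative local data sizes, summing to 1
    # global_lr: server-side step size applied to the averaged update
    global_state = copy.deepcopy(global_model.state_dict())
    for key, value in global_state.items():
        if not value.is_floating_point():
            continue  # skip integer buffers such as BatchNorm counters
        delta = sum(w * (cs[key] - value) for cs, w in zip(client_states, client_weights))
        global_state[key] = value + global_lr * delta
    global_model.load_state_dict(global_state)
    return global_model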
We release the reproducible code here.
Python requirements: Python 3.6 is required.
pip install -r python_requirements.txt
Please refer to this link to install the Matterport3D simulator:
Download the image features of the environments for the EnvDrop model:
mkdir img_features
wget https://www.dropbox.com/s/o57kxh2mn5rkx4o/ResNet-152-imagenet.zip -P img_features/
cd img_features
unzip ResNet-152-imagenet.zip
Please download the CLIP-ViT features for CLIP-ViL models with this link:
wget https://nlp.cs.unc.edu/data/vln_clip/features/CLIP-ViT-B-32-views.tsv -P img_features
Please download the pre-processed RxR data with the following link:
wget https://nlp.cs.unc.edu/data/vln_clip/RxR.zip -P tasks
unzip tasks/RxR.zip -d tasks/
To train the federated CLIP-ViL agent on the RxR dataset, please run:
name=agent_rxr_en_clip_vit_fedavg_new_glr2
flag="--attn soft --train listener
--featdropout 0.3
--angleFeatSize 128
--language en
--maxInput 160
--features img_features/CLIP-ViT-B-32-views.tsv
--feature_size 512
--feedback sample
--mlWeight 0.4
--subout max --dropout 0.5 --optim rms --lr 1e-4 --iters 400000 --maxAction 35
--if_fed True
--fed_alg fedavg
--global_lr 2
--comm_round 910
--local_epoches 5
--n_parties 60
"
mkdir -p snap/$name
CUDA_VISIBLE_DEVICES=2 python3 rxr_src/train.py $flag --name $name
Or you can simply run a script with the same content as above (we will use such scripts in the following); a short sketch of the federated training loop these flags configure follows the commands below:
bash run/agent_rxr_clip_vit_en_fedavg.bash
To train the RxR agent with ResNet-152 features instead, run:
bash agent_rxr_resnet152_fedavg.bash
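The federated flags in the command above map onto an outer loop roughly like the following sketch (hypothetical helper names and client attributes; the actual loop lives in rxr_src/train.py): the server runs --comm_round communication rounds, samples some or all of the --n_parties clients (house environments), lets each sampled client train for --local_epoches epochs on its own data, and merges the resulting updates with the server-side step size --global_lr, e.g. via the fedavg_aggregate sketch above.

import copy
import random

def run_federated_training(global_model, clients, local_train_fn, aggregate_fn,
                           comm_round, local_epoches, global_lr, sample_size=None):
    # clients: one object per house environment, each holding only its own data
    # local_train_fn(model, client, epochs): standard VLN agent training on one client
    # aggregate_fn: e.g. the fedavg_aggregate sketch above
    sample_size = sample_size or len(clients)
    for _ in range(comm_round):
        sampled = random.sample(clients, sample_size)
        total = sum(c.num_examples for c in sampled)
        client_states, client_weights = [], []
        for client in sampled:
            local_model = copy.deepcopy(global_model)           # start from current global weights
            local_train_fn(local_model, client, local_epoches)  # raw data never leaves the client
            client_states.append(local_model.state_dict())
            client_weights.append(client.num_examples / total)
        aggregate_fn(global_model, client_states, client_weights, global_lr)
    return global_model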
To train agents on the R2R dataset, first download the Room-to-Room navigation data:
bash ./tasks/R2R/data/download.sh
- Train the agent
Run the script:
bash run/agent_clip_vit_fedavg.bash
It will train the agent and save the snapshot under snap/agent/. Note that we tried a global learning rate scheduler, which may help training. The unseen success rate should be around 53%.
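The global learning rate scheduler mentioned above is not enabled in the released script; purely as an illustration (hypothetical function, made-up decay shape), such a server-side schedule could look like:

def global_lr_schedule(round_idx, total_rounds, base_lr, min_lr):
    # linearly decay the server-side (global) learning rate over communication rounds
    frac = round_idx / max(1, total_rounds - 1)
    return base_lr + frac * (min_lr - base_lr)

# e.g. pass global_lr=global_lr_schedule(rnd, comm_round, base_lr, min_lr) into the aggregation each round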
- Train the speaker
bash run/speaker_clip_vit_fedavg.bash
It will train the speaker and save the snapshot under snap/speaker/.
- Augmented training
After pre-training the speaker and the agent, run
bash run/bt_envdrop_clip_vit_fedavg.bash
It will load the pre-trained agent and train on augmented data with environmental dropout.
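Environmental dropout, from the EnvDrop approach this script builds on, differs from ordinary feature dropout in that the same feature channels are dropped for every viewpoint of an environment, so the agent effectively trains in a consistently altered "new" environment. A rough, hypothetical sketch of the idea (illustrative function name and drop rate; not the repository's implementation):

import torch

def environmental_dropout(env_features, drop_rate=0.3):
    # env_features: (num_viewpoints, num_views, feature_dim) features of one environment
    # one channel mask is shared by all viewpoints/views, scaled like standard dropout
    feature_dim = env_features.shape[-1]
    mask = (torch.rand(feature_dim) > drop_rate).float() / (1.0 - drop_rate)
    return env_features * mask  # broadcast the same mask across the whole environment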
For the ResNet-152 features, the corresponding R2R scripts are:
- Agent
bash run/agent_fedavg.bash
- Fed Speaker + Aug training
bash run/speaker_fedavg.bash
bash run/bt_envdrop_fedavg.bash
After training the CLIP-ViL speaker, run the pre-exploration script:
bash run/pre_explore_clip_vit_fedavg.bash
After training the ResNet-152 speaker, run the pre-exploration script:
bash run/pre_explore_fedavg.bash
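In federated pre-exploration, only part of the agent is aggregated across environments while the remaining parameters stay local to each client (the partial model aggregation mentioned in the abstract). Below is one hypothetical way such partial aggregation could be written; selecting the shared part by parameter-name prefix, and which submodule is actually shared, are assumptions for illustration rather than details taken from the code:

def partial_aggregate(global_model, client_states, shared_prefixes, global_lr=1.0):
    # only parameters whose names start with one of shared_prefixes are averaged;
    # everything else stays on the client and is never uploaded
    global_state = global_model.state_dict()
    for key, value in global_state.items():
        if not key.startswith(tuple(shared_prefixes)) or not value.is_floating_point():
            continue
        mean_delta = sum(cs[key] - value for cs in client_states) / len(client_states)
        global_state[key] = value + global_lr * mean_delta
    global_model.load_state_dict(global_state)
    return global_model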
If you use FedVLN in your research or wish to refer to the baseline results published here, please use the following BibTeX entry.
@article{zhou2022fedvln,
  title={FedVLN: Privacy-preserving Federated Vision-and-Language Navigation},
  author={Zhou, Kaiwen and Wang, Xin Eric},
  journal={arXiv preprint arXiv:2203.14936},
  year={2022}
}