- 2025-01-25: Paper, project page, code, data, envs and models are all released.
This work presents Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology. We introduce a UAV simulation platform, an assistant-guided realistic UAV VLN benchmark, and an MLLM-based method to address the challenges of realistic UAV vision-language navigation.
```bash
conda create -n llamauav python=3.10 -y
conda activate llamauav
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
```
You can follow LLaMA-UAV to install the LLM dependencies.
```bash
pip install -r requirement.txt
```
Additionally, to ensure compatibility with the AirSim Python API, apply the fix described in the AirSim issue.
To prepare the dataset, follow the instructions in the Dataset section.
To set up the model, refer to the detailed Model Setup instructions.
Download the simulator environments for various maps from here.
- Set up the simulator environment server
Before running the simulations, ensure the AirSim environment server is properly configured.
Update the environment executable paths in env_exec_path_dict, relative to root_path, in AirVLNSimulatorServerTool.py.
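To illustrate what "relative to root_path" means, here is a minimal sketch that joins each relative executable path onto the directory passed as --root_path and reports any that are missing. It assumes env_exec_path_dict maps a map name to a path relative to root_path; the map name and relative path below are hypothetical placeholders, so check AirVLNSimulatorServerTool.py for the real keys and directory layout. The helper functions are illustrative, not part of the repository.

```python
from pathlib import Path

# Hypothetical entry: the real keys and relative paths live in AirVLNSimulatorServerTool.py.
env_exec_path_dict = {
    "ExampleMap": "ExampleMap/LinuxNoEditor/ExampleMap.sh",
}


def resolve_exec_paths(root_path: str, path_dict: dict[str, str]) -> dict[str, Path]:
    """Join each relative executable path onto root_path (the --root_path argument)."""
    root = Path(root_path)
    return {name: root / rel for name, rel in path_dict.items()}


def missing_executables(resolved: dict[str, Path]) -> list[str]:
    """Return the map names whose executable file does not exist on disk."""
    return [name for name, path in resolved.items() if not path.is_file()]


if __name__ == "__main__":
    resolved = resolve_exec_paths("/path/to/your/envs", env_exec_path_dict)
    for name, path in resolved.items():
        print(name, "->", path)
    print("missing:", missing_executables(resolved))
```

Running a check like this before starting the server makes a wrong root_path show up as a list of missing executables rather than as a simulator launch failure.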
```bash
cd airsim_plugin
python AirVLNSimulatorServerTool.py --port 30000 --root_path /path/to/your/envs
```
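Because the close-loop scripts need the server to be up first, it can help to wait until the port given via --port is reachable before launching them. The sketch below assumes the server accepts TCP connections on that port (the README does not state the protocol); wait_for_server is a hypothetical helper built only on the standard library.

```python
import socket
import time


def wait_for_server(host: str = "127.0.0.1", port: int = 30000,
                    timeout_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError (e.g. connection refused) if nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            time.sleep(interval_s)
    return False
```

For example, `wait_for_server(port=30000, timeout_s=120)` returns True once something is listening on port 30000, and False if the timeout expires, in which case the server's own output is the place to look.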
- Run closed-loop simulation
Once the simulator server is running, you can execute the DAgger or evaluation scripts.
```bash
# Dagger NYC
bash scripts/dagger_NYC.sh
# Eval
bash scripts/eval.sh
bash scripts/metrics.sh
```
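For readers unfamiliar with DAgger (Dataset Aggregation), the generic loop is: roll out the current learned policy, have an expert label the states that policy actually visits, add those labels to the training set, and retrain. The toy sketch below shows only that generic loop on a hypothetical 1-D corridor with a lookup-table learner. It is not how scripts/dagger_NYC.sh is implemented; the corridor, expert, and learner are illustrative stand-ins for the simulator, the expert, and the MLLM policy.

```python
import random

GOAL = 5          # hypothetical goal cell
N_CELLS = 10      # corridor cells 0..9


def expert_action(state: int) -> int:
    """Expert moves one step toward GOAL (0 when already there)."""
    if state < GOAL:
        return 1
    if state > GOAL:
        return -1
    return 0


def train(dataset: dict[int, int]) -> dict[int, int]:
    """Toy 'training': the learner memorizes the expert label for each seen state."""
    return dict(dataset)


def policy_action(policy: dict[int, int], state: int, rng: random.Random) -> int:
    """Use the learned action for seen states, otherwise act randomly."""
    if state in policy:
        return policy[state]
    return rng.choice((-1, 0, 1))


def rollout(policy: dict[int, int], start: int, steps: int, rng: random.Random) -> list[int]:
    """Run the current learner policy and record the states it visits."""
    visited, state = [], start
    for _ in range(steps):
        visited.append(state)
        state = min(N_CELLS - 1, max(0, state + policy_action(policy, state, rng)))
    return visited


def dagger(iterations: int = 5, rollouts_per_iter: int = 4,
           steps: int = 12, seed: int = 0) -> dict[int, int]:
    rng = random.Random(seed)
    dataset: dict[int, int] = {}
    policy = train(dataset)
    for _ in range(iterations):
        for _ in range(rollouts_per_iter):
            # Label every state the *learner* visited with the expert's action.
            for s in rollout(policy, rng.randrange(N_CELLS), steps, rng):
                dataset[s] = expert_action(s)
        policy = train(dataset)  # retrain on the aggregated dataset
    return policy
```

The key design point is that labels are collected on the learner's own state distribution rather than only on expert demonstrations, which is what distinguishes DAgger from plain behavior cloning.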
If you find this project useful, please consider citing our paper:
```bibtex
@misc{wang2024realisticuavvisionlanguagenavigation,
  title={Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology},
  author={Xiangyu Wang and Donglin Yang and Ziqin Wang and Hohin Kwan and Jinyu Chen and Wenjun Wu and Hongsheng Li and Yue Liao and Si Liu},
  year={2024},
  eprint={2410.07087},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2410.07087},
}
```
This repository is partly based on AirVLN and LLaMA-VID repositories.