Vision-and-Language Navigation (VLN) is an essential skill for embodied agents, allowing them to navigate 3D environments by following natural language instructions. High-performance navigation models require large amounts of training data, yet the high cost of manual annotation has seriously hindered the field. Some previous methods therefore translate trajectory videos into step-by-step instructions to expand the data, but such instructions do not match how users typically communicate: briefly describing a destination or stating a specific need. Moreover, local navigation trajectories overlook global context and high-level task planning. To address these issues, we propose NavRAG, a retrieval-augmented generation (RAG) framework that generates user-demand instructions for VLN. NavRAG uses an LLM to build a hierarchical scene description tree that captures the 3D scene from global layout to local details, then simulates various user roles with specific demands that retrieve from the scene tree, generating diverse instructions with the LLM. We annotate over 2 million navigation instructions across 861 scenes and evaluate both the data quality and the navigation performance of models trained on them.
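To make the pipeline concrete: NavRAG builds a three-level tree of textual summaries (viewpoint → zone → house), and a simulated user demand retrieves the relevant nodes from that tree before an LLM writes the final instruction. The sketch below is only a conceptual illustration of this data structure and retrieval step; the class names, the toy keyword-overlap scoring, and the example demand are assumptions for illustration, not the repository's actual API.

```python
# Conceptual sketch of NavRAG's hierarchical scene description tree and
# demand-driven retrieval. Names and the toy relevance score are illustrative
# assumptions; the real pipeline relies on LLM calls (see instruction_generator/).
from dataclasses import dataclass, field
from typing import List

@dataclass
class ViewpointNode:            # leaf: one navigable viewpoint
    viewpoint_id: str
    summary: str                # LLM-written description of what is visible here

@dataclass
class ZoneNode:                 # middle level: a functional region (e.g. kitchen)
    name: str
    summary: str
    viewpoints: List[ViewpointNode] = field(default_factory=list)

@dataclass
class HouseNode:                # root: global layout of the whole scene
    summary: str
    zones: List[ZoneNode] = field(default_factory=list)

def relevance(query: str, text: str) -> int:
    """Toy keyword-overlap score standing in for an LLM/embedding ranker."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def retrieve_viewpoint(house: HouseNode, demand: str) -> ViewpointNode:
    """Walk the tree top-down: pick the most relevant zone, then viewpoint."""
    zone = max(house.zones, key=lambda z: relevance(demand, z.summary))
    return max(zone.viewpoints, key=lambda v: relevance(demand, v.summary))

if __name__ == "__main__":
    house = HouseNode(
        summary="Two-storey home with a kitchen, living room, and bedrooms.",
        zones=[
            ZoneNode("kitchen", "kitchen with a fridge, oven, and sink", [
                ViewpointNode("vp_12", "counter next to the fridge and coffee maker"),
                ViewpointNode("vp_07", "sink and dishwasher under the window"),
            ]),
            ZoneNode("living room", "living room with a sofa and a TV", [
                ViewpointNode("vp_03", "sofa facing the television"),
            ]),
        ],
    )
    demand = "I want to make coffee in the kitchen"   # simulated user demand
    target = retrieve_viewpoint(house, demand)
    # An LLM would then turn (demand, retrieved context) into the final instruction.
    print(target.viewpoint_id, "->", target.summary)
```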
- Release the instruction generation code for MP3D and HM3D.
- Release the DUET code for the NavRAG dataset and the REVERIE dataset.
- Release NavRAG dataset and preprocessed feature files.
- Release the checkpoints.
- Release annotations of scene description tree for MP3D.
- Install the Matterport3D simulator (used for pre-training your model): follow the instructions here, then add the build directory to your PYTHONPATH (a quick import check is sketched after this list):

```bash
export PYTHONPATH=Matterport3DSimulator/build:$PYTHONPATH
```
- Download the NavRAG dataset, preprocessed feature files, and checkpoints from TeraBox or Baidu Netdisk.
- (Optional) Install the Habitat simulator and download the Matterport3D scenes (MP3D) to obtain RGB-D images: follow the instructions here.
- (Optional) Download the Habitat-Matterport 3D scenes (HM3D) from habitat-matterport-3dresearch: hm3d-train-habitat-v0.2.tar and hm3d-val-habitat-v0.2.tar.
- (Optional) Enter your OpenAI key in instruction_generator/openai_key.json (see the loading sketch after this list).
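After building the simulator, a quick way to confirm that the PYTHONPATH export above worked is to import the MatterSim module from Python (a minimal smoke test, nothing more):

```python
# Succeeds only if Matterport3DSimulator/build is on PYTHONPATH.
import MatterSim

sim = MatterSim.Simulator()   # constructing the simulator object verifies the native build
print("MatterSim imported from:", MatterSim.__file__)
```

The exact schema of openai_key.json is not documented here; as an assumption, if the file holds a single API-key field, it could be loaded like this before the generation scripts call the OpenAI API:

```python
# Hypothetical loader for instruction_generator/openai_key.json.
# The field name "api_key" is an assumption; match whatever schema the scripts expect.
import json
from openai import OpenAI

with open("instruction_generator/openai_key.json") as f:
    key = json.load(f)["api_key"]

client = OpenAI(api_key=key)   # openai>=1.0 style client
```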
```bash
cd VLN-DUET-NAVRAG/pretrain_src
# assumed convention: "0,1" selects the GPUs, 2345 is the distributed-training port
bash run_rag_h14.sh "0,1" 2345
```
```bash
cd VLN-DUET-NAVRAG/map_nav_src
# assumed convention: "0,1" selects the GPUs, 2346 is the distributed-training port
bash scripts/rag_h14_envedit_mix.sh "0,1" 2346
```
```bash
cd VLN-DUET-RVR/map_nav_src
bash scripts/reverie_h14_envedit_mix.sh "0,1" 2346
```
Notice: running the following scripts will incur charges on your OpenAI account.
```bash
cd instruction_generator
python3 get_mp3d_image.py         # render viewpoint images for MP3D scenes (use get_hm3d_image.py for HM3D)
python3 get_viewpoint_summary.py  # LLM descriptions of each viewpoint (leaf level of the scene tree)
python3 get_zones.py              # group viewpoints into zones
python3 get_house_summary.py      # LLM description of the whole house (root of the scene tree)
python3 generate_instruction.py   # simulate user demands and generate instructions by retrieving from the scene tree
python3 convert_to_dataset.py     # convert the generated instructions into the VLN dataset format
```
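The scripts above build the scene description tree bottom-up before instructions are generated. A rough outline of that flow is sketched below; the prompt wording, helper names, and chat-completion usage are assumptions for illustration, not the repository's code:

```python
# Assumed outline of the bottom-up scene-tree summarization performed by the
# scripts above; prompts, helper names, and the model choice are illustrative only.
from typing import List
from openai import OpenAI

client = OpenAI()   # expects OPENAI_API_KEY (or the key from openai_key.json)

def llm(prompt: str) -> str:
    """Single chat-completion call used for every summarization step."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",    # assumed model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def summarize_viewpoint(view_descriptions: List[str]) -> str:
    # get_viewpoint_summary.py: describe what is visible from one viewpoint
    return llm("Summarize this viewpoint from its panoramic views:\n" + "\n".join(view_descriptions))

def summarize_zone(viewpoint_summaries: List[str]) -> str:
    # get_zones.py: group nearby viewpoints into a zone and describe its function
    return llm("Summarize this zone from its viewpoint descriptions:\n" + "\n".join(viewpoint_summaries))

def summarize_house(zone_summaries: List[str]) -> str:
    # get_house_summary.py: global layout description (root of the scene tree)
    return llm("Summarize the whole house from its zone descriptions:\n" + "\n".join(zone_summaries))
```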
```bibtex
@article{wang2025navrag,
  title={NavRAG: Generating User Demand Instructions for Embodied Navigation through Retrieval-Augmented LLM},
  author={Wang, Zihan and Zhu, Yaohui and Lee, Gim Hee and Fan, Yachun},
  journal={arXiv preprint arXiv:2502.11142},
  year={2025}
}
```
Our code is based on DUET; some code and data come from ScaleVLN and BEVBert. Thanks for their great work!