Dataset Preparation for CamI2V

This repo contains a guide for preparing the RealEstate10K dataset to train CamI2V, our camera-controllable diffusion model.

Download Metadata

wget https://storage.cloud.google.com/realestate10k-public-files/RealEstate10K.tar.gz
tar -xvzf RealEstate10K.tar.gz -C datasets
mkdir -p datasets/RealEstate10K/pose_files
mv datasets/RealEstate10K/test datasets/RealEstate10K/pose_files/
mv datasets/RealEstate10K/train datasets/RealEstate10K/pose_files/
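
After these commands, each split folder under pose_files holds one text file per video. As a minimal sketch, assuming the publicly documented RealEstate10K layout (first line is the YouTube URL; every following line has 19 columns: a timestamp in microseconds, normalized fx, fy, cx, cy, two unused values, and a row-major 3x4 camera pose), a file can be read as shown below. The names `Frame` and `parse_pose_file` are illustrative and are not part of this repo.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Tuple


@dataclass
class Frame:
    timestamp_us: int              # frame timestamp in microseconds
    intrinsics: Tuple[float, ...]  # normalized fx, fy, cx, cy
    pose: List[List[float]]        # 3x4 camera pose, row-major


def parse_pose_file(path) -> Tuple[str, List[Frame]]:
    """Return the video URL and per-frame camera data from one pose file.

    Assumes the 19-column RealEstate10K text format described above.
    """
    lines = [ln for ln in Path(path).read_text().splitlines() if ln.strip()]
    url = lines[0].strip()
    frames = []
    for ln in lines[1:]:
        cols = ln.split()
        if len(cols) != 19:
            raise ValueError(f"expected 19 columns, got {len(cols)} in {path}")
        values = [float(c) for c in cols[1:]]
        flat = values[6:]  # skip the 4 intrinsics and the 2 unused columns
        frames.append(Frame(
            timestamp_us=int(cols[0]),
            intrinsics=tuple(values[:4]),
            pose=[flat[r * 4:(r + 1) * 4] for r in range(3)],
        ))
    return url, frames
```

For example, `parse_pose_file("datasets/RealEstate10K/pose_files/test/<video_id>.txt")` would return the source URL and a list of frames, where `<video_id>` stands for any file in that folder.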

Download Videos

You may need to run pip install pytubefix before using this script. By default, it tries to download the highest available resolution; you can change this behaviour at line 103 of the script.

python datasets/utils/generate_dataset.py --split "test"

Extract Video Clips

python datasets/utils/gather_realestate.py --split "test"
python datasets/utils/get_realestate_clips.py --split "test"

Prepare Annotations

We use the caption annotations generated by CameraCtrl. Please download the two JSON files and place them in the RealEstate10K folder.

python datasets/utils/preprocess_realestate.py --split "test"

All-in-one Script

bash datasets/preprocess.sh "test"
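
The individual steps listed above can also be chained from Python. The sketch below simply runs the four commands from this guide in order and stops at the first failure. Whether preprocess.sh performs exactly these steps is an assumption, and `build_commands` and `run_all` are hypothetical helper names.

```python
import subprocess
import sys
from typing import List

# Scripts in the order this guide runs them.
STEPS = [
    "datasets/utils/generate_dataset.py",
    "datasets/utils/gather_realestate.py",
    "datasets/utils/get_realestate_clips.py",
    "datasets/utils/preprocess_realestate.py",
]


def build_commands(split: str) -> List[List[str]]:
    """Build one command per step for the given split ("test" or "train")."""
    return [[sys.executable, script, "--split", split] for script in STEPS]


def run_all(split: str) -> None:
    """Run every step in order from the repo root; raise on the first failure."""
    for cmd in build_commands(split):
        subprocess.run(cmd, check=True)
```

Calling `run_all("test")` from the repo root would execute the steps one after another. The caption JSON files still need to be in place before the last step runs.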

The final file structure should look like this:

─┬─ datasets\
 └─┬─ RealEstate10K\
   ├─┬─ pose_files\
   │ └─── test\
   ├─┬─ valid_meta\
   │ └─── test\
   ├─┬─ video_clips\
   │ └─── test\
   ├─┬─ videos\
   │ └─── test\
   ├─── test_captions.json
   ├─── test_video2clip.json
   ├─── test_list_data.pkl
   └─── test_valid_list.txt
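
A quick way to confirm that preprocessing finished is to check for the folders and files shown in the tree above. This is a minimal sketch; `missing_outputs` is a hypothetical helper, not part of this repo, and it only tests for existence, not for content.

```python
from pathlib import Path
from typing import List


def missing_outputs(root, split: str = "test") -> List[str]:
    """List expected folders and files under root/RealEstate10K that are absent."""
    base = Path(root) / "RealEstate10K"
    dirs = [base / name / split
            for name in ("pose_files", "valid_meta", "video_clips", "videos")]
    files = [base / f"{split}_{suffix}"
             for suffix in ("captions.json", "video2clip.json",
                            "list_data.pkl", "valid_list.txt")]
    missing = [str(d) for d in dirs if not d.is_dir()]
    missing += [str(f) for f in files if not f.is_file()]
    return missing
```

Running `missing_outputs("datasets")` returns an empty list when every entry in the tree exists for the test split.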

Preprocessing for the train split follows the same steps as for the test split, with "train" passed in place of "test".