- [2024.11.27] 🔥🔥🔥 We have published our report, which provides comprehensive training details and includes additional experiments.
- [2024.11.25] 🔥🔥🔥 We have released our 16-channel WF-VAE-L model along with the training code. You can download it from Hugging Face.
WF-VAE uses a multi-level wavelet transform to construct an efficient energy pathway, allowing low-frequency information from video data to flow into the latent representation. This method achieves competitive reconstruction performance while markedly reducing computational costs.
- This architecture substantially improves speed and reduces training costs in large-scale video generation models and data processing workflows.
- Our experiments demonstrate competitive performance of our model against SOTA VAEs.
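To make the idea of a multi-level wavelet transform concrete, here is a minimal sketch, not the WF-VAE implementation. It assumes an orthonormal Haar wavelet applied to a 1D signal (standing in for one row of pixels), whereas the model operates on video; the function names `haar_step`, `multilevel_haar`, and `energy` are hypothetical and chosen only for illustration. It shows how repeated decomposition of the low-frequency band concentrates most of the signal energy in a small number of coefficients.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: returns (low, high) bands."""
    s = 1.0 / math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return low, high


def multilevel_haar(signal, levels):
    """Decompose the low band repeatedly; returns (low, [high_1, ..., high_n])."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    low, highs = list(signal), []
    for _ in range(levels):
        low, high = haar_step(low)
        highs.append(high)
    return low, highs


def energy(values):
    """Sum of squared coefficients."""
    return sum(v * v for v in values)


# Illustrative input (an assumption): a smooth ramp plus a small
# alternating perturbation that plays the role of fine detail.
signal = [float(i) + (0.1 if i % 2 else -0.1) for i in range(16)]

low, highs = multilevel_haar(signal, levels=2)
total = energy(signal)
low_share = energy(low) / total

print(f"low-band coefficients: {len(low)} of {len(signal)}")
print(f"share of energy in the low band: {low_share:.3f}")
```

In this toy example a quarter of the coefficients carry nearly all of the energy, which is the property the low-frequency pathway described above relies on. Because the transform is orthonormal, the energies of the low band and all high bands add up to the energy of the input.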
WF-VAE | CogVideoX |
---|---|
We ran efficiency tests on 33-frame videos using float32 precision on an H100 GPU. All models ran without block-wise inference strategies. Our model delivers performance comparable to state-of-the-art VAEs while significantly reducing encoding costs.
git clone https://github.com/PKU-YuanGroup/WF-VAE
cd WF-VAE
conda create -n wfvae python=3.10 -y
conda activate wfvae
pip install -r requirements.txt
To reconstruct a video or an image, execute the following commands:
CUDA_VISIBLE_DEVICES=1 python scripts/recon_single_video.py \
--model_name WFVAE \
--from_pretrained "Your VAE" \
--video_path "Video Path" \
--rec_path rec.mp4 \
--device cuda \
--sample_rate 1 \
--num_frames 65 \
--height 512 \
--width 512 \
--fps 30 \
--enable_tiling
CUDA_VISIBLE_DEVICES=1 python scripts/recon_single_image.py \
--model_name WFVAE \
--from_pretrained "Your VAE" \
--image_path assets/gt_5544.jpg \
--rec_path rec.jpg \
--device cuda \
--short_size 512
For further guidance, refer to the example scripts: examples/rec_single_video.sh and examples/rec_single_image.sh.
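If you need to reconstruct several videos, the video command above can be assembled programmatically. The following is a minimal sketch under stated assumptions: it only reuses the flags shown in the example command, the helper names `build_recon_command` and `build_batch` are hypothetical, and the checkpoint and video paths are placeholders. It builds the argument lists without running anything.

```python
import shlex
from pathlib import Path


def build_recon_command(video_path, rec_path, checkpoint, num_frames=65,
                        height=512, width=512, fps=30, sample_rate=1,
                        device="cuda", tiling=True):
    """Build the argument list for scripts/recon_single_video.py.

    Defaults mirror the example command in this README.
    """
    cmd = [
        "python", "scripts/recon_single_video.py",
        "--model_name", "WFVAE",
        "--from_pretrained", str(checkpoint),
        "--video_path", str(video_path),
        "--rec_path", str(rec_path),
        "--device", device,
        "--sample_rate", str(sample_rate),
        "--num_frames", str(num_frames),
        "--height", str(height),
        "--width", str(width),
        "--fps", str(fps),
    ]
    if tiling:
        cmd.append("--enable_tiling")
    return cmd


def build_batch(video_paths, out_dir, checkpoint):
    """One command per input video, writing <stem>_rec.mp4 into out_dir."""
    out_dir = Path(out_dir)
    return [
        build_recon_command(p, out_dir / f"{Path(p).stem}_rec.mp4", checkpoint)
        for p in video_paths
    ]


# Placeholder paths (assumptions, not files shipped with the repository).
cmd = build_recon_command("input.mp4", "rec.mp4", "path/to/your_vae")
print(shlex.join(cmd))

batch = build_batch(["a.mp4", "b.mp4"], "outputs", "path/to/your_vae")
```

Each list can then be passed to `subprocess.run(cmd, check=True)` from the repository root, with `CUDA_VISIBLE_DEVICES` set in the environment as in the commands above.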
Training and validation instructions are in TRAIN_AND_VALIDATE.md.
- Open-Sora Plan - https://github.com/PKU-YuanGroup/Open-Sora-Plan
- Allegro - https://github.com/rhymes-ai/Allegro
- CogVideoX - https://github.com/THUDM/CogVideo
- Stable Diffusion - https://github.com/CompVis/stable-diffusion
This project is released under the Apache 2.0 license as found in the LICENSE file.