fix: CogVideox train dataset _preprocess_data crop video #9574

glide-the · 2024-10-03T08:29:41Z

Removed the int8-to-float32 conversion (`* 2.0 - 1.0`) from `train_transforms`, as it caused image overexposure.
Added a `_resize_for_rectangle_crop` function to enable video cropping. The cropping mode can be configured via `video_reshape_mode`, which supports the options ['center', 'random', 'none'].
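The cropping idea can be sketched as pure geometry. First, scale the source so it fully covers the target size while keeping its aspect ratio. Then pick a crop window, either centered or at a random offset. This is a minimal sketch, not the PR's `_resize_for_rectangle_crop`. The helper name `compute_resize_and_crop` is hypothetical, and treating `'none'` as a plain resize to the target size with no crop is an assumption.

```python
import random
from typing import Optional, Tuple


def compute_resize_and_crop(
    src_h: int,
    src_w: int,
    tgt_h: int,
    tgt_w: int,
    mode: str = "center",
    rng: Optional[random.Random] = None,
) -> Tuple[int, int, int, int]:
    """Return (resized_h, resized_w, top, left) for an aspect-preserving resize + crop.

    Hypothetical helper for illustration only; not the diffusers implementation.
    """
    if mode == "none":
        # Assumption: 'none' means resize straight to the target size, no crop.
        return tgt_h, tgt_w, 0, 0

    if src_w / src_h > tgt_w / tgt_h:
        # Source is relatively wider: match the height, let the width overflow.
        new_h = tgt_h
        new_w = max(tgt_w, round(src_w * tgt_h / src_h))
    else:
        # Source is relatively taller (or equal): match the width, let the height overflow.
        new_w = tgt_w
        new_h = max(tgt_h, round(src_h * tgt_w / src_w))

    delta_h, delta_w = new_h - tgt_h, new_w - tgt_w
    if mode == "center":
        top, left = delta_h // 2, delta_w // 2
    elif mode == "random":
        rng = rng or random.Random()
        # randint is inclusive on both ends, so the crop window always fits.
        top, left = rng.randint(0, delta_h), rng.randint(0, delta_w)
    else:
        raise ValueError(f"unsupported mode: {mode!r}")
    return new_h, new_w, top, left


# A 960x720 (W x H) clip cropped to a 720x480 (W x H) training size.
print(compute_resize_and_crop(720, 960, 480, 720, mode="center"))  # → (540, 720, 30, 0)
```

Because the shorter relative side is matched before cropping, no content is squeezed. The 960x720 frame is scaled to 720x540 and then loses 30 rows at the top and bottom, instead of being distorted into the target shape.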


sayakpaul · 2024-10-03T08:52:20Z

Cc: @a-r-r-o-w

HuggingFaceDocBuilderDev · 2024-10-03T12:08:23Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

a-r-r-o-w · 2024-10-03T12:15:13Z

examples/cogvideo/train_cogvideox_lora.py

-            frames = frames.float()
-            frames = torch.stack([train_transforms(frame) for frame in frames], dim=0)
-            videos.append(frames.permute(0, 3, 1, 2).contiguous())  # [F, C, H, W]
+            tensor = frames.float() / 255.0


I am not sure why we are scaling the tensors to the range [0, 1] instead of [-1, 1]. If I understand correctly, the original codebase also converts to [-1, 1] here, yes?

glide-the

You're right, it should be in the [-1, 1] range. The [-1, 1] range is used for the matrix computations during fine-tuning, where it is easier to work with. I forgot that this step is handled in the latent-to-image process: the decoded image is in the [0, 1] range, while the latent space is in the [-1, 1] range.
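To make the range discussion concrete, here is a minimal sketch of scaling 8-bit pixel values to [0, 1] and to [-1, 1], together with the decode step quoted later in this thread, `(x / 2 + 0.5).clamp(0, 1)`. It is written in plain Python rather than torch, and the helper names are hypothetical. The second line of output shows what happens if a [0, 1] input is decoded as though it were in [-1, 1]: the whole result is squeezed into [0.5, 1], which looks washed out.

```python
def to_unit(pixel: int) -> float:
    """Scale an 8-bit pixel value from [0, 255] to [0, 1]."""
    return pixel / 255.0


def to_signed(pixel: int) -> float:
    """Scale an 8-bit pixel value from [0, 255] to [-1, 1]."""
    return pixel / 255.0 * 2.0 - 1.0


def decode(x: float) -> float:
    """Map a [-1, 1] value back to [0, 1], clamping like `.clamp(0, 1)`."""
    return min(max(x / 2 + 0.5, 0.0), 1.0)


pixels = [0, 64, 128, 255]
# Matched pairing: [-1, 1] in, [0, 1] out (recovers pixel / 255).
print([round(decode(to_signed(p)), 3) for p in pixels])
# Mismatch: a [0, 1] input decoded as if it were in [-1, 1].
print([round(decode(to_unit(p)), 3) for p in pixels])
```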

I've verified that the blank-screen training results were caused by feeding a 960x720 image into the dataset, which was then resized directly to 460x720 for training without any cropping.

https://github.com/THUDM/CogVideo/blob/111756a6a68a8df375ef9c31f9f325818699dfaa/sat/data_video.py#L437
Dividing by 127.5 may cause precision loss.

encode: `images / 255.0 * 2.0 - 1.0`
decode: `(images / 2 + 0.5).clamp(0, 1)`

encode: `(frames - 127.5) / 127.5`
decode: `(images / 2 + 0.5).clamp(0, 1)`
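Whether the two encodings actually differ in float32 can be checked directly. This sketch emulates float32 arithmetic in plain Python by rounding to float32 with `struct` after every operation. That rounding is an assumption standing in for torch float32 tensors. For a single +, -, * or / on float32 operands, computing in float64 and then rounding gives the correctly rounded float32 result. The helper names are hypothetical.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def encode_scale(v: int) -> float:
    # images / 255.0 * 2.0 - 1.0, rounded to float32 after each step
    return f32(f32(f32(f32(v) / f32(255.0)) * f32(2.0)) - f32(1.0))


def encode_center(v: int) -> float:
    # (frames - 127.5) / 127.5, rounded to float32 after each step
    return f32(f32(f32(v) - f32(127.5)) / f32(127.5))


def decode(x: float) -> float:
    # (images / 2 + 0.5).clamp(0, 1), rounded to float32
    return min(max(f32(f32(x / 2.0) + 0.5), 0.0), 1.0)


diffs = [abs(encode_scale(v) - encode_center(v)) for v in range(256)]
print("max |difference| between encodings:", max(diffs))
print("pixel values whose encodings differ:", sum(d > 0 for d in diffs))
```

Both formulas map 0 and 255 exactly to -1 and 1. Any remaining difference is on the order of float32 rounding near 1, well under 1e-6. The sketch does not claim that either formula matches the training script bit for bit.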

a-r-r-o-w

Thanks, I have one question. Apart from that, this looks great! Could you also run `make style`?

cc @yiyixuxu on whether we want the ipynb notebook here or not


a-r-r-o-w

Thanks for the improvements! Have you verified training with the new settings? I think it would be good to give `--video_reshape_mode` a default value, to maintain compatibility with existing `*.sh` launch scripts that others might have set up locally.

Could you host the ipynb notebook on https://gist.github.com/ instead and link to it here? We try to limit notebooks, PNGs/MP4s/GIFs, and other such files in the repo; otherwise they can quickly add up to a bulky clone.

I think `make style` is also needed to pass the quality tests.
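Giving `--video_reshape_mode` a default keeps existing launch scripts working unchanged, because they never pass the flag. Here is a minimal argparse sketch. The default value `'center'` is an assumption for illustration, not necessarily what the merged script uses.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="CogVideoX LoRA training (excerpt)")
    parser.add_argument(
        "--video_reshape_mode",
        type=str,
        default="center",  # assumed default so older *.sh scripts still parse
        choices=["center", "random", "none"],
        help="How to crop frames to the training resolution.",
    )
    return parser


parser = build_parser()
print(parser.parse_args([]).video_reshape_mode)  # → center
print(parser.parse_args(["--video_reshape_mode", "random"]).video_reshape_mode)  # → random
```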

glide-the · 2024-10-07T07:11:18Z


Moved `video_fix_rgb_float_and_crop.ipynb` to https://gist.github.com/glide-the/7658dbfd5f555be0a1a687a4139dba40

examples/cogvideo/README.md


* Removed int8 to float32 conversion (`* 2.0 - 1.0`) from `train_transforms` as it caused image overexposure. Added `_resize_for_rectangle_crop` function to enable video cropping functionality. The cropping mode can be configured via `video_reshape_mode`, supporting options: ['center', 'random', 'none'].
* The number 127.5 may experience precision loss during division operations.
* wandb request pil image Type
* Resizing bug
* del jupyter
* make style
* Update examples/cogvideo/README.md
* make style

---------

Co-authored-by: --unset <--unset>
Co-authored-by: Aryan <[email protected]>

a-r-r-o-w reviewed Oct 3, 2024


--unset added 2 commits October 3, 2024 21:39

The number 127.5 may experience precision loss during division operations.

d0f5b05

wandb request pil image Type

10bf85f

a-r-r-o-w reviewed Oct 5, 2024


--unset and others added 2 commits October 6, 2024 15:23

Resizing bug

cdab2cf

del jupyter

ae94599

glide-the and others added 2 commits October 7, 2024 15:13

make style

8115e41

Merge branch 'main' into cogvideo_dataset_resize__crop

ba7bb57

a-r-r-o-w approved these changes Oct 8, 2024


examples/cogvideo/README.md (outdated, resolved)

a-r-r-o-w and others added 2 commits October 8, 2024 12:34

Update examples/cogvideo/README.md

84d1b32

make style

5178266

a-r-r-o-w merged commit 66eef9a into huggingface:main Oct 8, 2024
8 checks passed

