Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: CogVideox train dataset _preprocess_data crop video #9574

Merged
merged 9 commits into from
Oct 8, 2024

Conversation

glide-the
Copy link
Contributor

  • Removed int8 to float32 conversion (* 2.0 - 1.0) from train_transforms as it caused image overexposure.

  • Added _resize_for_rectangle_crop function to enable video cropping functionality. The cropping mode can be configured via video_reshape_mode, supporting options: ['center', 'random', 'none'].

image

@sayakpaul

…orms` as it caused image overexposure.

Added `_resize_for_rectangle_crop` function to enable video cropping functionality. The cropping mode can be configured via `video_reshape_mode`, supporting options: ['center', 'random', 'none'].
@sayakpaul
Copy link
Member

Cc: @a-r-r-o-w

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

frames = frames.float()
frames = torch.stack([train_transforms(frame) for frame in frames], dim=0)
videos.append(frames.permute(0, 3, 1, 2).contiguous()) # [F, C, H, W]
tensor = frames.float() / 255.0
Copy link
Member

@a-r-r-o-w a-r-r-o-w Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why we are making the tensors to range [0, 1], instead of [-1, 1]. In the original codebase, we convert to [-1, 1] as well here if I understand correctly, yes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why we are making the tensors to range [0, 1], instead of [-1, 1]. In the original codebase, we convert to [-1, 1] as well here if I understand correctly, yes?

You're right, it should be in the [-1, 1] range. In fact, this is for matrix calculations during fine-tuning, and the [-1, 1] range is easier for computation. I forgot that this step is handled in the latten2img process, so the image is in the [0, 1] range, while the latent space is in the [-1, 1] range.

I've already verified that the cause of the training result showing a blank screen is that I input a 960x720 image into the dataset, and it was compressed to a 460x720 image for training directly.

Copy link
Contributor Author

@glide-the glide-the Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/THUDM/CogVideo/blob/111756a6a68a8df375ef9c31f9f325818699dfaa/sat/data_video.py#L437
The number 127.5 may experience precision loss during division operations.

encode : images / 255.0 * 2.0 - 1.0
decode: (images / 2 + 0.5).clamp(0, 1)
image

encode : (frames - 127.5) / 127.5
decode: (images / 2 + 0.5).clamp(0, 1)
image

Copy link
Member

@a-r-r-o-w a-r-r-o-w left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I have one question. Apart from that, this looks great! Could you also run make style?

cc @yiyixuxu if we want the ipynb notebook here or not

Copy link
Member

@a-r-r-o-w a-r-r-o-w left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the improvements! Have you verified training on the new settings? I think it would be good to default --video_reshape_mode to maintain compatibility with existing *.sh scripts for launching training that others might have locally setup.

Could you host the ipynb notebook on https://gist.github.com/ instead of here, and link to it instead? We try to limit notebooks, pngs/mp4s/gifs and other files in the repo otherwise it can soon compound to a bulky clone.

I think make style is also needed to quality tests

@glide-the
Copy link
Contributor Author

glide-the commented Oct 7, 2024

Thanks for the improvements! Have you verified training on the new settings? I think it would be good to default --video_reshape_mode to maintain compatibility with existing *.sh scripts for launching training that others might have locally setup.

Could you host the ipynb notebook on https://gist.github.com/ instead of here, and link to it instead? We try to limit notebooks, pngs/mp4s/gifs and other files in the repo otherwise it can soon compound to a bulky clone.

I think make style is also needed to quality tests

move video_fix_rgb_float_and_crop.ipynb to https://gist.github.com/glide-the/7658dbfd5f555be0a1a687a4139dba40

examples/cogvideo/README.md Outdated Show resolved Hide resolved
@a-r-r-o-w a-r-r-o-w merged commit 66eef9a into huggingface:main Oct 8, 2024
8 checks passed
leisuzz pushed a commit to leisuzz/diffusers that referenced this pull request Oct 11, 2024
…#9574)

* Removed int8 to float32 conversion (`* 2.0 - 1.0`) from `train_transforms` as it caused image overexposure.

Added `_resize_for_rectangle_crop` function to enable video cropping functionality. The cropping mode can be configured via `video_reshape_mode`, supporting options: ['center', 'random', 'none'].

* The number 127.5 may experience precision loss during division operations.

* wandb request pil image Type

* Resizing bug

* del jupyter

* make style

* Update examples/cogvideo/README.md

* make style

---------

Co-authored-by: --unset <--unset>
Co-authored-by: Aryan <[email protected]>
sayakpaul pushed a commit that referenced this pull request Dec 23, 2024
* Removed int8 to float32 conversion (`* 2.0 - 1.0`) from `train_transforms` as it caused image overexposure.

Added `_resize_for_rectangle_crop` function to enable video cropping functionality. The cropping mode can be configured via `video_reshape_mode`, supporting options: ['center', 'random', 'none'].

* The number 127.5 may experience precision loss during division operations.

* wandb request pil image Type

* Resizing bug

* del jupyter

* make style

* Update examples/cogvideo/README.md

* make style

---------

Co-authored-by: --unset <--unset>
Co-authored-by: Aryan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants