[core] Mochi T2V #9769

Merged
merged 79 commits into from
Nov 5, 2024
Changes from 49 commits
e488d09
update
a-r-r-o-w Oct 23, 2024
64275b0
udpate
a-r-r-o-w Oct 23, 2024
da48940
update transformer
a-r-r-o-w Oct 23, 2024
05ebd6c
make style
a-r-r-o-w Oct 23, 2024
0e9e281
fix
a-r-r-o-w Oct 24, 2024
c2a1557
add conversion script
a-r-r-o-w Oct 24, 2024
be5bbe5
update
a-r-r-o-w Oct 24, 2024
1e9bc91
fix
a-r-r-o-w Oct 24, 2024
98a4554
update
a-r-r-o-w Oct 24, 2024
85c8734
fix
a-r-r-o-w Oct 24, 2024
ccc1b36
update
DN6 Oct 24, 2024
2fd2ec4
fixes
a-r-r-o-w Oct 24, 2024
46f95d5
make style
a-r-r-o-w Oct 24, 2024
275041d
update
DN6 Oct 24, 2024
c12ce7d
Merge branch 'mochi-t2v' into mochi-t2v-pipeline
DN6 Oct 24, 2024
44987ad
update
DN6 Oct 24, 2024
ebcbad2
update
DN6 Oct 24, 2024
85a9825
init
yiyixuxu Oct 24, 2024
8700d64
update
DN6 Oct 24, 2024
969c3ab
update
DN6 Oct 24, 2024
0a6189e
add
yiyixuxu Oct 24, 2024
e9e92d0
up
yiyixuxu Oct 25, 2024
0b76fea
up
yiyixuxu Oct 25, 2024
7c55ef5
up
yiyixuxu Oct 25, 2024
6552653
update
DN6 Oct 25, 2024
a7372bd
mochi transformer
a-r-r-o-w Oct 25, 2024
3e569cb
Merge branch 'mochi-vae' into mochi
a-r-r-o-w Oct 25, 2024
723329d
Merge branch 'mochi-t2v-pipeline' into mochi
a-r-r-o-w Oct 25, 2024
ba9f13f
remove original implementation
a-r-r-o-w Oct 25, 2024
c916ae5
make style
a-r-r-o-w Oct 25, 2024
d41198c
update inits
a-r-r-o-w Oct 25, 2024
2798ed4
update conversion script
a-r-r-o-w Oct 25, 2024
5f43c6a
docs
a-r-r-o-w Oct 25, 2024
8e11f34
Update src/diffusers/pipelines/mochi/pipeline_mochi.py
a-r-r-o-w Oct 25, 2024
fb2ede0
Update src/diffusers/pipelines/mochi/pipeline_mochi.py
a-r-r-o-w Oct 25, 2024
237e079
fix docs
a-r-r-o-w Oct 25, 2024
72741ec
pipeline fixes
a-r-r-o-w Oct 25, 2024
cae2801
make style
a-r-r-o-w Oct 25, 2024
5d093fe
invert sigmas in scheduler; fix pipeline
a-r-r-o-w Oct 25, 2024
5925844
fix pipeline num_frames
a-r-r-o-w Oct 25, 2024
303b47c
Merge branch 'main' into mochi
a-r-r-o-w Oct 25, 2024
2682d1f
flip proj and gate in swiglu
a-r-r-o-w Oct 26, 2024
99d2847
make style
a-r-r-o-w Oct 26, 2024
e86f91e
fix
a-r-r-o-w Oct 26, 2024
b88f66e
make style
a-r-r-o-w Oct 26, 2024
172fdde
fix tests
a-r-r-o-w Oct 26, 2024
b5d7679
latent mean and std fix
a-r-r-o-w Oct 26, 2024
d9c7956
update
a-r-r-o-w Oct 26, 2024
3c53af2
cherry-pick 1069d210e1b9e84a366cdc7a13965626ea258178
yiyixuxu Oct 26, 2024
346aed3
Merge branch 'main' into mochi
a-r-r-o-w Oct 28, 2024
e736094
remove additional sigma already handled by flow match scheduler
a-r-r-o-w Oct 28, 2024
e66404c
fix
a-r-r-o-w Oct 28, 2024
0f4e9c4
remove hardcoded value
a-r-r-o-w Oct 28, 2024
8a5a2dd
replace conv1x1 with linear
a-r-r-o-w Oct 28, 2024
69e5db6
Update src/diffusers/pipelines/mochi/pipeline_mochi.py
a-r-r-o-w Oct 29, 2024
b5f18b8
Merge branch 'main' into mochi
a-r-r-o-w Oct 29, 2024
8186323
Merge branch 'mochi' into mochi-vae-framewise-tiling
a-r-r-o-w Oct 29, 2024
7204b58
framewise decoding and conv_cache
a-r-r-o-w Oct 29, 2024
4b966d2
make style
a-r-r-o-w Oct 29, 2024
6239a6a
Merge branch 'main' into mochi
a-r-r-o-w Oct 29, 2024
d791908
Apply suggestions from code review
a-r-r-o-w Nov 1, 2024
387aafc
mochi vae encoder changes
a-r-r-o-w Nov 1, 2024
f4cbbfc
Merge branch 'main' into mochi
a-r-r-o-w Nov 1, 2024
a1ad818
rebase correctly
a-r-r-o-w Nov 1, 2024
cc11752
Update scripts/convert_mochi_to_diffusers.py
a-r-r-o-w Nov 1, 2024
a05b85c
fix tests
a-r-r-o-w Nov 2, 2024
b998ff4
fixes
a-r-r-o-w Nov 2, 2024
07dfbc7
make style
a-r-r-o-w Nov 2, 2024
1ff17b1
Merge branch 'main' into mochi
a-r-r-o-w Nov 2, 2024
3271d55
update
a-r-r-o-w Nov 4, 2024
8c06092
make style
a-r-r-o-w Nov 4, 2024
e6bb7e4
update
yiyixuxu Nov 5, 2024
ca5c7f0
add framewise and tiled encoding
a-r-r-o-w Nov 5, 2024
abc8c5e
make style
a-r-r-o-w Nov 5, 2024
3c1d992
make original vae implementation behaviour the default; note: framewi…
a-r-r-o-w Nov 5, 2024
a722340
remove framewise encoding implementation due to presence of attn layers
a-r-r-o-w Nov 5, 2024
55b0bc3
Merge branch 'main' into mochi
a-r-r-o-w Nov 5, 2024
995938f
fight test 1
a-r-r-o-w Nov 5, 2024
6db5083
fight test 2
a-r-r-o-w Nov 5, 2024
6 changes: 6 additions & 0 deletions docs/source/en/_toctree.yml
@@ -268,6 +268,8 @@
title: LatteTransformer3DModel
- local: api/models/lumina_nextdit2d
title: LuminaNextDiT2DModel
- local: api/models/mochi_transformer3d
title: MochiTransformer3DModel
- local: api/models/pixart_transformer2d
title: PixArtTransformer2DModel
- local: api/models/prior_transformer
@@ -302,6 +304,8 @@
title: AutoencoderKL
- local: api/models/autoencoderkl_cogvideox
title: AutoencoderKLCogVideoX
- local: api/models/autoencoderkl_mochi
title: AutoencoderKLMochi
- local: api/models/asymmetricautoencoderkl
title: AsymmetricAutoencoderKL
- local: api/models/consistency_decoder_vae
@@ -394,6 +398,8 @@
title: Lumina-T2X
- local: api/pipelines/marigold
title: Marigold
- local: api/pipelines/mochi
title: Mochi
- local: api/pipelines/panorama
title: MultiDiffusion
- local: api/pipelines/musicldm
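
The three hunk headers in the `_toctree.yml` diff can be checked against the "6 additions & 0 deletions" summary with a little arithmetic. The sketch below is only an illustration: `parse_hunk` is a hypothetical helper, not part of any tool, and it handles only the `start,length` form of a unified-diff header. Because this file has no deletions, the growth in hunk length equals the number of added lines.

```python
import re

# Matches headers of the form "@@ -old_start,old_len +new_start,new_len @@".
HUNK_RE = re.compile(r"@@ -(\d+),(\d+) \+(\d+),(\d+) @@")


def parse_hunk(header):
    """Return (old_start, old_len, new_start, new_len) from a hunk header."""
    match = HUNK_RE.search(header)
    if match is None:
        raise ValueError(f"not a hunk header: {header!r}")
    return tuple(int(group) for group in match.groups())


# Hunk headers copied from the _toctree.yml diff.
headers = [
    "@@ -268,6 +268,8 @@",
    "@@ -302,6 +304,8 @@",
    "@@ -394,6 +398,8 @@",
]

hunks = [parse_hunk(header) for header in headers]

# Lines gained per hunk (new length minus old length).
net_added = [new_len - old_len for _, old_len, _, new_len in hunks]

# How far each hunk has shifted down in the new file.
offsets = [new_start - old_start for old_start, _, new_start, _ in hunks]

print(net_added)       # [2, 2, 2]
print(sum(net_added))  # 6
print(offsets)         # [0, 2, 4]
```

Each hunk adds one two-line `local`/`title` entry, and each later hunk starts two lines further down than the one before it, which is consistent with the earlier insertions.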
32 changes: 32 additions & 0 deletions docs/source/en/api/models/autoencoderkl_mochi.md
@@ -0,0 +1,32 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# AutoencoderKLMochi

The 3D variational autoencoder (VAE) model with KL loss used in [Mochi](https://github.com/genmoai/models) was introduced in [Mochi 1 Preview](https://huggingface.co/genmo/mochi-1-preview) by Genmo.

The model can be loaded with the following code snippet.

```python
import torch
from diffusers import AutoencoderKLMochi

vae = AutoencoderKLMochi.from_pretrained("genmo/mochi-1-preview", subfolder="vae", torch_dtype=torch.float32).to("cuda")
```

## AutoencoderKLMochi

[[autodoc]] AutoencoderKLMochi
- decode
- all

## DecoderOutput

[[autodoc]] models.autoencoders.vae.DecoderOutput
30 changes: 30 additions & 0 deletions docs/source/en/api/models/mochi_transformer3d.md
@@ -0,0 +1,30 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# MochiTransformer3DModel

A Diffusion Transformer model for 3D video-like data was introduced in [Mochi 1 Preview](https://huggingface.co/genmo/mochi-1-preview) by Genmo.

The model can be loaded with the following code snippet.

```python
import torch
from diffusers import MochiTransformer3DModel

transformer = MochiTransformer3DModel.from_pretrained("genmo/mochi-1-preview", subfolder="transformer", torch_dtype=torch.float16).to("cuda")
```

## MochiTransformer3DModel

[[autodoc]] MochiTransformer3DModel

## Transformer2DModelOutput

[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
36 changes: 36 additions & 0 deletions docs/source/en/api/pipelines/mochi.md
@@ -0,0 +1,36 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
-->

# Mochi

[Mochi 1 Preview](https://huggingface.co/genmo/mochi-1-preview) from Genmo.

*Mochi 1 preview is an open state-of-the-art video generation model with high-fidelity motion and strong prompt adherence in preliminary evaluation. This model dramatically closes the gap between closed and open video generation systems. The model is released under a permissive Apache 2.0 license.*

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>
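
A minimal text-to-video sketch is shown below, assuming a CUDA machine with enough memory and the `genmo/mochi-1-preview` checkpoint referenced above. The wrapper `generate_mochi_video`, the output file name, and the values chosen for `num_inference_steps`, `guidance_scale`, and `fps` are illustrative assumptions, not recommended settings.

```python
def generate_mochi_video(prompt, output_path="mochi.mp4", num_inference_steps=28, guidance_scale=3.5, fps=30):
    """Generate one clip with MochiPipeline and write it to output_path."""
    # Heavy dependencies are imported lazily so that defining this helper stays cheap.
    import torch
    from diffusers import MochiPipeline
    from diffusers.utils import export_to_video

    pipe = MochiPipeline.from_pretrained("genmo/mochi-1-preview", torch_dtype=torch.bfloat16)

    # Optional memory savers: offload idle components to CPU and decode the video in tiles.
    pipe.enable_model_cpu_offload()
    pipe.enable_vae_tiling()

    # The pipeline output holds a batch of videos; take the frames of the first one.
    frames = pipe(
        prompt,
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
    ).frames[0]

    export_to_video(frames, output_path, fps=fps)
    return output_path
```

Calling `generate_mochi_video("Close-up of a chameleon's eye, with its scaly skin changing color.")` would download the checkpoint on first use and write `mochi.mp4` to the working directory.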

## MochiPipeline

[[autodoc]] MochiPipeline
- all
- __call__

## MochiPipelineOutput

[[autodoc]] pipelines.mochi.pipeline_output.MochiPipelineOutput