A Video2Video framework for text2image models in ComfyUI. Supports SD1.5 and SDXL.
(TODO more examples)
See the example_workflows directory for SD15 and SDXL examples with notes.
Example outputs: vv_sdxl.mp4 and vv_sd15.mp4
Install this repo from the ComfyUI Manager, or git clone it into custom_nodes and run pip install -r requirements.txt inside the cloned repo.
It is recommended to use Flow Attention through Unimatch (support for other flow backends is planned). To get Unimatch optical flow, go to https://github.com/autonomousvision/unimatch/blob/master/MODEL_ZOO.md#optical-flow and download one of the models. It must be one of the larger "regrefine" models; the Sintel and Things versions tend to give the best results.
This node takes your diffusion model (SD15/SDXL) and alters its attention, block, and resnet modules. The change is made in place, so it persists on the model you passed in even when the node's output is not used. However, Veevee will not activate unless you sample with the VV Unsampler or VV Sampler.
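For intuition, the sketch below shows what in-place patching of a model's attention modules can look like in plain PyTorch. It is illustrative only: the actual node works through ComfyUI's model objects, patches more than attention, and the `patched_forward` wrapper here is hypothetical.

```python
import torch.nn as nn

def patch_attention_in_place(model: nn.Module) -> nn.Module:
    """Illustrative only: wrap the forward of every module whose class name
    contains 'Attention' so extra video-to-video logic can run around it."""
    for module in model.modules():
        if "Attention" in type(module).__name__ and not hasattr(module, "_vv_original_forward"):
            module._vv_original_forward = module.forward

            def patched_forward(*args, _m=module, **kwargs):
                # A real implementation would reroute queries/keys/values here
                # (flow attention, injections, etc.); this sketch is a no-op wrapper.
                return _m._vv_original_forward(*args, **kwargs)

            module.forward = patched_forward
    return model  # the same object: the patching happens in place
```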
This node coordinates unsampling the source video back into noise.
- overlap: If you use this with AnimateDiff-Evolved's batching, this should match the context overlap; otherwise you will get misconfigured injections.
- (optional) flow_config: The configuration for using flow attention. Recommended.
- (optional) inj_config: The injection config that stores information about the unsampling process for reuse during sampling. Recommended.
- (optional) sampler: Use a sampler that's not in default ComfyUI.
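Conceptually, unsampling runs the denoising schedule in reverse (DDIM/Euler-style inversion): starting from the clean source-video latents, each step adds back the noise the model predicts, ending at the noise level that sampling will later start from. Below is a minimal sketch under the assumption of a k-diffusion style, epsilon-predicting `eps_model(x, sigma)`; the real node goes through ComfyUI's sampling machinery and the configs listed above.

```python
import torch

@torch.no_grad()
def unsample(eps_model, latents: torch.Tensor, sigmas: torch.Tensor) -> torch.Tensor:
    """Walk the source-video latents from low noise to high noise by
    following the model's noise prediction backwards (Euler steps).
    `sigmas` should be ordered from small to large, ending at the noise
    level sampling will start from."""
    x = latents
    for i in range(len(sigmas) - 1):
        sigma, sigma_next = sigmas[i], sigmas[i + 1]
        eps = eps_model(x, sigma)            # predicted noise at this level
        x = x + eps * (sigma_next - sigma)   # step *up* the noise schedule
    return x  # "unsampled noise" to feed into the sampler
```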
This node coordinates sampling the unsampled noise into your target generation.
- overlap: If you use this with AnimateDiff-Evolved's batching, this should match the context overlap; otherwise you will get misconfigured injections.
- sampler_name: The sampler to use if the optional sampler is not specified.
- attn_injection_steps: The number of attention steps to inject from the unsampling process. Cannot be greater than the inj config's saved attention steps.
- res_injection_steps: The number of resnet steps to inject from the unsampling process. Cannot be greater than the inj config's saved res steps.
- (optional) flow_config: The configuration for using flow attention. Recommended.
- (optional) inj_config: The injection config that provides the information stored during unsampling. It must be the same config node used during unsampling. Recommended.
- (optional) sca_config: Enables Sparse Causal Attention. It can help in certain use cases.
- (optional) pivot_config: Enables Pivot Attention.
- (optional) rave_config: Enables RAVE Attention.
- (optional) temporal_config: Controls the steps on which AnimateDiff (AD) runs. If you do not specify a config, AD runs as normal.
- (optional) sampler: Use a sampler that's not in default ComfyUI.
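The injection options work roughly like this: features recorded at each unsampling step are substituted back in during the first attn_injection_steps / res_injection_steps of sampling, anchoring the generation to the source video's structure. The sketch below is hypothetical (the dictionaries and function are illustrative, not the node's API):

```python
# Conceptual sketch of injection (names are hypothetical, not the node's API).
saved_attn = {}   # filled during unsampling: step index -> attention features
saved_res = {}    # filled during unsampling: step index -> resnet features

def maybe_inject(step: int, attn_injection_steps: int, res_injection_steps: int,
                 attn_out, res_out):
    """During sampling, replace freshly computed features with the ones
    recorded while unsampling, but only for the first N steps."""
    if step < attn_injection_steps and step in saved_attn:
        attn_out = saved_attn[step]
    if step < res_injection_steps and step in saved_res:
        res_out = saved_res[step]
    return attn_out, res_out
```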
This node sets up injection parameters and shares information between the unsampler and sampler. The same node must be connected to both.
- unsampler_sigmas: The sigmas from the scheduler used for the unsampler.
- sampler_sigmas: The sigmas from the scheduler used for the sampler.
- save_attn_steps: The number of attention steps to save during unsampling.
- save_res_steps: The number of resnet steps to save during unsampling.
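Conceptually, the injection config is shared storage that the unsampler writes into and the sampler reads from, which is why the same node must feed both. A sketch with assumed field names (not the node's actual data structure):

```python
from dataclasses import dataclass, field

@dataclass
class InjectionConfig:
    """Illustrative container; field names are assumptions, not the node's API."""
    unsampler_sigmas: list        # noise schedule used while unsampling
    sampler_sigmas: list          # noise schedule used while sampling
    save_attn_steps: int = 5      # how many unsampling steps of attention to keep
    save_res_steps: int = 5       # how many unsampling steps of resnet output to keep
    attn_store: dict = field(default_factory=dict)  # step -> saved attention features
    res_store: dict = field(default_factory=dict)   # step -> saved resnet features
```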
This node calculates trajectories to guide Flow Attention.
- images: Your input video frames.
- checkpoint: The Unimatch checkpoint, which must be placed in ComfyUI/models/unimatch.
- flow_type: Three flow types are available; the choice must match your model.
- SD15 is the standard flow for SD15
- SD15_Full utilizes a stronger flow for SD15
- SDXL is the standard flow for SDXL
- direction: The direction to use in flow attention.
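For intuition, a trajectory is built by chaining per-frame pixel correspondences derived from the optical flow. The sketch below (plain PyTorch, neither Unimatch's nor this node's API) turns one flow field into integer correspondences between consecutive frames:

```python
import torch

def flow_to_correspondences(flow: torch.Tensor) -> torch.Tensor:
    """Given flow of shape (H, W, 2) mapping frame t -> t+1 in pixels,
    return for every source pixel the (row, col) it lands on in the next
    frame, clamped to the image bounds. Chaining these per-frame maps
    across the clip yields the trajectories that decide which tokens
    should attend to each other."""
    H, W, _ = flow.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    target_x = (xs + flow[..., 0]).round().clamp(0, W - 1).long()
    target_y = (ys + flow[..., 1]).round().clamp(0, H - 1).long()
    return torch.stack([target_y, target_x], dim=-1)  # (H, W, 2) integer targets
```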
Configure this for your unsampler/sampler to utilize flow attention.
- flow: The output from the Get Flow node.
- targets: Which parts of the UNet should utilize this attention.
- start_percent and end_percent: The step range (as fractions of the total steps) over which this attention is active.
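start_percent and end_percent are fractions of the denoising schedule; a small sketch of how such a range typically gates a behavior per step (the exact boundary handling is an assumption):

```python
def percent_range_active(step: int, total_steps: int,
                         start_percent: float, end_percent: float) -> bool:
    """True if the current step falls inside [start_percent, end_percent],
    where percents are fractions (0.0-1.0) of the total number of steps."""
    progress = step / max(total_steps - 1, 1)
    return start_percent <= progress <= end_percent

# e.g. with 20 steps, start_percent=0.0 and end_percent=0.5 covers roughly steps 0-9
```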
Sparse Causal Attention (SCA) allows the attention of specific frames to look at other frames.
- direction: The direction, in terms of video frames, in which the attention can look.
- targets: Which parts of the UNet should utilize this attention.
- start_percent and end_percent: The step range over which this attention is active.
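A minimal sketch of the sparse-causal idea: each frame's attention keys/values are borrowed from a neighboring frame rather than only its own, with direction deciding whether that neighbor is the previous or the next frame. This is a simplified illustration, not the node's implementation:

```python
import torch

def sparse_causal_kv(frame_feats: torch.Tensor, direction: str = "backward") -> torch.Tensor:
    """frame_feats: (F, L, C) per-frame token features.
    Returns, for every frame, the features of the frame it is allowed to
    look at (previous frame for 'backward', next frame for 'forward'),
    which would serve as that frame's attention keys/values."""
    if direction == "backward":
        shifted = torch.roll(frame_feats, shifts=1, dims=0)
        shifted[0] = frame_feats[0]          # first frame can only see itself
    else:
        shifted = torch.roll(frame_feats, shifts=-1, dims=0)
        shifted[-1] = frame_feats[-1]        # last frame can only see itself
    return shifted
```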
Pivot Attention is a variant of the pivot mechanism in TokenFlow. It splits the given frames into batches and randomly selects a "pivot" frame for each batch; this frame is used to select styles for all frames within the batch.
- batch_size: The batch size to cut the frames into.
- seed: A random seed for selecting batch pivots.
- targets: Which parts of the UNet should utilize this attention.
- start_percent and end_percent: The step range over which this attention is active.
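A sketch of the pivot selection described above: frames are split into consecutive batches and a seeded RNG picks one pivot per batch (illustrative only):

```python
import random

def select_pivots(num_frames: int, batch_size: int, seed: int):
    """Split frame indices into consecutive batches and pick one random
    'pivot' frame per batch; every frame in a batch borrows its style
    from its batch's pivot."""
    rng = random.Random(seed)
    pivots = []
    for start in range(0, num_frames, batch_size):
        batch = list(range(start, min(start + batch_size, num_frames)))
        pivots.append(rng.choice(batch))
    return pivots

# select_pivots(16, 4, seed=0) -> one pivot index per group of 4 frames
```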
RAVE arranges frames into grids before applying the attention mechanism, allowing styles to diffuse across frames.
- grid_size: The side length of the grid; for example, 2 gives a 2x2 grid.
- seed: A random seed for selecting batch pivots.
- targets: Which parts of the UNet should utilize this attention.
- start_percent and end_percent: The step range over which this attention is active.
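A sketch of the "gridify" step: a seeded shuffle decides which frames share a grid, and each group of grid_size**2 frames is tiled into one large image so a single attention pass sees several frames at once (the real method also reverses the tiling afterwards; this sketch only shows the tiling):

```python
import random
import torch

def gridify(frames: torch.Tensor, grid_size: int, seed: int) -> torch.Tensor:
    """frames: (F, C, H, W) with F divisible by grid_size**2.
    Shuffles the frames with a seeded RNG, then tiles each group of
    grid_size**2 frames into one (C, H*grid_size, W*grid_size) image."""
    F, C, H, W = frames.shape
    per_grid = grid_size * grid_size
    order = list(range(F))
    random.Random(seed).shuffle(order)       # which frames end up in which grid
    grids = []
    for start in range(0, F, per_grid):
        group = frames[order[start:start + per_grid]]      # (per_grid, C, H, W)
        rows = [torch.cat(list(group[r * grid_size:(r + 1) * grid_size]), dim=-1)
                for r in range(grid_size)]                  # concat along width
        grids.append(torch.cat(rows, dim=-2))               # stack rows along height
    return torch.stack(grids)                # (F // per_grid, C, H*grid, W*grid)
```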
This node allows some more control over running AnimateDiff alongside Veevee. Note: AnimateDiff must be run at a low effect multival if used.
- start_percent and end_percent: The step range over which AnimateDiff is active.
(TODO) FLATTEN, FRESCO, TokenFlow, RAVE, Video2Video-zero, Unimatch, FlowDiffuser, CoTracker