
Add Image-To-Image Pipeline #3

Merged · 1 commit into dakenf:main on Sep 28, 2023

Conversation

jdp8 (Contributor) commented on Sep 11, 2023

Description

The changes add the Image-To-Image pipeline, along with input fields for the Text-To-Image and Image-To-Image pipelines: Seed, Guidance Scale, Input Image, and Strength. The Image-To-Image pipeline is essentially the same as the Text-To-Image pipeline, except that instead of starting from a random latent, it uses a noisy encoding of the input image as the initial latent.
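For illustration, here is a minimal sketch of that difference. The `initialLatents` function and the `startTimestep` parameter are hypothetical; `encodeImage`, `add_noise`, and `randomNormalTensor` are the functions added in this PR, with assumed signatures:

```typescript
// Minimal sketch (assumed signatures): the two pipelines differ only
// in how the initial latent is produced.
async function initialLatents(
  pipe: StableDiffusionPipeline,
  seed: string,
  startTimestep: number,
  inputImage?: Tensor,
): Promise<Tensor> {
  const noise = randomNormalTensor([1, 4, 64, 64], seed)
  if (inputImage === undefined) {
    // Text-To-Image: start from pure Gaussian noise.
    return noise
  }
  // Image-To-Image: encode the image to latent space with the VAE
  // Encoder, then add noise up to the starting timestep implied by
  // the Strength setting.
  const imageLatents = await pipe.encodeImage(inputImage)
  return pipe.scheduler.add_noise(imageLatents, noise, startTimestep)
}
```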

Specific Changes

  • Add input elements to the App.tsx file.
  • Add a function to App.tsx that extracts the RGB data from an image.
  • Add a function to App.tsx that uploads an image, resizes it to (512, 512), and calls the RGB-extraction function.
  • Add an encodeImage(input_image) function to StableDiffusionPipeline.ts that uses the VAE Encoder to encode the input image from image space to latent space. The code was adapted from the pil_to_latent(input_im) function shown here.
  • Add an add_noise() function to PNDMScheduler.ts that adds noise to the input image latent. The code was adapted from here.
  • Add the seedrandom package to the package.json file in order to use a seedable Random Number Generator (RNG).
  • Modify the randomNormal() and randomNormalTensor() functions in Tensor.ts to accept the input seed and the RNG.
  • Call the encodeImage() and add_noise() functions in StableDiffusionPipeline.ts, and adjust the img2img timesteps based on the strength and the number of inference steps (see the sketch after this list). Code adapted from here.
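A rough sketch of the timestep adjustment and the seeded RNG. The names here are illustrative, and the truncation logic follows the usual diffusers img2img approach, which is an assumption about the code linked above:

```typescript
import seedrandom from 'seedrandom'

// Sketch of the img2img timestep truncation: with strength = 0.8 only
// the last 80% of the schedule is run, so lower strengths keep the
// output closer to the input image.
function getImg2ImgTimesteps(allTimesteps: number[], numInferenceSteps: number, strength: number): number[] {
  const initTimestep = Math.min(Math.floor(numInferenceSteps * strength), numInferenceSteps)
  const tStart = Math.max(numInferenceSteps - initTimestep, 0)
  return allTimesteps.slice(tStart)
}

// The seedrandom package provides a seedable RNG, which is what makes
// randomNormal()/randomNormalTensor() reproducible for a given Seed input.
const rng = seedrandom('42')
console.log(rng()) // same float in [0, 1) on every run for the same seed
```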

Issues

A few other small changes were needed to make the project work successfully:

  • Downloaded the models locally, following the steps mentioned here.
  • Changed the homepage field in the package.json file to ".".
  • Uncommented the images argument of the runInference() function in the App.tsx file.

dakenf (Owner) commented on Sep 11, 2023

Wow, thanks. I'm a bit slow with the library update, but I think it was worth it: I got one UNet step down to under a second on a 3090. You can see the results here: microsoft/onnxruntime#17373 (comment). It also works on an M1 Mac, where a step should take around 2 seconds.

I need to do a few fixes on ONNX and then I'll publish the update. Then this PR will be actually usable, and the repo will change into diffusers.js.

Will review the changes later today

jdp8 (Contributor, Author) commented on Sep 12, 2023

No worries. That UNET speedup is very impressive, thank you for that! Perfect, I'll wait for the update. Also, I'll close the issue I had opened, since I was able to run the project successfully.

Let me know if there are any issues with my changes. Thank you!

dakenf (Owner) commented on Sep 13, 2023

@jdp8 I've updated @aislamov/onnxruntime-web64 to version 1.0.0, and now it should be very fast.
You'll need Chrome Canary 119.0.6006.0 (I think it was released today).

Change this line:

const unet = await InferenceSession.create(await getModelFile(modelRepoOrPath, '/unet/model.onnx', true, opts), { executionProviders: ['wasm'] })

to

const unet = await InferenceSession.create(await getModelFile(modelRepoOrPath, '/unet/model.onnx', true, opts), sessionOption)
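(sessionOption isn't defined in this comment; presumably it's something along these lines, with 'webgpu' as the execution provider:)

```typescript
// Assumed shape of sessionOption: run the UNet on the WebGPU backend.
const sessionOption = { executionProviders: ['webgpu'] }
```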

If you're on Windows and don't have the Windows SDK installed, you might need to download https://github.com/microsoft/DirectXShaderCompiler and put dxcompiler.dll in the Google Chrome directory. Then launch Chrome with this command-line flag: --enable-dawn-features=allow_unsafe_apis,use_dxc
On Mac or Linux you don't need to download anything, and the flag would be --enable-dawn-features=allow_unsafe_apis

If it crashes with a "device lost" error, it means there's not enough VRAM; try running the text encoder and/or the VAE on the CPU.
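A minimal sketch of that fallback, reusing the getModelFile helper and session-creation pattern from the UNet line above. The text encoder and VAE decoder model paths are assumptions:

```typescript
// Keep the UNet on WebGPU, but run the text encoder and VAE on the
// WASM (CPU) backend to reduce VRAM pressure.
const unet = await InferenceSession.create(
  await getModelFile(modelRepoOrPath, '/unet/model.onnx', true, opts),
  { executionProviders: ['webgpu'] },
)
const textEncoder = await InferenceSession.create(
  await getModelFile(modelRepoOrPath, '/text_encoder/model.onnx', true, opts),
  { executionProviders: ['wasm'] },
)
const vaeDecoder = await InferenceSession.create(
  await getModelFile(modelRepoOrPath, '/vae_decoder/model.onnx', true, opts),
  { executionProviders: ['wasm'] },
)
```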

Let me know if something doesn't work.

jdp8 (Contributor, Author) commented on Sep 14, 2023

@dakenf I tried to run it with your changes but I got this error when trying to download the UNET model:

[screenshot: 2023-09-14 at 10:44 AM]

Errors in the console:

[screenshot: 2023-09-14 at 10:57 AM]

These are the steps that I followed, please let me know if I'm missing something:

  1. Updated @aislamov/onnxruntime-web64 to version 1.0.0.
  2. Changed the UNET executionProvider to 'webgpu'.
  3. Updated Chrome Canary (currently I have Version 119.0.6008.0) and launched it with the mentioned flag.

I'm using an M1 Pro (16 GB), and the models are still being downloaded from the public/models/aislamov/stable-diffusion-2-1-base-onnx directory at commit 0799d8182f3385a6acf4ea06eea98c39c007f5c7. I also tried this on an M2 Max (32 GB), and it failed with the same error. In addition, I changed the UNET executionProvider back to 'wasm', and the download of the VAE Decoder failed.

dakenf (Owner) commented on Sep 14, 2023

Hmm. Let me do some more tests and update the model in the repository.

dakenf (Owner) commented on Sep 18, 2023

@jdp8 can you try changing the first line in StableDiffusionPipeline.ts to:

import { InferenceSession } from '@aislamov/onnxruntime-web64/webgpu';

jdp8 (Contributor, Author) commented on Sep 19, 2023

I changed the first line as you mentioned, and the models loaded successfully, but when I ran the model it produced a black image on the canvas. Numerous errors were shown in the console; I'll attach a screenshot and a log file of the errors.

[screenshot: 2023-09-19 at 1:06 PM]

[attachment: WebGPU Console Errors.log]

dakenf (Owner) commented on Sep 19, 2023

OK, I've just published @aislamov/onnxruntime-web64 1.0.1.
It should resolve the issue; let me know if it doesn't work.

jdp8 (Contributor, Author) commented on Sep 19, 2023

I did a few tests with Text-To-Image and it works like a charm! I ran it for 20 steps with various prompts, using the VAE only on the last step, and the entire process took ~2 minutes on an M1 Pro. That's a remarkable speedup compared to before. Excellent work, thank you so much for this!

On another note, I tried to run the Image-To-Image pipeline, but the process failed when trying to encode the input image, specifically when running the VAE Encoder model with 'webgpu' as the executionProvider. Maybe this is due to an unsupported operation or something similar? I'll attach what appears when the code reaches the point of encoding the input image.

[screenshot: 2023-09-19 at 4:54 PM]

dakenf (Owner) commented on Sep 19, 2023

Most likely it's an out-of-memory issue. Try these:

  1. Make sure you run the fp16 version of the VAE encoder (I think I haven't included it on Hugging Face).
  2. Try loading it separately to get the latents, call .release() (I don't remember the exact method), and then pass the latents to the pipeline. See the sketch below.

I'm now working on reducing VRAM usage, since the current release uses about 10 GB.
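A minimal sketch of suggestion 2, assuming the '/vae_encoder/model.onnx' path and the 'sample'/'latent_sample' tensor names (none of which are confirmed here); release() frees the session's memory once the latents are computed:

```typescript
import { InferenceSession } from '@aislamov/onnxruntime-web64/webgpu'

// Sketch: create the VAE encoder session on its own, compute the
// latents, then release the session before the denoising loop.
// getModelFile, modelRepoOrPath, opts, and inputImageTensor are the
// helpers/values used elsewhere in the pipeline.
const vaeEncoder = await InferenceSession.create(
  await getModelFile(modelRepoOrPath, '/vae_encoder/model.onnx', true, opts),
  { executionProviders: ['webgpu'] },
)
const results = await vaeEncoder.run({ sample: inputImageTensor })
const latents = results.latent_sample
await vaeEncoder.release() // free the session's memory before running the UNet
```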

jdp8 (Contributor, Author) commented on Sep 21, 2023

I'm running the VAE Encoder that's in your Hugging Face repo, specifically this one from the 'Initial fp16 ONNX commit'. The strange thing is that it doesn't fail when using the 'wasm' backend but fails when using the 'webgpu' backend.

I tried to use two methods separately to release or dispose the text_encoder and vae_encoder sessions after using them in order to free up memory but they didn't prevent the error from occurring. The methods that I used were await this.vae_encoder.release() and await this.vae_encoder.handler.dispose(). I didn't find any official documentation for these methods. The only references that I found were this issue and this issue.

dakenf (Owner) commented on Sep 28, 2023

I'm going to merge it and then do some testing.

dakenf merged commit 3c95dbc into dakenf:main on Sep 28, 2023
dakenf (Owner) commented on Sep 29, 2023

I've updated the model on Hugging Face and the onnxruntime package. It now takes one second per step on an M1 Max, and image2image works fine.

I will do some refactoring over the weekend to change this repo into the diffusers.js library.
Next steps would be adding ControlNet, a more efficient scheduler, and SDXL support.

jdp8 (Contributor, Author) commented on Sep 29, 2023

Just tested it, and it generated the image in less than a minute on an M1 Pro. Thank you!

Awesome, that's great news! I'll be on the lookout to see if I can help with anything else.
