
IndexError on num_inference_steps values: 3, 9, 27, ...? #444

Closed
sausix opened this issue Sep 9, 2022 · 20 comments
Labels
bug Something isn't working

Comments

@sausix

sausix commented Sep 9, 2022

Describe the bug

Some num_inference_steps values seem to be "problematic" and throw an IndexError.

The iteration progress output also does not end on the requested value in any obvious relation. If this is intentional, it is still confusing:

num_inference_steps: 5
6it [00:49, 8.27s/it]

num_inference_steps: 6
8it [01:02, 7.87s/it]

num_inference_steps: 7
9it [01:10, 7.88s/it]

num_inference_steps: 8
9it [01:13, 8.19s/it]

Reproduction

from diffusers import StableDiffusionPipeline

TOKEN = "(censored)"
MODEL_ID = "CompVis/stable-diffusion-v1-4"

pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID, use_auth_token=TOKEN)
res = pipe("a photo of an astronaut riding a horse on mars", num_inference_steps=3, guidance_scale=7.5)

Tested on the command line too; unrelated to the PyCharm environment.

Logs

/usr/bin/python3 /home/as/.config/JetBrains/PyCharm2022.2/scratches/scratch.py 

0it [00:07, ?it/s]
Traceback (most recent call last):
  File "/home/as/.config/JetBrains/PyCharm2022.2/scratches/scratch.py", line 7, in <module>
    res = pipe("a photo of an astronaut riding a horse on mars", num_inference_steps=3, guidance_scale=7.5)
  File "/home/as/.local/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/usr/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 148, in __call__
    latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs)["prev_sample"]
  File "/usr/lib/python3.10/site-packages/diffusers/schedulers/scheduling_pndm.py", line 136, in step
    return self.step_plms(model_output=model_output, timestep=timestep, sample=sample)
  File "/usr/lib/python3.10/site-packages/diffusers/schedulers/scheduling_pndm.py", line 212, in step_plms
    prev_sample = self._get_prev_sample(sample, timestep, prev_timestep, model_output)
  File "/usr/lib/python3.10/site-packages/diffusers/schedulers/scheduling_pndm.py", line 230, in _get_prev_sample
    alpha_prod_t = self.alphas_cumprod[timestep + 1 - self._offset]
IndexError: index 1000 is out of bounds for dimension 0 with size 1000

System Info

  • diffusers version: 0.4.0.dev0
  • Platform: Linux-5.19.7-arch1-1-x86_64-with-glibc2.36
  • Python version: 3.10.6
  • PyTorch version (GPU?): 1.12.1+cu102 (True)
  • Huggingface_hub version: 0.9.1
  • Transformers version: 4.21.3
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No
@sausix sausix added the bug Something isn't working label Sep 9, 2022
@jonatanklosko
Contributor

I believe this is the issue well summarized in this comment.

The original bug comes from the DDIMScheduler:

>>> from diffusers import DDIMScheduler
>>> scheduler = DDIMScheduler()
>>> scheduler.set_timesteps(3, offset=1)
>>> scheduler.timesteps
tensor([1000,  667,  334,    1])

Note how the number of timesteps is not the 3 we specified.

The same off-by-1 issue is present in PNDMScheduler:

>>> from diffusers import PNDMScheduler
>>> scheduler = PNDMScheduler(skip_prk_steps=True)
>>> scheduler.set_timesteps(3, offset=1)
>>> scheduler.timesteps
[1000, 667, 667, 334, 1]

The expected number of steps is 4, not 5. Side note: it is a bit confusing that set_timesteps(3) should result in 4 timesteps, but that's because it's actually set_timesteps(num_ddim_timesteps).
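The traceback in the original report follows directly from that extra leading timestep. As a minimal sketch (the arange-based schedule below is an assumed reconstruction of the scheduler arithmetic, not the actual diffusers source), 3 steps with offset=1 produce a schedule starting at 1000, one past the end of the 1000-entry alphas_cumprod table:

```python
import numpy as np

# Assumed reconstruction of the old evenly spaced timestep schedule
# (not the actual diffusers source).
num_train_timesteps, num_inference_steps, offset = 1000, 3, 1
step_ratio = num_train_timesteps // num_inference_steps  # 333
timesteps = np.arange(0, num_train_timesteps, step_ratio)[::-1] + offset
print(timesteps.tolist())  # [1000, 667, 334, 1] -- 4 entries, not 3

# alphas_cumprod has exactly num_train_timesteps entries, so indexing
# with the leading timestep of 1000 is out of bounds:
alphas_cumprod = np.linspace(0.9999, 0.0001, num_train_timesteps)
# alphas_cumprod[timesteps[0]]  # IndexError: index 1000 is out of bounds
```

This matches both the timesteps tensor shown above and the IndexError in the original report.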

@jonatanklosko
Contributor

A loosely related question: why do we need the offset option? As far as I can see, the original implementation doesn't have a configurable offset and it is assumed to be 1. Also, in the stable diffusion pipeline it is always set to 1:

accepts_offset = "offset" in set(inspect.signature(self.scheduler.set_timesteps).parameters.keys())
extra_set_kwargs = {}
if accepts_offset:
    extra_set_kwargs["offset"] = 1

And it's not clear from the docs :D
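For reference, the detection pattern in that snippet can be exercised on its own; the set_timesteps function below is a hypothetical stand-in for the scheduler method:

```python
import inspect

def set_timesteps(num_inference_steps, offset=0):
    """Hypothetical stand-in for a scheduler's set_timesteps method."""

# Same pattern as the pipeline: pass offset=1 only when the
# scheduler's signature accepts an `offset` parameter.
accepts_offset = "offset" in set(inspect.signature(set_timesteps).parameters.keys())
extra_set_kwargs = {}
if accepts_offset:
    extra_set_kwargs["offset"] = 1
print(extra_set_kwargs)  # {'offset': 1}
```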

@sausix
Author

sausix commented Sep 9, 2022

> I believe this is the issue well summarized in this comment.

I am sorry for opening a duplicate report; I did not see that one. I searched for "num_inference_steps", not for "ddim_steps". But searching now for "num_inference_steps" also does not surface this thread.

@jonatanklosko
Contributor

@sausix the issue I referenced is in another repository, so it's all good :)

@natolambert
Contributor

Thanks for all the details on this, I'm going to take a look.

@natolambert
Contributor

@jonatanklosko I wrote the scheduler docs and saw the offset parameter in multiple schedulers. When I worked on them earlier it wasn't there, so I need to do more reading before writing accurate docs. I'll look at this too (maybe we should open another issue for it, depending on how fast all of these can be solved).

@natolambert
Contributor

natolambert commented Sep 10, 2022

Okay, I've tracked down what offset is for: it seems to be used to make the final inference timestep in stable diffusion 1 rather than 0 (as you said). The option is just a more elegant approach that makes it future proof, but I see your point that a bool may be better (if you want that, open a new issue).

You can see where it is used in diffusers with this search (the stable diffusion pipelines). This is matching the stable diffusion implementation here (with a weird comment from the authors).
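To illustrate the effect (again using an assumed arange-based schedule rather than the actual diffusers code): with offset=0 the schedule ends at timestep 0, while offset=1 shifts every timestep up by one so inference ends at 1:

```python
import numpy as np

def make_timesteps(num_inference_steps, offset, num_train_timesteps=1000):
    # Assumed reconstruction of the evenly spaced schedule.
    step_ratio = num_train_timesteps // num_inference_steps
    return np.arange(0, num_train_timesteps, step_ratio)[::-1] + offset

print(make_timesteps(4, offset=0).tolist())  # [750, 500, 250, 0]
print(make_timesteps(4, offset=1).tolist())  # [751, 501, 251, 1]
```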

Now for the harder one, checking the index error. I remember encountering this in the past, I'll likely add a test to the schedulers to cover this.

@jonatanklosko
Contributor

@natolambert yeah, my confusion was mostly that in the original implementation it is set to 1, and the same in the stable diffusion pipeline, so I wondered if offset=0 is ever used. Now I looked for set_timesteps and it looks like in most cases offset is not specified.

Interestingly, in CompVis/latent-diffusion they also use an offset of 1 (ref), which is not the case in the latent diffusion pipeline (ref); not sure if that's expected or not.

Another possible inconsistency I found is this:

alpha_prod_t = self.alphas_cumprod[timestep + 1 - self._offset]
alpha_prod_t_prev = self.alphas_cumprod[timestep_prev + 1 - self._offset]

alpha_prod_t = self.alphas_cumprod[timestep]
alpha_prod_t_prev = self.alphas_cumprod[prev_timestep] if prev_timestep >= 0 else self.final_alpha_cumprod

It's equivalent for offset=1, but not for offset=0.
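A quick check of that equivalence claim, with the two indexing schemes written out as plain functions (hypothetical helpers for illustration): the PNDM-style index timestep + 1 - offset collapses to the DDIM-style index timestep exactly when offset is 1:

```python
def pndm_index(timestep, offset):
    # PNDMScheduler-style: alphas_cumprod[timestep + 1 - offset]
    return timestep + 1 - offset

def ddim_index(timestep):
    # DDIMScheduler-style: alphas_cumprod[timestep]
    return timestep

t = 667
print(pndm_index(t, offset=1) == ddim_index(t))  # True
print(pndm_index(t, offset=0) == ddim_index(t))  # False: off by one
```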

@natolambert
Contributor

natolambert commented Sep 10, 2022

Yeah, I just realized the 1 vs int difference. I'm not sure; if you want more on that, open another issue @jonatanklosko, it's better for the folks who added that to decide (I don't have a strong opinion).

@natolambert
Contributor

@patrickvonplaten @patil-suraj, I'm not sure about the discrepancy @jonatanklosko brings up above with the indexing and offset.

I vaguely remember some of the PNDM tests failing a while back and confusing people. Maybe that was because this change was introduced? I'll keep looking a bit.

@natolambert
Contributor

natolambert commented Sep 10, 2022

@jonatanklosko I can tell you that the PNDMScheduler and DDIMScheduler have indexed the alphas differently for a while. Before introducing offset, PNDMScheduler had

alpha_prod_t = self.alphas_cumprod[timestep + 1]
alpha_prod_t_prev = self.alphas_cumprod[timestep_prev + 1]

And DDIMScheduler had

alpha_prod_t = self.alphas_cumprod[timestep]
alpha_prod_t_prev = self.alphas_cumprod[timestep_prev]

Though I'm not sure why DDIMScheduler isn't shifted by the offset.

@jonatanklosko
Contributor

jonatanklosko commented Sep 10, 2022

@natolambert sure I will!

I'm also looking through openai/glide-text2im and I'm almost sure they don't use the 1 offset there, so maybe that's the reason for making it configurable.

@natolambert
Contributor

natolambert commented Sep 10, 2022

I opened #465 to keep this on track about the indexing error / confusion.

@jonatanklosko
Contributor

@natolambert perfect, thank you!

@natolambert
Contributor

Okay, the power-of-3 issue seems fixed in #466; I still need to check the number of iterations before it's merged.

@sausix

@natolambert
Contributor

Okay, looks good to me, will close this issue as we will merge #466 shortly.

>>> res = pipe("a photo of an astronaut riding a horse on mars", num_inference_steps=3, guidance_scale=7.5)
100%|██████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:08<00:00,  2.02s/it]
>>> res = pipe("a photo of an astronaut riding a horse on mars", num_inference_steps=5, guidance_scale=7.5)
100%|██████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:08<00:00,  1.36s/it]
>>> res = pipe("a photo of an astronaut riding a horse on mars", num_inference_steps=6, guidance_scale=7.5)
100%|██████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:09<00:00,  1.40s/it]
>>> res = pipe("a photo of an astronaut riding a horse on mars", num_inference_steps=7, guidance_scale=7.5)
100%|██████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:11<00:00,  1.44s/it]
>>> res = pipe("a photo of an astronaut riding a horse on mars", num_inference_steps=9, guidance_scale=7.5)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:14<00:00,  1.48s/it]

@sausix
Author

sausix commented Sep 11, 2022

@natolambert Thanks for fixing this quickly.
I will test it tomorrow too. I found some other iteration-count oddities unrelated to powers of 3 yesterday.

@patil-suraj
Contributor

> @natolambert yeah, my confusion was mostly that in the original implementation it is set to 1, and the same in the stable diffusion pipeline, so I wondered if offset=0 is ever used. Now I looked for set_timesteps and it looks like in most cases offset is not specified.
>
> Interestingly, in CompVis/latent-diffusion they also use an offset of 1 (ref), which is not the case in the latent diffusion pipeline (ref); not sure if that's expected or not.
>
> Another possible inconsistency I found is this:
>
> alpha_prod_t = self.alphas_cumprod[timestep + 1 - self._offset]
> alpha_prod_t_prev = self.alphas_cumprod[timestep_prev + 1 - self._offset]
>
> alpha_prod_t = self.alphas_cumprod[timestep]
> alpha_prod_t_prev = self.alphas_cumprod[prev_timestep] if prev_timestep >= 0 else self.final_alpha_cumprod
>
> It's equivalent for offset=1, but not for offset=0.

The offsetting is very specific to the latent/stable diffusion models; it's not used by other models like the original DDPM models.
Also, we should indeed update the latent diffusion pipeline to add the offset there. Feel free to open a PR if you want :)

@jonatanklosko
Contributor

@patil-suraj I'm happy to send a PR, but it seems to me that offset would make more sense as a scheduler configuration field rather than an argument to set_timesteps, as mentioned in #465 (comment), wdyt?

@patrickvonplaten
Contributor

Let's close the issue only when the PR is in main :-)
