Inference in Colab fails #13

Open
PrithivirajDamodaran opened this issue Aug 27, 2021 · 8 comments

@PrithivirajDamodaran

PrithivirajDamodaran commented Aug 27, 2021

See the message below (I added some print statements to debug and removed clear_output). Please advise:

chosen_model: https://www.dropbox.com/s/8mmgnromwoilpfm/16L_64HD_8H_512I_128T_cc12m_cc3m_3E.pt?dl=1
folder_ /content/outputs/Cucumber_on_a_brown_wooden_chair/
Traceback (most recent call last):
  File "/content/dalle-pytorch-pretrained/DALLE-pytorch/generate.py", line 18, in <module>
    from dalle_pytorch import DiscreteVAE, OpenAIDiscreteVAE, VQGanVAE, DALLE
  File "/content/dalle-pytorch-pretrained/DALLE-pytorch/dalle_pytorch/__init__.py", line 1, in <module>
    from dalle_pytorch.dalle_pytorch import DALLE, CLIP, DiscreteVAE
  File "/content/dalle-pytorch-pretrained/DALLE-pytorch/dalle_pytorch/dalle_pytorch.py", line 11, in <module>
    from dalle_pytorch.vae import OpenAIDiscreteVAE, VQGanVAE
  File "/content/dalle-pytorch-pretrained/DALLE-pytorch/dalle_pytorch/vae.py", line 14, in <module>
    from taming.models.vqgan import VQModel, GumbelVQ
ImportError: cannot import name 'GumbelVQ' from 'taming.models.vqgan' (/usr/local/lib/python3.7/dist-packages/taming/models/vqgan.py)
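
For reference, a minimal check (assuming the same Colab environment as above) of whether the installed taming-transformers build defines GumbelVQ at all:

import taming.models.vqgan as vqgan

# The traceback shows taming.models.vqgan itself imports fine, so inspect the
# module: False here means the installed taming-transformers release predates
# the GumbelVQ class.
print(hasattr(vqgan, "GumbelVQ"))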

@johnpaulbin
Contributor

Fixed-- retry.

@PrithivirajDamodaran
Author

[Screenshot: 2021-09-14 at 10:01:28 AM]

Now I'm getting a different issue: file/image not found. Under '/content/dalle-pytorch-pretrained/' there is no folder by the name DALLE-pytorch.

@Vbansal21

Vbansal21 commented Oct 4, 2021

> [Screenshot: 2021-09-14 at 10:01:28 AM]
>
> Now I'm getting a different issue: file/image not found. Under '/content/dalle-pytorch-pretrained/' there is no folder by the name DALLE-pytorch.

The issue is in the !wget command in the 2nd code block ("2 Install required dependencies."), line 36:

!wget "https://github.com/lucidrains/DALLE-pytorch/archive/refs/tags/0.14.3.zip" -O /content/

The /content/ at the end is a directory; change it to /content/0.14.3.zip. That solves the above issue.

After that there are new issues:

  File "/content/dalle-pytorch-pretrained/DALLE-pytorch/dalle_pytorch/attention.py", line 362, in forward
    out = self.attn_fn(q, k, v, attn_mask = attn_mask, key_padding_mask = key_pad_mask)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/deepspeed/ops/sparse_attention/sparse_self_attention.py", line 126, in forward
    assert query.dtype == torch.half, "sparse attention only supports training in fp16 currently, please file a github issue if you need fp32 support"
AssertionError: sparse attention only supports training in fp16 currently, please file a github issue if you need fp32 support
Finished generating images, attempting to display results...

Using deepspeed instead (replacing python with deepspeed and adding the --deepspeed and --fp16 args), by changing code block 4 ("4 Try out the model."), lines 25 to 28, to:

25|if chosen_model not in allow:
26|  !deepspeed /content/dalle-pytorch-pretrained/DALLE-pytorch/generate.py --dalle_path=$checkpoint_path --taming --text="$text" --num_images=$num_images --batch_size=$batch_size --outputs_dir="$_folder" --deepspeed --fp16; wait;
27|else:
28|  !deepspeed /content/dalle-pytorch-pretrained/DALLE-pytorch/generate.py --dalle_path=$checkpoint_path --taming --text="$text" --num_images=$num_images --batch_size=$batch_size --outputs_dir="$_folder" --bpe_path variety.bpe --deepspeed --fp16; wait;

doesn't help:

generate.py: error: unrecognized arguments: --local_rank=0 --deepspeed --fp16

The problem is that a --local_rank=0 arg is passed in somewhere.

Edit/Update: after solving the local_rank issue and the fp16 attention issue (a rough fix to generate.py, adding a dummy local_rank argument to the parser, and in attention.py manually converting to fp16 and back to the original x.dtype; a sketch of that kind of patch follows the error below), a new issue arises:
ImportError: cannot import name 'MatMul' from 'deepspeed.ops.sparse_attention' (/usr/local/lib/python3.7/dist-packages/deepspeed/ops/sparse_attention/__init__.py)
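
For reference, the rough local_rank / fp16 patch mentioned above looks something like the sketch below; the names are illustrative (sparse_attn_fp16 is not in the actual code), and only q, k, v, attn_fn and the mask keywords come from the traceback:

import argparse

# 1) generate.py sketch: accept the --local_rank flag that the deepspeed launcher
#    injects, so argparse no longer fails with "unrecognized arguments".
parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', type=int, default=0,
                    help='dummy argument, only present so the deepspeed launcher can pass it')

# 2) attention.py sketch: cast the sparse-attention inputs to fp16 for the kernel,
#    then cast the output back to the caller's original dtype. attn_fn stands in
#    for the module called as self.attn_fn in attention.py.
def sparse_attn_fp16(attn_fn, q, k, v, attn_mask=None, key_pad_mask=None):
    orig_dtype = q.dtype
    out = attn_fn(q.half(), k.half(), v.half(),
                  attn_mask=attn_mask, key_padding_mask=key_pad_mask)
    return out.to(orig_dtype)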

@PrithivirajDamodaran
Author

> [Screenshot: 2021-09-14 at 10:01:28 AM]
>
> Now I'm getting a different issue: file/image not found. Under '/content/dalle-pytorch-pretrained/' there is no folder by the name DALLE-pytorch.

@johnpaulbin Sorry, not trying to be a pest, but please advise on this issue.

@jonathanfrawley

jonathanfrawley commented Oct 25, 2021

Hi, I think this issue is due to this line in the second code cell in the Colab notebook, which fails (silently, as for some reason the output is cleared later in the cell!):

!wget "https://github.com/lucidrains/DALLE-pytorch/archive/refs/tags/0.14.3.zip" -O /content/

It seems you need to specify the full path to the output file for wget, like this:

!wget "https://github.com/lucidrains/DALLE-pytorch/archive/refs/tags/0.14.3.zip" -O /content/0.14.3.zip

I don't know who has access to that notebook, but if it could be updated that would be great.

@Vbansal21

Vbansal21 commented Oct 25, 2021

> Hi, I think this issue is due to this line in the second code cell in the Colab notebook, which fails (silently, as for some reason the output is cleared later in the cell!):
>
> !wget "https://github.com/lucidrains/DALLE-pytorch/archive/refs/tags/0.14.3.zip" -O /content/
>
> It seems you need to specify the full path to the output file for wget, like this:
>
> !wget "https://github.com/lucidrains/DALLE-pytorch/archive/refs/tags/0.14.3.zip" -O /content/0.14.3.zip
>
> I don't know who has access to that notebook, but if it could be updated that would be great.

Sure, that will solve this issue, but there are new issues after that; I've mentioned those above. It is a MatMul import error from deepspeed.

@hamdjalil

> See the message below (I added some print statements to debug and removed clear_output). Please advise:
>
> [...] ImportError: cannot import name 'GumbelVQ' from 'taming.models.vqgan' (/usr/local/lib/python3.7/dist-packages/taming/models/vqgan.py)

@johnpaulbin @PrithivirajDamodaran can you please share how this issue was resolved? I'm facing it on my GPU.

@Cyberes

Cyberes commented Dec 25, 2021

On the DeepSpeed Sparse Attention doc page, there's this:

> Note: Currently, DeepSpeed Sparse Attention can be used only on NVIDIA V100 or A100 GPUs using Torch >= 1.6 and CUDA 10.1, 10.2, 11.0, or 11.1.

I have access to a V100 and ran the notebook on it (after adding the fixes), but I encountered the same issue. I ran it on CUDA 11.4 rather than 11.1, so that may be an issue. Colab is on 11.1.

Is it possible to run the notebook on an older version of DeepSpeed?
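
For what it's worth, pinning an older DeepSpeed release in the install cell would be one way to test that; this is only a sketch, and the placeholder version would need to be swapped for whichever release still exports MatMul from deepspeed.ops.sparse_attention:

# Sketch for the install cell: pin an older DeepSpeed, then restart the runtime.
# 0.3.16 is a placeholder version; confirm which release still provides
# deepspeed.ops.sparse_attention.MatMul before relying on it.
!pip install "deepspeed==0.3.16" --force-reinstall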
