
RuntimeError: Expected dim to be between 0 and 2 but got -1 #2

Closed
dbl001 opened this issue Jan 16, 2023 · 4 comments
dbl001 commented Jan 16, 2023

I am trying to run FSB with PyTorch Version: 2.0.0a0+gitf8b2879 on macOS Ventura 13.1 using the MPS backend (not CUDA).
I'm getting an exception on this line:

position_ids = attention_mask.long().cumsum(-1) - 1

Torch

% pip show torch
Name: torch
Version: 2.0.0a0+gitf8b2879

Here is my command:

% python main_response_generation.py --model_checkpoint EleutherAI/gpt-j-6B --dataset persona --gpu 0

Here's the stack trace:

/Users/davidlaxer/anaconda3/envs/AI-Feynman/bin/python /Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py --multiprocess --qt-support=auto --client 127.0.0.1 --port 54481 --file /Users/davidlaxer/FSB/main_response_generation.py --model_checkpoint EleutherAI/gpt-j-6B --dataset persona --gpu 0 
Connected to pydev debugger (build 223.8214.51)
LOADING EleutherAI/gpt-j-6B
DONE LOADING
EVALUATING DATASET persona on EleutherAI/gpt-j-6B with beam size 1
Loaded persona dict_keys([0, 1, 5]) shots for shuffle 0!
Loaded persona dict_keys([0, 1, 5]) shots for shuffle 1!
  0%|          | 0/1000 [00:00<?, ?it/s]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:198 for open-end generation.
  0%|          | 0/1000 [01:01<?, ?it/s]
Traceback (most recent call last):
  File "/Users/davidlaxer/FSB/prompts/generic_prompt.py", line 248, in get_response
    output = model.generate(
  File "/Users/davidlaxer/anaconda3/envs/AI-Feynman/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/davidlaxer/anaconda3/envs/AI-Feynman/lib/python3.10/site-packages/transformers/generation/utils.py", line 1352, in generate
    return self.greedy_search(
  File "/Users/davidlaxer/anaconda3/envs/AI-Feynman/lib/python3.10/site-packages/transformers/generation/utils.py", line 2122, in greedy_search
    model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
  File "/Users/davidlaxer/anaconda3/envs/AI-Feynman/lib/python3.10/site-packages/transformers/models/gptj/modeling_gptj.py", line 776, in prepare_inputs_for_generation
    position_ids = attention_mask.long().cumsum(-1) - 1
RuntimeError: Expected dim to be between 0 and 2 but got -1
python-BaseException

The attention mask is:

tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1]], device='mps:0')

[Screenshot attached: 2023-01-16, 2:47 PM]

Correct me if I'm mistaken, but isn't `cumsum(-1)` meant for a NumPy array rather than a tensor?

% ipython
Python 3.10.8 (main, Nov 24 2022, 08:09:04) [Clang 14.0.6 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.7.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import torch

In [2]: >>> a = torch.randn(10)
   ...: >>> a
   ...: >>> torch.cumsum(a, dim=0)
   ...: 
Out[2]: 
tensor([-0.1626,  1.6762,  2.0590,  3.7352,  4.5073,  4.9953,  5.2350,  5.1355,
         3.4314,  2.6513])

In [3]: b = a.cpu().detach().numpy()

In [4]: b.cumsum(-1) -1
Out[4]: 
array([-1.162638 ,  0.6762279,  1.0589981,  2.7352297,  3.5072613,
        3.9952703,  4.2349663,  4.135498 ,  2.4313903,  1.6513085],
      dtype=float32)

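(Editor's note, not part of the original report: negative dims are in fact standard PyTorch semantics, not NumPy-only. A minimal CPU check, using a mask of the same shape as the one reported above, shows the failing line works off the MPS backend:)

```python
import torch

# A (1, 55) all-ones attention mask, as in the report, but on CPU.
mask = torch.ones(1, 55, dtype=torch.long)

# torch.Tensor.cumsum accepts negative dims, same as NumPy:
position_ids = mask.cumsum(-1) - 1
assert torch.equal(position_ids, mask.cumsum(1) - 1)
print(position_ids[0, -1].item())  # last position id: 54
```

This suggests the error is specific to the MPS kernel in that PyTorch build, not to the `cumsum(-1)` call itself.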

andreamad8 (Owner) commented

Hi @dbl001,

thanks for reaching out.

I think this comes from batch decoding. I got that code from:
huggingface/transformers#21080

As for `cumsum`: in theory it should work on a PyTorch tensor.

I don't have an Apple Silicon machine with me, so I can't try it myself. Quick question: could you try another PyTorch version?

-- Andrea
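(Editor's note: one possible workaround sketch, not from this thread. If the MPS backend in that build rejects negative dims, normalizing the dim before calling `cumsum` side-steps the error. `safe_cumsum` is a hypothetical helper, not part of FSB or transformers:)

```python
import torch

def safe_cumsum(t: torch.Tensor, dim: int) -> torch.Tensor:
    """Hypothetical workaround: map a negative dim to its
    non-negative equivalent before calling cumsum, for backends
    that reject negative dims."""
    return t.cumsum(dim % t.dim())

# Drop-in for the failing line in prepare_inputs_for_generation:
attention_mask = torch.ones(1, 55, dtype=torch.long)
position_ids = safe_cumsum(attention_mask.long(), -1) - 1
```

The alternative diagnostic would be running with `PYTORCH_ENABLE_MPS_FALLBACK=1`, which makes unsupported MPS ops fall back to CPU.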

dbl001 (Author) commented Jan 17, 2023 via email

dbl001 (Author) commented Jan 17, 2023 via email

andreamad8 (Owner) commented

Let me know if you can figure it out. I'll close the issue for now.
