
Running LlamaForCausalLM with MPS provokes "RuntimeError: MPS does not support cumsum op with int64 input" #22502

Closed

kechan opened this issue Mar 31, 2023 · 10 comments


kechan commented Mar 31, 2023

System Info

  • transformers version: 4.28.0.dev0
  • Platform: macOS-13.2.1-arm64-arm-64bit
  • Python version: 3.9.6
  • Huggingface_hub version: 0.13.3
  • Safetensors version: not installed
  • PyTorch version (GPU?): 2.0.0 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes, I use device='mps'
  • Using distributed or parallel set-up in script?: No

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

To reproduce, run this on an M1/M2 Mac (Apple silicon):

from transformers import LlamaForCausalLM, LlamaTokenizer
import torch

tokenizer = LlamaTokenizer.from_pretrained('/path/to/weights')
model = LlamaForCausalLM.from_pretrained('/path/to/weights')

device = torch.device('cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu')
print(f'Using device: {device}')
model = model.to(device)

prompt = "Hey, are you consciours? Can you talk to me?"
inputs = tokenizer(prompt, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}   # place on device 

input_ids = inputs['input_ids'].to(torch.int32)  # doesn't appear to help
attn_masks = inputs['attention_mask'].to(torch.int32)  # doesn't appear to help

generate_ids = model.generate(input_ids, max_length=30)
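
For completeness, once generation succeeds the ids can be decoded back to text with the usual transformers pattern (using the tokenizer and generate_ids from above):

# Decode the generated token ids back to a string.
text = tokenizer.batch_decode(generate_ids, skip_special_tokens=True)[0]
print(text)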

Expected behavior

No error. I will post the stack trace below.


kechan commented Mar 31, 2023

Relevant stack trace (can provide more if needed):

File ~/Developer/python39_env/lib/python3.9/site-packages/transformers/generation/utils.py:2245, in GenerationMixin.greedy_search(self, input_ids, logits_processor, stopping_criteria, max_length, pad_token_id, eos_token_id, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, synced_gpus, streamer, **model_kwargs)
2242 break
2244 # prepare model inputs
-> 2245 model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
2247 # forward pass to get next token
2248 outputs = self(
2249 **model_inputs,
2250 return_dict=True,
2251 output_attentions=output_attentions,
2252 output_hidden_states=output_hidden_states,
2253 )

File ~/Developer/python39_env/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py:736, in LlamaForCausalLM.prepare_inputs_for_generation(self, input_ids, past_key_values, attention_mask, inputs_embeds, **kwargs)
733 position_ids = kwargs.get("position_ids", None)
734 if attention_mask is not None and position_ids is None:
735 # create position_ids on the fly for batch generation
--> 736 position_ids = attention_mask.long().cumsum(-1) - 1
737 position_ids.masked_fill_(attention_mask == 0, 1)
738 if past_key_values:

RuntimeError: MPS does not support cumsum op with int64 input

This happens during greedy search, precisely at:

position_ids = attention_mask.long().cumsum(-1) - 1
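
For reference, the failing op reproduces in isolation. A minimal sketch, assuming a machine where the MPS backend is available:

import torch

# int64 (long) cumsum is the unsupported case on affected PyTorch/macOS combinations.
mask = torch.ones(1, 8, dtype=torch.int64, device='mps')
mask.cumsum(-1)  # RuntimeError: MPS does not support cumsum op with int64 input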


kechan commented Mar 31, 2023

Actually, this could be a PyTorch/MPS issue: the int64 version of cumsum is not implemented there. Found the issue:
pytorch/pytorch#96610

I wonder whether long is necessary for attention_mask; should int32 be good enough?
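
A possible workaround along those lines, sketched here rather than taken from transformers itself: do the cumsum in int32, which MPS supports, and cast the result back to int64 afterwards.

# Hypothetical patch for prepare_inputs_for_generation; not the library's actual code.
position_ids = attention_mask.to(torch.int32).cumsum(-1).to(torch.int64) - 1
position_ids.masked_fill_(attention_mask == 0, 1)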


sgugger commented Mar 31, 2023

According to that issue, it should be fixed with a nightly install of PyTorch and macOS 13.3.


kechan commented Mar 31, 2023

@sgugger thanks for responding. I just updated to 13.3 and the torch nightly, and indeed the problem is gone. Closing the issue.

kechan closed this as completed Mar 31, 2023

kechan commented Mar 31, 2023

Just for fun, I increased max_length to 256.

My prompt is "Is facebook a bad company?"

" Is facebook a bad company?\nI'm not sure if this is the right place to post this, but I'm not sure where else to post it.\nI'm not a facebook user, but I've heard a lot of bad things about it. I've heard that it's a bad company, that it's a bad product, that it's a bad service, that it's a bad website, that it's a bad social network, that it's a bad company, that it's a bad product, that it's a bad service, that it's a bad website, that it's a bad social network, that it's a bad company, that it's a bad product, that it's a bad service, that it's a bad website, that it's a bad social network, that it's a bad company, that it's a bad product, that it's a bad service, that it's a bad website, that it's a bad social network, that it's a bad company, that it's a bad product, that it's a bad service, that it's a bad website"

It started repeating things. Maybe this is because it's the 7B model, and a larger one would behave better?

This must not have been an encouraging sign for earlier pioneers. It is amazing that OpenAI stuck with it and got all the way to the ChatGPT level of quality.
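
Repetition like this is a known failure mode of pure greedy decoding; sampling and/or a repetition penalty (both standard generate arguments) usually reduce it. A sketch:

# Sampling with a repetition penalty instead of greedy search.
generate_ids = model.generate(
    input_ids,
    max_length=256,
    do_sample=True,          # sample from the distribution instead of taking argmax
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.2,  # down-weight tokens that were already generated
)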

@cfmbrand

This is a problem for me now, running macOS 13.5.2 and Python 3.10.9. I cannot find a solution other than workarounds I don't understand. Any advice on how to get past this? It must be a problem for a lot of people. Thanks in advance.
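
One workaround often suggested for unsupported MPS ops (untested here) is PyTorch's CPU fallback, which has to be enabled before torch is imported:

import os
os.environ['PYTORCH_ENABLE_MPS_FALLBACK'] = '1'  # must be set before importing torch
import torch  # ops missing on MPS now fall back to CPU (slower, but no hard error)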

@moradisina

I have the same issue (RuntimeError: MPS does not support cumsum op with int64 input) on macOS 14.0 with nightly torch. Any idea how I can solve this?

@Sunjung-Dev

I have the same issue. Can anyone help me?

@itoof-com

M1, macOS 14.1.1 (23B81): also seeing this problem.

@petergreis

Running against ChatMusician, which was trained from Llama 2 7B, I see the same thing. Solved it with:

pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu

Relevant output from the installation:

Collecting torch
  Downloading https://download.pytorch.org/whl/nightly/cpu/torch-2.4.0.dev20240420-cp311-none-macosx_11_0_arm64.whl (61.7 MB)

(testml) petergreis@MacBook-Pro-M1-Max-2021 ChatMusician % pip list | grep torch
torch                     2.4.0.dev20240420
torchaudio                2.2.0.dev20240420
torchvision               0.19.0.dev20240420
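
A quick sanity check that the nightly build is the one being picked up and that MPS is usable (standard torch calls):

import torch

print(torch.__version__)                  # expect a .dev nightly version string
print(torch.backends.mps.is_built())      # True if this build includes MPS support
print(torch.backends.mps.is_available())  # True on Apple silicon with a recent macOS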
