
Running LlamaForCausalLM with MPS provokes "RuntimeError: MPS does not support cumsum op with int64 input" #22502

Closed

kechan opened this issue Mar 31, 2023 · 10 comments


kechan commented Mar 31, 2023

System Info

  • transformers version: 4.28.0.dev0
  • Platform: macOS-13.2.1-arm64-arm-64bit
  • Python version: 3.9.6
  • Huggingface_hub version: 0.13.3
  • Safetensors version: not installed
  • PyTorch version (GPU?): 2.0.0 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes, I use device='mps'
  • Using distributed or parallel set-up in script?: No

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

To reproduce, run this on an M1/M2 Mac (Apple silicon):

from transformers import LlamaForCausalLM, LlamaTokenizer
import torch

tokenizer = LlamaTokenizer.from_pretrained('/path/to/weights')
model = LlamaForCausalLM.from_pretrained('/path/to/weights')

device = torch.device('cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu')
print(f'Using device: {device}')
model = model.to(device)

prompt = "Hey, are you consciours? Can you talk to me?"
inputs = tokenizer(prompt, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}   # place on device 

input_ids = inputs['input_ids'].to(torch.int32)  # doesn't appear to help
attn_masks = inputs['attention_mask'].to(torch.int32)  # doesn't appear to help

generate_ids = model.generate(input_ids, max_length=30)
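
For completeness, once generation succeeds the ids can be decoded back to text with the usual transformers pattern (using the tokenizer and generate_ids from above):

# Decode the generated token ids back to a string.
text = tokenizer.batch_decode(generate_ids, skip_special_tokens=True)[0]
print(text)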

Expected behavior

No error. I will post the stack trace below.


kechan commented Mar 31, 2023

Relevant stack trace (can provide more if needed):

File ~/Developer/python39_env/lib/python3.9/site-packages/transformers/generation/utils.py:2245, in GenerationMixin.greedy_search(self, input_ids, logits_processor, stopping_criteria, max_length, pad_token_id, eos_token_id, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, synced_gpus, streamer, **model_kwargs)
2242 break
2244 # prepare model inputs
-> 2245 model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
2247 # forward pass to get next token
2248 outputs = self(
2249 **model_inputs,
2250 return_dict=True,
2251 output_attentions=output_attentions,
2252 output_hidden_states=output_hidden_states,
2253 )

File ~/Developer/python39_env/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py:736, in LlamaForCausalLM.prepare_inputs_for_generation(self, input_ids, past_key_values, attention_mask, inputs_embeds, **kwargs)
733 position_ids = kwargs.get("position_ids", None)
734 if attention_mask is not None and position_ids is None:
735 # create position_ids on the fly for batch generation
--> 736 position_ids = attention_mask.long().cumsum(-1) - 1
737 position_ids.masked_fill_(attention_mask == 0, 1)
738 if past_key_values:

RuntimeError: MPS does not support cumsum op with int64 input

This happens during greedy search, precisely at:

position_ids = attention_mask.long().cumsum(-1) - 1
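
For reference, the failing op reproduces in isolation. A minimal sketch, assuming a machine where the MPS backend is available:

import torch

# int64 (long) cumsum is the unsupported case on affected PyTorch/macOS combinations.
mask = torch.ones(1, 8, dtype=torch.int64, device='mps')
mask.cumsum(-1)  # RuntimeError: MPS does not support cumsum op with int64 input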


kechan commented Mar 31, 2023

Actually, this could be a PyTorch/MPS issue: the int64 version of cumsum is not implemented there. Found the issue:
pytorch/pytorch#96610

I wonder whether long is necessary for attention_mask; should int32 be good enough?
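
A possible workaround along those lines, sketched here rather than taken from transformers itself: do the cumsum in int32, which MPS supports, and cast the result back to int64 afterwards.

# Hypothetical patch for prepare_inputs_for_generation; not the library's actual code.
position_ids = attention_mask.to(torch.int32).cumsum(-1).to(torch.int64) - 1
position_ids.masked_fill_(attention_mask == 0, 1)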


sgugger commented Mar 31, 2023

According to that issue, it should be fixed with a nightly install of PyTorch and macOS 13.3.


kechan commented Mar 31, 2023

@sgugger thanks for responding. I just updated to 13.3 and the torch nightly, and indeed the problem is gone. Closing the issue.

kechan closed this as completed Mar 31, 2023

kechan commented Mar 31, 2023

Just for fun, I increased max_length to 256.

My prompt is "Is facebook a bad company?"

" Is facebook a bad company?\nI'm not sure if this is the right place to post this, but I'm not sure where else to post it.\nI'm not a facebook user, but I've heard a lot of bad things about it. I've heard that it's a bad company, that it's a bad product, that it's a bad service, that it's a bad website, that it's a bad social network, that it's a bad company, that it's a bad product, that it's a bad service, that it's a bad website, that it's a bad social network, that it's a bad company, that it's a bad product, that it's a bad service, that it's a bad website, that it's a bad social network, that it's a bad company, that it's a bad product, that it's a bad service, that it's a bad website, that it's a bad social network, that it's a bad company, that it's a bad product, that it's a bad service, that it's a bad website"

It started repeating things. Maybe this is because it's the 7B model, and a larger one would behave better?

This must not have been an encouraging sign for earlier pioneers. It is amazing that OpenAI stuck with it and got all the way to the ChatGPT level of quality.
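
Repetition like this is a known failure mode of pure greedy decoding; sampling and/or a repetition penalty (both standard generate arguments) usually reduce it. A sketch:

# Sampling with a repetition penalty instead of greedy search.
generate_ids = model.generate(
    input_ids,
    max_length=256,
    do_sample=True,          # sample from the distribution instead of taking argmax
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.2,  # down-weight tokens that were already generated
)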

@cfmbrand

This is a problem for me now, running macOS 13.5.2 and Python 3.10.9. I cannot find a solution other than workarounds I don't understand. Any advice on how to get past this? It must be a problem for a lot of people. Thanks in advance.
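
One workaround often suggested for unsupported MPS ops (untested here) is PyTorch's CPU fallback, which has to be enabled before torch is imported:

import os
os.environ['PYTORCH_ENABLE_MPS_FALLBACK'] = '1'  # must be set before importing torch
import torch  # ops missing on MPS now fall back to CPU (slower, but no hard error)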

@moradisina

I have the same issue (RuntimeError: MPS does not support cumsum op with int64 input) on macOS 14.0 with nightly torch. Any idea how I can solve this?

@Sunjung-Dev

I have the same issue. Can anyone help me?

@itoof-com

M1, macOS 14.1.1 (23B81): also seeing this problem.

@petergreis

Running against ChatMusician, which was trained from Llama 2 7B, I see the same thing. Solved it with:

pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu

Relevant output from the installation:

Collecting torch
  Downloading https://download.pytorch.org/whl/nightly/cpu/torch-2.4.0.dev20240420-cp311-none-macosx_11_0_arm64.whl (61.7 MB)

(testml) petergreis@MacBook-Pro-M1-Max-2021 ChatMusician % pip list | grep torch
torch                     2.4.0.dev20240420
torchaudio                2.2.0.dev20240420
torchvision               0.19.0.dev20240420
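
A quick sanity check that the nightly build is the one being picked up and that MPS is usable (standard torch calls):

import torch

print(torch.__version__)                  # expect a .dev nightly version string
print(torch.backends.mps.is_built())      # True if this build includes MPS support
print(torch.backends.mps.is_available())  # True on Apple silicon with a recent macOS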
