How to use on GPU? #10

phiweger · 2023-07-15T17:58:06Z

Very interesting library @r2d4 !

I am trying to use the example in the README but with the model being on the GPU (as is required for many of the recent larger LLMs):

import regex
from transformers import AutoModelForCausalLM, AutoTokenizer

from rellm import complete_re

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

prompt = "ReLLM, the best way to get structured data out of LLMs, is an acronym for "
pattern = regex.compile(r'Re[a-z]+ L[a-z]+ L[a-z]+ M[a-z]+')

# THIS IS WHAT I'D LIKE TO DO
device = "cuda:0"
model.to(device)

output = complete_re(tokenizer=tokenizer, 
                     model=model, 
                     prompt=prompt,
                     pattern=pattern,
                     do_sample=True,
                     max_new_tokens=80)
print(output)

fails with

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)

Is it possible to use ReLLM with the model living on the GPU?


phiweger · 2023-07-15T18:09:07Z

Related to #6, I guess.

Emekaborisama · 2023-12-21T01:17:03Z

I am having the same issue. Any help, please?
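A note on the error itself: this RuntimeError usually means the model weights were moved to `cuda:0` while the token IDs fed to it are still on the CPU. The thread does not confirm where ReLLM builds its input tensors, so the following is only a minimal sketch of the general pattern, assuming the prompt is tokenized on the CPU somewhere before generation. The helper name `inputs_on_model_device` is hypothetical and is not part of ReLLM or transformers.

```python
def inputs_on_model_device(tokenizer, model, text):
    """Tokenize `text` and move every resulting tensor to the model's device.

    Hypothetical helper for illustration; not part of ReLLM or transformers.
    """
    # Read the device from the model's own weights, so this works the same
    # whether the model lives on "cpu" or "cuda:0".
    device = next(model.parameters()).device
    # Tokenizers return CPU tensors by default.
    encoded = tokenizer(text, return_tensors="pt")
    # Move each tensor (input_ids, attention_mask, ...) next to the weights.
    return {name: tensor.to(device) for name, tensor in encoded.items()}
```

With a plain transformers model this would be used as `model.generate(**inputs_on_model_device(tokenizer, model, prompt), max_new_tokens=20)`. It does not by itself make `complete_re` work on the GPU: if the library tokenizes the prompt internally, the equivalent `.to(model.device)` call would have to be applied there.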
