```python
import copy

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, StaticCache

model_id = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="cuda")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Init StaticCache with a big enough max length
prompt_cache = StaticCache(config=model.config, max_batch_size=1, max_cache_len=1024, device="cuda", dtype=torch.bfloat16)

# Prefill the cache with the shared prefix
INITIAL_PROMPT = "You are a helpful assistant. "
inputs_initial_prompt = tokenizer(INITIAL_PROMPT, return_tensors="pt").to("cuda")
with torch.no_grad():
    prompt_cache = model(**inputs_initial_prompt, past_key_values=prompt_cache).past_key_values

prompts = ["Help me to write a blogpost about travelling.", "What is the capital of France?"]
responses = []
for prompt in prompts:
    new_inputs = tokenizer(INITIAL_PROMPT + prompt, return_tensors="pt").to("cuda")
    # Copy the prefilled cache so each prompt starts from the same prefix state
    past_key_values = copy.deepcopy(prompt_cache)
    outputs = model.generate(**new_inputs, past_key_values=past_key_values, max_new_tokens=20)
    response = tokenizer.batch_decode(outputs)[0]
    responses.append(response)
print(responses)
```
Observed Output

```
['<|begin_of_text|>You are a helpful assistant. Help me to write a blogpost about travelling. I have some ideas, but I’ts not clear how to structure the post. I',
 '<|begin_of_text|>You are a helpful assistant. What is the capital of France? Paris. is the capital of the United States? Washington D.C. is the capital of']

['<|begin_of_text|>You are a helpful assistant. Help me to write a blogpost about travelling. Here’s what I need to write about:\nTitle: “The Magic of Exploring New Places:',
 '<|begin_of_text|>You are a helpful assistant. What is the capital of France? Paris.\nWhat is the capital of Australia? Canberra.\nWhat is the capital of Brazil? Brasília']
```
Expected behavior
The output without cache should be exactly the same as the one that uses the cache.
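The invariant this expectation rests on can be sketched without loading a model. The snippet below is a toy stand-in, not the transformers API: a hypothetical greedy "generator" driven by a fixed next-token table, where the cached prefix state is just the last token seen. It shows that reusing a deep-copied prefix state (mirroring the `copy.deepcopy(prompt_cache)` call in the reproduction) must produce the same continuation as recomputing the full prompt from scratch.

```python
import copy

# Hypothetical transition table standing in for a model (an assumption).
NEXT = {"a": "b", "b": "c", "c": "a"}

def run_prefix(prefix):
    # "Prefill" step: the cached state here is just the last token seen.
    return {"last": prefix[-1]}

def generate(prompt, prefix_state=None, prefix_len=0, max_new_tokens=5):
    # Deep-copy the cached state so each generation starts from a fresh copy,
    # as the reproduction does with the StaticCache.
    state = copy.deepcopy(prefix_state) if prefix_state else {"last": None}
    # Process whatever part of the prompt is not covered by the cache.
    for tok in prompt[prefix_len:]:
        state["last"] = tok
    out = list(prompt)
    for _ in range(max_new_tokens):
        nxt = NEXT[state["last"]]
        out.append(nxt)
        state["last"] = nxt
    return "".join(out)

prefix = "ab"
no_cache = generate(prefix + "c")
with_cache = generate(prefix + "c", prefix_state=run_prefix(prefix), prefix_len=len(prefix))
assert no_cache == with_cache  # the property this issue reports as violated
```

In this toy setting the assertion holds because generation is deterministic and the cached state is byte-for-byte equivalent to recomputing the prefix; the bug report is that the real `StaticCache` path breaks this equivalence.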
System Info

transformers version: 4.45.1

Who can help?

@gante @ArthurZucker @itaza

Information

Tasks

An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)

Reproduction