0-shot long-context summarization / QA inference #19
Comments
Thank you for your question! Here is a brief explanation of how LongLLaMA handles long inputs and why you do not see any fixed context limitation. I would not expect a striking performance from the 3B model. Also, the choice of the …
Thank you. I'm taking a look and will ask follow-up questions.
As I understood, the Colab QA demo (using TextStreamer) should also work in a standard HF pipeline, right?

**Loading model**

```python
import torch
from transformers import LlamaTokenizer, AutoModelForCausalLM

tokenizer = LlamaTokenizer.from_pretrained("syzymon/long_llama_3b_v1_1")
model = AutoModelForCausalLM.from_pretrained(
    "syzymon/long_llama_3b_v1_1",
    torch_dtype=torch.float32,
    trust_remote_code=True,
)
```

**Input handling and generation**

LongLLaMA uses the Hugging Face interface; the long input given to the model will be split into context windows and loaded into the memory cache.

```python
prompt = "My name is Julien and I like to"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
outputs = model(input_ids=input_ids)
```

During the model call, one can provide the parameter `last_context_length`:

```python
generation_output = model.generate(
    input_ids=input_ids,
    max_new_tokens=256,
    num_beams=1,
    last_context_length=1792,
    do_sample=True,
    temperature=1.0,
)
print(tokenizer.decode(generation_output[0]))
```

So I am doing something like this:
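For reference, one way the pieces above could be put together for 0-shot long-context QA is sketched below. The prompt format and the `document` / `question` variables are illustrative assumptions, not something prescribed by the repo; only the loading call, the `generate` arguments, and `last_context_length` come from the snippets above.

```python
import torch
from transformers import LlamaTokenizer, AutoModelForCausalLM

# Load as in the snippets above; trust_remote_code=True enables the custom
# LongLLaMA code path that handles long inputs.
tokenizer = LlamaTokenizer.from_pretrained("syzymon/long_llama_3b_v1_1")
model = AutoModelForCausalLM.from_pretrained(
    "syzymon/long_llama_3b_v1_1",
    torch_dtype=torch.float32,
    trust_remote_code=True,
)

# Placeholder long document and question (assumed, for illustration only).
document = "..."  # e.g. a report spanning tens of thousands of tokens
question = "What is the main finding of the report?"
prompt = f"{document}\n\nQuestion: {question}\nAnswer:"

input_ids = tokenizer(prompt, return_tensors="pt").input_ids

generation_output = model.generate(
    input_ids=input_ids,
    max_new_tokens=256,
    num_beams=1,
    last_context_length=1792,  # as in the example generate() call above
    do_sample=True,
    temperature=1.0,
)

# Decode only the newly generated tokens, i.e. the model's answer.
answer = tokenizer.decode(
    generation_output[0, input_ids.shape[1]:], skip_special_tokens=True
)
print(answer)
```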
btw, can I ask what's a good sampling strategy, if you've tried some? I'm doing `model = AutoModelForCausalLM.from_pretrained(` … and currently the generation quality is very poor. Thank you very much for the help!
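Not speaking for the maintainers, but a common starting point for sampling with the HF `generate` API is sketched below, reusing `model`, `tokenizer`, and `input_ids` from the snippets above; the specific values (`temperature`, `top_p`, `top_k`, `repetition_penalty`) are assumptions to tune, not settings taken from this repo.

```python
# Generic sampling configuration for transformers' generate();
# the values are illustrative defaults, not tuned for LongLLaMA.
generation_output = model.generate(
    input_ids=input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,         # lower temperature -> less random continuations
    top_p=0.9,               # nucleus sampling
    top_k=50,                # sample only from the 50 most likely tokens
    repetition_penalty=1.1,  # mildly discourage verbatim repetition
    last_context_length=1792,
)
print(tokenizer.decode(generation_output[0], skip_special_tokens=True))
```

Greedy decoding (`do_sample=False`, `num_beams=1`) is also worth comparing against when the sampled output looks poor.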
Hi,
Thank you for this great effort.
I am trying to use your 3B m-instruct-v1_1 model to evaluate on my custom long-context QA dataset, with context lengths of up to 200k tokens.
I have a question: I find it difficult to locate keywords like 256k in your Colab / .py examples, while there are several mentions of 1024, 2048, etc., just as normal LLaMA has. So this model does support long context, right? In that case, I should not be using the "drop-in" replacement example.
Thank you very much.
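For what it's worth, a quick way to sanity-check how long such an input really is for this tokenizer is sketched below; `long_document` is a placeholder, and the only assumption is the tokenizer loading shown earlier in the thread.

```python
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("syzymon/long_llama_3b_v1_1")

# Placeholder for one ~200k-token context from the custom QA dataset.
long_document = "..."

input_ids = tokenizer(long_document, return_tensors="pt").input_ids
print(f"Input length in tokens: {input_ids.shape[1]}")
```

As suggested earlier in the thread, the `trust_remote_code=True` loading path, rather than the drop-in replacement example, is the one intended for long-context evaluation without a fixed context limitation.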