Code Completion and Code Infilling - Llama 7B Inference #256
Comments
I ran into this issue too. It's supposed to be fixed in the latest update of Optimum: huggingface/optimum#1446
Thanks @jxmorris12. @akaneshiro7, it seems the fix in Optimum should help. Please let us know if that works for you.
Can confirm it works after installing from source.
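For anyone else hitting this before a release ships the fix, a source install of Optimum is a one-liner (a sketch; it assumes the fix from huggingface/optimum#1446 is on the main branch):

```bash
# Pull Optimum from the main branch instead of the 1.13.2 release,
# so the padding_mask fix from huggingface/optimum#1446 is included.
pip install --upgrade git+https://github.com/huggingface/optimum.git
```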
Yes, installing from source fixed the issue with running code completion and code infilling with the Hugging Face Accelerate model. Am I able to run it with a PEFT fine-tuned model? When I try to run the same command but pass in the peft_model, I get the following error:
File "/home/ubuntu/anaconda3/envs/vllm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1685, in __delattr__
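(For context, the failing invocation was along these lines; the --peft_model flag name mirrors examples/inference.py, and the checkpoint path is a placeholder:)

```bash
# Same code-infilling command as in the original report, plus a PEFT
# checkpoint; <peft_checkpoint_dir> is a placeholder path, and the
# --peft_model flag is assumed to match examples/inference.py.
python examples/code_llama/code_infilling_example.py \
  --model_name ../hf_transformers/7B/ \
  --peft_model <peft_checkpoint_dir> \
  --prompt_file examples/code_llama/code_infilling_prompt.txt \
  --temperature 0.2 --top_p 0.9
```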
I'm hitting the "AttributeError: _hf_hook" issue too. It went away when I set 'use_fast_kernels=False'.
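Spelled out as a command (a sketch; this assumes the code_llama example scripts expose the same use_fast_kernels flag as examples/inference.py):

```bash
# Workaround: skip the BetterTransformer fast-kernel path, which is
# what trips over _hf_hook when a PEFT/LoRA adapter is attached.
python examples/code_llama/code_infilling_example.py \
  --model_name ../hf_transformers/7B/ \
  --prompt_file examples/code_llama/code_infilling_prompt.txt \
  --use_fast_kernels False
```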
I'm hitting the "AttributeError: _hf_hook" issue too, with a LoRA model. Could you please fix this? We really want to speed up the inference stage. Thanks!
Hello,
I am trying to run inference for Code Llama using the Hugging Face Transformers Accelerate model for Llama 2 7B. I am able to run inference with the examples/inference.py script. However, when I try to run the example command for code completion and code infilling, I get the following error:
TypeError: llama_forward() got an unexpected keyword argument 'padding_mask'
Warning: The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
Command: python examples/code_llama/code_infilling_example.py --model_name ../hf_transformers/7B/ --prompt_file examples/code_llama/code_infilling_prompt.txt --temperature 0.2 --top_p 0.9
Versions:
torch==2.0.1
transformers==4.34.1
tokenizers==0.14.1
optimum==1.13.2