
Code Completion and Code Infilling - LLama 7B Inference #256

Closed
akaneshiro7 opened this issue Oct 19, 2023 · 7 comments
akaneshiro7 commented Oct 19, 2023

Hello,

I am trying to run inference for Code Llama using the Hugging Face Transformers model for Llama 2 7B with Accelerate. I am able to run inference with the examples/inference.py script. However, when I try to run the example command for code completion and code infilling, I get the following error:

TypeError: llama_forward() got an unexpected keyword argument 'padding_mask'

Warning: The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.

Command: python examples/code_llama/code_infilling_example.py --model_name ../hf_transformers/7B/ --prompt_file examples/code_llama/code_infilling_prompt.txt --temperature 0.2 --top_p 0.9

Versions:
torch==2.0.1
transformers==4.34.1
tokenizers==0.14.1
optimum==1.13.2
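For context, this TypeError pattern comes from a version skew: transformers 4.34 started passing a padding_mask keyword into the attention forward, while the BetterTransformer override in optimum 1.13.2 did not yet accept it. A minimal, self-contained Python sketch of the failure mode (the function names here are hypothetical stand-ins, not optimum's actual internals):

```python
def llama_forward_old(hidden_states, attention_mask=None):
    # Pre-fix override: no padding_mask parameter and no **kwargs,
    # so any new keyword argument from the caller raises TypeError.
    return hidden_states

def llama_forward_fixed(hidden_states, attention_mask=None, **kwargs):
    # Post-fix override: tolerates newer keyword arguments such as
    # padding_mask that a newer caller may now pass.
    return hidden_states

try:
    llama_forward_old([1, 2, 3], padding_mask=None)
except TypeError as e:
    print(e)  # unexpected keyword argument 'padding_mask'

llama_forward_fixed([1, 2, 3], padding_mask=None)  # accepted
```

This is why pinning a transformers version newer than the optimum version it was tested with can break: the caller and the monkey-patched forward disagree on the accepted signature.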

@jxmorris12

I ran into this issue too. It's supposed to be fixed in the latest update of Optimum: huggingface/optimum#1446

@HamidShojanazeri
Contributor

Thanks @jxmorris12. @akaneshiro7, it seems the fix in optimum should help. Please let us know if that works for you.

@eduardosanchezg

Can confirm it works after installing optimum from source.

@akaneshiro7
Author

@HamidShojanazeri

Yes, installing from source seems to have fixed the issue with running code completion and code infilling with the Hugging Face Accelerate model. Can I also run it with a PEFT fine-tuned model? When I try to run the same command but pass in the peft_model, I get the following error:

File "/home/ubuntu/anaconda3/envs/vllm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1685, in __delattr__
    super().__delattr__(name)
AttributeError: _hf_hook

@huangyangyu

I am hitting the "AttributeError: _hf_hook" issue too.

@callumHub

> I meet the issue "AttributeError: _hf_hook" too.

It went away when I set use_fast_kernels=False.
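That workaround makes sense if use_fast_kernels gates the BetterTransformer conversion: with it disabled, the code path that removes _hf_hook is never reached. A simplified sketch of that guard pattern (not the repo's actual code; BetterTransformer.transform is optimum's real API):

```python
def prepare_model(model, use_fast_kernels=False):
    # Only convert to BetterTransformer when fast kernels are requested;
    # with use_fast_kernels=False the conversion (and any hook removal
    # inside it) is skipped entirely and the model passes through as-is.
    if use_fast_kernels:
        from optimum.bettertransformer import BetterTransformer
        model = BetterTransformer.transform(model)
    return model

sentinel = object()
assert prepare_model(sentinel, use_fast_kernels=False) is sentinel
```

The trade-off is that you give up the fused-kernel speedup, which is why the LoRA case below still wants a real fix.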

@XvHaidong

I am hitting the "AttributeError: _hf_hook" issue too. Could you please fix this? We really want to speed up the inference stage (with a LoRA model). Thanks!

@init27 init27 closed this as completed Aug 19, 2024