
ImportError: cannot import name 'LlamaTokenizer' from 'transformers.models.llama' #17

Closed
AllanOricil opened this issue Apr 29, 2024 · 7 comments

Comments


AllanOricil commented Apr 29, 2024

I tried the minimum example from https://huggingface.co/Snowflake/snowflake-arctic-instruct and it did not work. Can you help me fix it?

(screenshot: ImportError traceback)

I'm using the latest transformers release commit.

(screenshot: installed transformers commit)

snowflake-arctic-instruct.py

import os
# enable hf_transfer for faster ckpt download
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from deepspeed.linear.config import QuantizationConfig

tokenizer = AutoTokenizer.from_pretrained(
    "Snowflake/snowflake-arctic-instruct",
    trust_remote_code=True
)
quant_config = QuantizationConfig(q_bits=8)

model = AutoModelForCausalLM.from_pretrained(
    "Snowflake/snowflake-arctic-instruct",
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    device_map="auto",
    ds_quantization_config=quant_config,
    max_memory={i: "150GiB" for i in range(8)},
    torch_dtype=torch.bfloat16)


content = "5x + 35 = 7x - 60 + 10. Solve for x"
messages = [{"role": "user", "content": content}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to("cuda")

outputs = model.generate(input_ids=input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0]))

requirements.txt

annotated-types==0.6.0
certifi==2024.2.2
charset-normalizer==3.3.2
deepspeed==0.14.2
filelock==3.13.4
fsspec==2024.3.1
hf_transfer==0.1.6
hjson==3.1.0
huggingface-hub==0.22.2
idna==3.7
Jinja2==3.1.3
MarkupSafe==2.1.5
mpmath==1.3.0
networkx==3.3
ninja==1.11.1.1
numpy==1.26.4
packaging==24.0
psutil==5.9.8
py-cpuinfo==9.0.0
pydantic==2.7.1
pydantic_core==2.18.2
pynvml==11.5.0
PyYAML==6.0.1
regex==2024.4.28
requests==2.31.0
safetensors==0.4.3
sympy==1.12
tokenizers==0.19.1
torch==2.3.0
tqdm==4.66.2
transformers @ git+https://github.com/huggingface/transformers@9fe3f585bb4ea29f209dc705d269fbe292e1128f
typing_extensions==4.11.0
urllib3==2.2.1
@karthik-nexusflow

The error can be fixed by installing the Snowflake fork of transformers:

pip install git+https://github.com/Snowflake-Labs/transformers.git@arctic

Then also install the packages below, since the conditional import in the Llama module causes this error:

pip install sentencepiece
pip install tokenizers


AllanOricil commented May 1, 2024

@karthik-nexusflow it would be better to add this information here, because beginners like me may try to install the official Hugging Face transformers package instead of the fork, which leads to this issue.


jeffra (Collaborator) commented May 2, 2024

@AllanOricil, with trust_remote_code=True this should work with public/official transformers>=4.39.0. I am testing this now in a fresh/clean environment with the version you list (git+https://github.com/huggingface/transformers@9fe3f585bb4ea29f209dc705d269fbe292e1128f) and I'm not able to reproduce this error for some reason :(

When did you download the weights? If you are running in offline mode and downloaded them more than 5 days ago, then trust_remote_code=True won't work and might be producing this issue. This commit is what should get around needing the transformers fork: https://huggingface.co/Snowflake/snowflake-arctic-instruct/commit/f4ca7904b66a80b6f62d6272253ea1e32375ddd6

The core issue is confusing me though: it says you can't import LlamaTokenizer, but that has been available in transformers for a while now, well before Arctic was introduced.


AllanOricil commented May 2, 2024

@jeffra I don't even know where to use that trust_remote_code variable. Is that when I run python3 script.py? I really have no experience with Python, so pardon my noob questions 😅

I just copied the simple example, created a virtual env, installed transformers 4.39.0 and deepspeed 0.14.2, then tried to run the script with Python 3, and it did not work. I got the same error that led me to open this issue.

Then I decided to go to hugging face transformers repo to get the latest release of their package, updated my virtual env with it, tried to run the code again, and again the same issue happened. Then I opened this issue here.

To get the list of dependencies I ran pip freeze.

I have also not downloaded any weights. Isn't that supposed to happen automatically when I run that example code?

@AllanOricil (Author)

Another question: can I run this on an M2 Max with 32 GB of RAM in AWS? That was my plan 😀

@sfc-gh-jrasley (Collaborator)

We had another user run into this same issue with LlamaTokenizer. It appears there's a dependency on sentencepiece for this tokenizer. I've updated the inference requirements.txt in #25 to address this going forward.

@AllanOricil with regard to the M2 Max, it's not on our exact roadmap, but I believe this support was recently added in llama.cpp! :) ggerganov/llama.cpp#7020

Closing the issue for now, as I think the main issue is resolved.

@AllanOricil (Author)

I will give it another chance.
