About model conversion
I had some issues using the Hugging Face APIs to load models, so I decided to download the model and config files from huggingface.co manually (into `~/llama2.c/tiny/`).
Then I tried to export the model with `python export.py ./tiny/tiny.bin --hf ./tiny/`, and the following error occurred:
```
Traceback (most recent call last):
  File "/home/mrmat/llama2.c/export.py", line 565, in <module>
    model = load_hf_model(args.hf)
  File "/home/mrmat/llama2.c/export.py", line 479, in load_hf_model
    layer.attention.wk.weight = nn.Parameter(permute_reverse(hf_dict[f'model.layers.{i}.self_attn.k_proj.weight']))
  File "/home/mrmat/llama2.c/export.py", line 473, in permute_reverse
    return w.view(n_heads, 2, dim1 // n_heads // 2, dim2).transpose(1, 2).reshape(dim1, dim2)
RuntimeError: shape '[32, 2, 32, 2048]' is invalid for input of size 524288
```
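If I read the numbers right, the `view` call expects 32 × 2 × 32 × 2048 = 4,194,304 elements (i.e. all 32 query heads), but the `k_proj` weight only has 524,288 = 256 × 2048 elements, and 256 = 4 × 64, i.e. 4 key/value heads with a `head_dim` of 64, which points to grouped-query attention.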
Then I checked the model config with `print(config)`, and it seems that `hf_model = AutoModelForCausalLM.from_pretrained(model_path)` is not loading the model hyperparameters the way `export.py` expects.
Since config.json has `"num_key_value_heads": 4`, I modified `export.py` to load the hyperparameters from config.json manually and changed the arguments of `permute_reverse`, and it finally works.
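For reference, here is a minimal, self-contained sketch of the change. `permute_reverse` is copied from `export.py`; the hard-coded config values and the `n_kv_heads` handling are my additions, and the exact wiring into `load_hf_model` may differ:

```python
import torch

# Values from this model's config.json
dim = 2048                 # "hidden_size"
n_heads = 32               # "num_attention_heads"
n_kv_heads = 4             # "num_key_value_heads"
head_dim = dim // n_heads  # 64

def permute_reverse(w, n_heads, dim1, dim2):
    # Undoes the HF rotary-embedding permutation; dim1 must be the
    # projection's real output size, i.e. (number of heads in w) * head_dim.
    return w.view(n_heads, 2, dim1 // n_heads // 2, dim2).transpose(1, 2).reshape(dim1, dim2)

# q_proj has all 32 heads -> weight shape [2048, 2048]
wq = torch.randn(n_heads * head_dim, dim)
permute_reverse(wq, n_heads, dim, dim)  # fine

# k_proj has only 4 KV heads -> weight shape [256, 2048]
wk = torch.randn(n_kv_heads * head_dim, dim)
permute_reverse(wk, n_kv_heads, n_kv_heads * head_dim, dim)  # fine with adjusted args
# permute_reverse(wk, n_heads, dim, dim) would raise the RuntimeError shown above
```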
About custom tokenizer
Then I tried another LLaMA2 model created by the same contributor. While the model can be exported correctly, `./run` failed to load the tokenizer due to a different vocab size. It turns out that their vocabulary contains three additional tokens compared to the default one used by LLaMA2. Some LLaMA2 models on Hugging Face are even trained with a GPT-2 tokenizer.
I looked at `tokenizer.py`, and it only supports generating `tokenizer.bin` from a `tokenizer.model` produced by the SentencePiece library.
So my question is: how can I convert a custom tokenizer (not created by SentencePiece, whether downloaded manually or saved by `tokenizer.save_pretrained(path)`) into `tokenizer.bin`, given that the `encode` function in `run.c` needs the `vocab_scores` that custom tokenizers don't have?
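In case it helps frame the question: if I am reading `tokenizer.py` correctly, `tokenizer.bin` is just a `max_token_length` header followed by one (score, byte length, bytes) record per token. So one workaround I can imagine is writing the file myself with synthetic scores. Everything below (loading via `AutoTokenizer`, the `▁`-to-space replacement, and using the negated token id as a fake score) is my guess, not something the repo documents:

```python
import struct
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("./tiny/")  # the custom tokenizer
vocab = sorted(tok.get_vocab().items(), key=lambda kv: kv[1])  # (piece, id) pairs sorted by id

# SentencePiece marks word boundaries with '▁' (U+2581); run.c expects raw spaces.
tokens = [piece.replace("\u2581", " ").encode("utf-8") for piece, _ in vocab]
# Fake scores: lower id -> higher score, mimicking SentencePiece's ordering.
scores = [-float(i) for _, i in vocab]

with open("tokenizer.bin", "wb") as f:
    f.write(struct.pack("I", max(len(t) for t in tokens)))  # max_token_length header
    for b, s in zip(tokens, scores):
        f.write(struct.pack("fI", s, len(b)))  # score, then byte length
        f.write(b)
```

But I doubt that the greedy merge loop in `run.c`'s `encode` would reproduce the original tokenizer's segmentation with fake scores, which is really the crux of my question.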
I am new to LLaMA2 and this project; thank you so much for your help!
clebert added a commit to clebert/llama2.zig that referenced this issue (Oct 19, 2023).