$ python ./tools/ckpts/convert_neox_to_hf.py --input_dir checkpoints/pythia-70M/global_step143000/ --config_file pythia-70m.yml --output_dir hf_model/pythia-70M --precision fp16 --architecture neox
[2024-05-03 11:17:41,262] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Detected 'pipe-parallel-size' of 1, assuming model is saved as PipelineModule...
> building HFTokenizer tokenizer ...
> padded vocab (size: 50277) with 27 dummy tokens (new size: 50304)
0%| | 0/6 [00:00<?, ?it/s]
Traceback (most recent call last):
File "./tools/ckpts/convert_neox_to_hf.py", line 732, in <module>
main()
File "./tools/ckpts/convert_neox_to_hf.py", line 696, in main
hf_model = convert(
File "./tools/ckpts/convert_neox_to_hf.py", line 555, in convert
hf_layer.load_state_dict(state_dict)
File "/mnt/xfs/home/jvendrow/conda_envs/pythia/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for GPTNeoXLayer:
size mismatch for mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([24576, 512]).
size mismatch for mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([24576]).
size mismatch for mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([512, 24576]).
Description
When converting neox models to HF format, the 'intermediate_size' argument in the GPTNeoXConfig is not explicitly set, so it defaults to 24576 as per:
https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt_neox/configuration_gpt_neox.py
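As a quick illustration of the defaulting behavior (a minimal check, not part of the conversion script; the hidden size of 512 is simply the Pythia-70M value visible in the traceback above):

```python
from transformers import GPTNeoXConfig

# intermediate_size is not derived from hidden_size; it keeps the library default.
cfg = GPTNeoXConfig(hidden_size=512)
print(cfg.intermediate_size)  # 24576, hence the [24576, 512] shapes in the error above
```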
To Reproduce
Steps to reproduce the behavior: run the conversion command shown at the top of this issue (convert_neox_to_hf.py with --architecture neox on a Pythia-70M checkpoint); the conversion fails with the size-mismatch traceback above.
Proposed solution
It seems the intermediate size for the neox architecture is, in general, 4 * hidden size. The suggested edit is to set intermediate_size explicitly for neox models, along the lines of the sketch below:
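A minimal sketch of the idea (assuming the script constructs a GPTNeoXConfig for the HF model; the exact surrounding code in convert_neox_to_hf.py may differ, and the names below are illustrative):

```python
from transformers import GPTNeoXConfig

# hidden_size would come from the NeoX YAML config (512 for Pythia-70M).
hidden_size = 512

hf_config = GPTNeoXConfig(
    hidden_size=hidden_size,
    # NeoX MLPs use a 4x expansion, so set this explicitly instead of
    # inheriting the GPTNeoXConfig default of 24576.
    intermediate_size=4 * hidden_size,
    # ... remaining fields (num_hidden_layers, num_attention_heads,
    # vocab_size, etc.) filled in from the NeoX config as the script already does.
)
```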
Happy to make a PR.