Fix bug in tools/ckpts/convert_neox_to_hf.py for setting intermediate_size (#1209)

In tools/ckpts/convert_neox_to_hf.py, the 'intermediate_size' argument is not
set explicitly for the NeoX architecture, so it falls back to the default of 24576 defined at:

https://github.com/huggingface/transformers/blob/9fe3f585bb4ea29f209dc705d269fbe292e1128f/src/transformers/models/gpt_neox/configuration_gpt_neox.py#L48

Proposed solution: set intermediate_size to 4 * hidden-size unless the NeoX config provides intermediate-size.
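To make the problem concrete, here is a minimal sketch of the arithmetic. It only encodes the two numbers stated above (the 24576 default and the 4 * hidden-size convention). The helper names and the sample hidden sizes are hypothetical and are not part of the conversion script.

```python
# Default cited in the commit message for 'intermediate_size'.
HF_DEFAULT_INTERMEDIATE_SIZE = 24576


def expected_intermediate_size(hidden_size: int) -> int:
    """MLP width under the 4 * hidden-size convention proposed in the commit."""
    return 4 * hidden_size


def default_is_wrong(hidden_size: int) -> bool:
    """True when the library default disagrees with 4 * hidden_size."""
    return expected_intermediate_size(hidden_size) != HF_DEFAULT_INTERMEDIATE_SIZE


if __name__ == "__main__":
    # Hypothetical hidden sizes, chosen only for illustration.
    for hidden_size in (768, 2048, 6144):
        print(
            hidden_size,
            expected_intermediate_size(hidden_size),
            default_is_wrong(hidden_size),
        )
```

Since 24576 = 4 * 6144, the default agrees with the convention only for a hidden size of 6144. Any other hidden size would yield a config whose intermediate_size does not follow 4 * hidden-size.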
jvendrow authored May 4, 2024
1 parent 916c883 commit c814959
Showing 1 changed file with 5 additions and 0 deletions.
5 changes: 5 additions & 0 deletions tools/ckpts/convert_neox_to_hf.py
@@ -277,6 +277,11 @@ def __init__(self, neox_config):
                 ),
                 "use_parallel_residual": get_key(neox_config, "gpt-j-residual", False),
                 "layer_norm_eps": get_key(neox_config, "layernorm-epsilon", 1e-5),
+                "intermediate_size": get_key(
+                    neox_config,
+                    "intermediate-size",
+                    4 * get_key(neox_config, "hidden-size"),
+                ),
             }
         )
         hf_config = GPTNeoXConfig(**args)
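The lookup added in this hunk can be sketched in isolation as follows. The `get_key` below is a simplified stand-in written for this example, assumed to return the default when the key is absent; it is not the script's actual helper. `resolve_intermediate_size` is a hypothetical wrapper that does not exist in the repository.

```python
from typing import Any, Mapping


def get_key(config: Mapping[str, Any], key: str, default: Any = None) -> Any:
    """Simplified stand-in: return config[key], or `default` when it is absent."""
    return config.get(key, default)


def resolve_intermediate_size(neox_config: Mapping[str, Any]) -> int:
    """Mirror the added lines: explicit value first, else 4 * hidden-size."""
    return get_key(
        neox_config,
        "intermediate-size",
        # Evaluated before the outer lookup runs, so "hidden-size" is read
        # even when "intermediate-size" is present.
        4 * get_key(neox_config, "hidden-size"),
    )


if __name__ == "__main__":
    # Hypothetical configs, for illustration only.
    print(resolve_intermediate_size({"hidden-size": 2048}))
    print(resolve_intermediate_size({"hidden-size": 2048, "intermediate-size": 5632}))
```

One consequence of writing the fallback as an argument is that Python evaluates it eagerly. With this stand-in, a config lacking "hidden-size" raises a TypeError even if "intermediate-size" is given. Whether the real helper behaves the same way depends on its implementation, which is not shown in this diff.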
