Describe the bug
I have trained a llama-like model with NeMo using the below model config:
The model works well when I run inference using the NeMo checkpoint (script), but the converted checkpoint (script) drastically drops in performance. Any ideas why this might be happening? My only hunch is that apply_query_key_layer_scaling=True in NeMo, and that this might not be replicated on the HF side.
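For context, this is roughly how I understand apply_query_key_layer_scaling to interact with the attention softmax. The snippet below is purely illustrative PyTorch (the layer index, shapes, and variable names are made up), not NeMo's or HF's actual code:

```python
# Purely illustrative (not NeMo or HF source): how I understand
# apply_query_key_layer_scaling to interact with the attention softmax.
# Shapes, the layer index, and names below are made up for the example.
import torch

torch.manual_seed(0)
layer_number = 12                      # hypothetical 1-based layer index
d_head = 64
q = torch.randn(1, 8, 16, d_head)      # [batch, heads, seq, head_dim]
k = torch.randn(1, 8, 16, d_head)

# Standard scaled dot-product scores (what an HF llama-style layer computes).
scores_std = (q @ k.transpose(-1, -2)) / d_head ** 0.5
probs_std = torch.softmax(scores_std, dim=-1)

# With query-key layer scaling: scores are additionally divided by the layer
# number, and the softmax scales them back up (in fp32), so the result should
# match up to rounding; the point of the trick is keeping half-precision
# intermediates small during training.
scores_scaled = (q @ k.transpose(-1, -2)) / (d_head ** 0.5 * layer_number)
probs_scaled = torch.softmax(scores_scaled * layer_number, dim=-1)

print(torch.allclose(probs_std, probs_scaled, atol=1e-6))  # expect True
```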
Conversion test results:
Logits match: False (max diff: 4.91e+00)
Parameters match: True (max diff: 0.00e+00)
Generation match: 0.0
Sample generation comparison:
Input text: '<s>[INST] Hello [/INST]\n'
NeMo output: "<s>[INST] Hello [/INST]\n Hello. It's nice to meet you. Is there something I can help you with or"
HF output: '<s> [INST] Hello [/INST]\n Hello. ನಿಮ್ಮನ್ನ ಭೇಟಿ ಮಾಡಿ ಸಂತೋಷ ಆಯ್ತು. ನಿಮಗೆ ಏನ'
Number of parameters match: 1.0 (Nemo: 2525087744, HF: 2525087744)
❌ Conversion test failed!
I am not able to pinpoint why this is happening. Any pointers will be helpful.
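For reference, a minimal sketch of how match numbers like the ones above could be computed, assuming the logits from both runs have already been saved to disk (file names are placeholders; this is not the actual test script):

```python
# Assumed workflow, placeholder file names: compare logits saved from the
# NeMo run and from the converted HF run on the same prompt.
import torch

nemo_logits = torch.load("nemo_logits.pt")  # placeholder: [batch, seq, vocab]
hf_logits = torch.load("hf_logits.pt")      # placeholder: same shape

max_diff = (nemo_logits - hf_logits).abs().max().item()
print(f"Logits match: {max_diff < 1e-3} (max diff: {max_diff:.2e})")

# "Generation match" taken here as the fraction of positions where the greedy
# (argmax) next token agrees between the two sets of logits.
gen_match = (nemo_logits.argmax(dim=-1) == hf_logits.argmax(dim=-1)).float().mean().item()
print(f"Generation match: {gen_match}")
```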
Environment details
https://docs.nvidia.com/nemo-framework/user-guide/latest/softwarecomponentversions.html#nemo-framework-24-05