Cpp runner outputs wrong results when using lora + tensor parallelism #2634
Labels
bug (Something isn't working)
Investigating
Lora/P-tuning
triaged (Issue has been triaged by maintainers)
System Info
x86_64, Debian 11, A100 GPUs
Who can help?
No response
Reproduction
On a VM with 2 A100 GPUs:
Expected behavior
The Python runner and the cpp runner should give the same results.
Actual behavior
The Python runner and the cpp runner give completely different results, and the results from the cpp runner are evidently wrong.
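One way to make the mismatch concrete is to compare the generated token IDs from both runners for the same prompt and report the first position where they differ. The snippet below is only a minimal sketch: `first_divergence`, `py_ids`, and `cpp_ids` are hypothetical names, and it assumes both runners use deterministic (for example greedy) decoding with the same LoRA weights, so that identical outputs are actually expected.

```python
from itertools import zip_longest


def first_divergence(py_ids, cpp_ids):
    """Return the index of the first position where two token-ID sequences
    differ, or None if they are identical (including in length)."""
    # zip_longest pads the shorter sequence with None, so a length mismatch
    # is reported as a divergence at the end of the shorter sequence.
    for i, (a, b) in enumerate(zip_longest(py_ids, cpp_ids)):
        if a != b:
            return i
    return None


if __name__ == "__main__":
    # Placeholder token IDs for illustration only, not real runner outputs.
    py_ids = [101, 2023, 2003, 1037]
    cpp_ids = [101, 2023, 9999, 1037]
    print(first_divergence(py_ids, cpp_ids))
```

A divergence at the very first generated token would suggest the problem is in how the adapter is applied, rather than small numerical drift accumulating over a long generation, though that is an interpretation and not something confirmed here.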
Additional notes
N/A