gptManagerBenchmark seems to go into a dead loop with GPU usage 0% #1562
Comments
@byshiue @juney-nvidia Can anyone give some comments?
You can set the correct parameters and try again.
Hi @sleepwalker2017, can you please help check whether the issue has been fixed on the latest main branch? Thanks.
Hi @sleepwalker2017, do you still have any further issues or questions? If not, we'll close this soon.
I ran this on 2 * A30 GPUs with CUDA driver 535.104.12.
The docker image was built using
make -C docker release_build CUDA_ARCHS="80-real"
I am using the latest code on the main branch.
GPU usage stays at 0% while CPU usage stays at 100% for a long time.
Stack trace for both processes:
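(For anyone trying to reproduce this: a backtrace of a hung C++ process such as gptManagerBenchmark can typically be dumped with standard gdb, roughly as sketched below; the placeholder PID and output file are illustrative, not taken from this issue.)

```bash
# Attach to a hung gptManagerBenchmark process and dump backtraces of all threads.
# Replace <pid> with the actual process ID (e.g. from ps or nvidia-smi); it is a placeholder.
gdb --batch -p <pid> -ex "thread apply all bt" > "gptManagerBenchmark_<pid>_bt.txt"
```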
It should be noted that inference for the non-LoRA requests has finished, and it gets stuck when running the LoRA benchmark.
I post the scripts for reproducing this at the end of this issue.
Some additional questions about this script:

1. `--lora_target_modules`: I see the manual only gives `attn_qkv`. Why is that, and what is the meaning of `attn_qkv`?
2. `--lora_num_device_mod_layers $(( 32 * $NUM_LAYERS * $NUM_LORA_MODS * $MAX_LORA_RANK ))`: what is the meaning of 32? Is it lora_num? (See the sketch after this list.)
3. The module names `attn_q attn_k attn_v attn_dense mlp_h_to_4h mlp_gate mlp_4h_to_h` are given, so what is `attn_qkv` for?
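To make the arithmetic in question 2 concrete, here is a minimal bash sketch of how that expression evaluates; the variable values below are illustrative assumptions, not the values from my script:

```bash
# Illustrative values only (assumptions, not the ones used in my script).
NUM_LAYERS=32        # number of transformer layers in the model
NUM_LORA_MODS=7      # e.g. attn_q attn_k attn_v attn_dense mlp_h_to_4h mlp_gate mlp_4h_to_h
MAX_LORA_RANK=64     # maximum LoRA rank

# The leading 32 is the constant my question is about; it is kept verbatim
# from the expression quoted above.
echo $(( 32 * NUM_LAYERS * NUM_LORA_MODS * MAX_LORA_RANK ))   # prints 458752
```

With these example numbers the flag would be passed as `--lora_num_device_mod_layers 458752`.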