
Cpp runner outputs wrong results when using LoRA + tensor parallelism #2634

Open · ShuaiShao93 opened this issue Dec 28, 2024 · 0 comments
Labels: bug, Investigating, Lora/P-tuning, triaged

ShuaiShao93 commented Dec 28, 2024

System Info

x86_64, Debian 11, A100 GPUs

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

On a VM with 2 A100 GPUs:

  1. pip3 install tensorrt_llm==0.16.0 --extra-index-url https://pypi.nvidia.com/
  2. git clone -b v0.16.0 https://github.com/NVIDIA/TensorRT-LLM.git
  3. git clone https://huggingface.co/unsloth/Llama-3.2-3B-Instruct
  4. git clone https://huggingface.co/ss-galileo/llama-3.2-3B-lora
  5. Run the commands below to build the engine:

```
python3 TensorRT-LLM/examples/llama/convert_checkpoint.py --model_dir ./Llama-3.2-3B-Instruct --output_dir ./tllm_3b_checkpoint_2gpu_fp16 --dtype float16 --tp_size=2

trtllm-build --checkpoint_dir ./tllm_3b_checkpoint_2gpu_fp16 --output_dir ./tmp/llama/3B/trt_engines/fp16/2-gpu --gpt_attention_plugin auto --gemm_plugin auto --max_num_tokens 128000 --max_batch_size 8 --logits_dtype=float32 --gather_generation_logits --kv_cache_type=paged --lora_plugin auto --lora_dir llama-3.2-3B-lora/
```

  6. Run the model with the cpp runner (an API-level sketch of the same comparison follows these steps):

```
mpirun -n 2 python3 TensorRT-LLM/examples/run.py --engine_dir=./tmp/llama/3B/trt_engines/fp16/2-gpu --max_output_len 100 --max_input_length=100000 --tokenizer_dir ./Llama-3.2-3B-Instruct --input_text "is sky blue?" --lora_dir llama-3.2-3B-lora/ --lora_task_uids 0
```

  7. Got these results:

```
Input [Text 0]: "<|begin_of_text|>is sky blue?"
Output [Text 0 Beam 0]: " 1.0. 1.0. 2. 1.0. 3. 1. 4. 1. 5. 1. 6. 1. 7. 1. 8. 1. 9. 1. 10. 1. 11. 1. 12. 1. 13. 1. 14. 1. 15. 1. 16. "
```

  8. Run the model with the python runner:

```
mpirun -n 2 python3 TensorRT-LLM/examples/run.py --engine_dir=./tmp/llama/3B/trt_engines/fp16/2-gpu --max_output_len 100 --max_input_length=100000 --tokenizer_dir ./Llama-3.2-3B-Instruct --input_text "is sky blue?" --lora_dir llama-3.2-3B-lora/ --lora_task_uids 0 --use_py_session
```

  9. Got these results:

```
Input [Text 0]: "<|begin_of_text|>is sky blue?"
Output [Text 0 Beam 0]: " (a) yes (b) sky is not blue
The question is not about the color of the sky, but about the color of the sky at a particular time of day. The sky appears blue during the daytime, but it can appear different colors at sunrise and sunset. So, the correct answer is (b) sky is not blue.
This question requires the ability to analyze the situation and understand the context, which is a key aspect of critical thinking. It also requires the ability to distinguish"
```
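
The divergence can also be reproduced from the Python API directly, without going through examples/run.py. Below is a minimal sketch, assuming the paths from the steps above and the ModelRunner / ModelRunnerCpp classes that run.py itself wraps in v0.16.0; the parameter names mirror run.py's usage and may need adjusting for other versions:

```python
# Launch under MPI once per runner, e.g.:
#   mpirun -n 2 python3 compare_runners.py cpp
#   mpirun -n 2 python3 compare_runners.py python
import sys

import torch
from transformers import AutoTokenizer

import tensorrt_llm
from tensorrt_llm.runtime import ModelRunner, ModelRunnerCpp

ENGINE_DIR = "./tmp/llama/3B/trt_engines/fp16/2-gpu"  # from the build step above
TOKENIZER_DIR = "./Llama-3.2-3B-Instruct"
LORA_DIR = "llama-3.2-3B-lora/"

use_cpp = len(sys.argv) < 2 or sys.argv[1] == "cpp"
rank = tensorrt_llm.mpi_rank()

tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_DIR)
input_ids = torch.tensor(tokenizer.encode("is sky blue?"), dtype=torch.int32)

# ModelRunnerCpp is what run.py uses by default; ModelRunner is --use_py_session.
runner_cls = ModelRunnerCpp if use_cpp else ModelRunner
runner = runner_cls.from_dir(engine_dir=ENGINE_DIR, lora_dir=LORA_DIR, rank=rank)

with torch.no_grad():
    outputs = runner.generate(
        batch_input_ids=[input_ids],
        max_new_tokens=100,
        end_id=tokenizer.eos_token_id,
        pad_id=tokenizer.pad_token_id
        if tokenizer.pad_token_id is not None else tokenizer.eos_token_id,
        lora_uids=["0"],  # same as --lora_task_uids 0
        return_dict=True)
    torch.cuda.synchronize()

if rank == 0:
    # output_ids: (batch, beams, seq_len); strip the prompt before decoding
    output_ids = outputs["output_ids"][0][0][len(input_ids):]
    print(("cpp" if use_cpp else "python") + " runner:",
          tokenizer.decode(output_ids, skip_special_tokens=True))
```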

Expected behavior

The Python runner and the cpp runner should give the same results.
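
Since run.py decodes greedily by default (top_k=1), both runs should be deterministic and agree token for token. Capturing the decoded strings from the two invocations of the sketch above as cpp_text and py_text (hypothetical names), a strict comparison would be expected to pass:

```python
# Hypothetical follow-up to the sketch above: cpp_text / py_text hold the
# decoded outputs of the two mpirun invocations. With greedy decoding the
# runs are deterministic, so the strings should match exactly.
assert cpp_text == py_text, f"runner mismatch:\ncpp: {cpp_text!r}\npy:  {py_text!r}"
```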

actual behavior

The Python runner and the cpp runner give completely different results, and the cpp runner's output is clearly wrong.

additional notes

N/A

@ShuaiShao93 added the "bug" label on Dec 28, 2024
@github-actions bot added the "triaged" and "Investigating" labels on Jan 6, 2025