
Cpp runner outputs wrong results when using LoRA + tensor parallelism #2634

Open · ShuaiShao93 opened this issue Dec 28, 2024 · 0 comments
Labels: bug, Investigating, Lora/P-tuning, triaged

ShuaiShao93 commented Dec 28, 2024

System Info

x86_64, Debian 11, A100 GPUs

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

On a VM with 2 A100 GPUs:

  1. pip3 install tensorrt_llm==0.16.0 --extra-index-url https://pypi.nvidia.com/
  2. git clone -b v0.16.0 https://github.com/NVIDIA/TensorRT-LLM.git
  3. git clone https://huggingface.co/unsloth/Llama-3.2-3B-Instruct
  4. git clone https://huggingface.co/ss-galileo/llama-3.2-3B-lora
  5. Run the commands below to build the engine:

```
python3 TensorRT-LLM/examples/llama/convert_checkpoint.py --model_dir ./Llama-3.2-3B-Instruct --output_dir ./tllm_3b_checkpoint_2gpu_fp16 --dtype float16 --tp_size=2

trtllm-build --checkpoint_dir ./tllm_3b_checkpoint_2gpu_fp16 --output_dir ./tmp/llama/3B/trt_engines/fp16/2-gpu --gpt_attention_plugin auto --gemm_plugin auto --max_num_tokens 128000 --max_batch_size 8 --logits_dtype=float32 --gather_generation_logits --kv_cache_type=paged --lora_plugin auto --lora_dir llama-3.2-3B-lora/
```

  6. Run the model with the cpp runner (an API-level sketch of the same comparison follows these steps):

```
mpirun -n 2 python3 TensorRT-LLM/examples/run.py --engine_dir=./tmp/llama/3B/trt_engines/fp16/2-gpu --max_output_len 100 --max_input_length=100000 --tokenizer_dir ./Llama-3.2-3B-Instruct --input_text "is sky blue?" --lora_dir llama-3.2-3B-lora/ --lora_task_uids 0
```

  7. Got these results:

```
Input [Text 0]: "<|begin_of_text|>is sky blue?"
Output [Text 0 Beam 0]: " 1.0. 1.0. 2. 1.0. 3. 1. 4. 1. 5. 1. 6. 1. 7. 1. 8. 1. 9. 1. 10. 1. 11. 1. 12. 1. 13. 1. 14. 1. 15. 1. 16. "
```

  8. Run the model with the python runner:

```
mpirun -n 2 python3 TensorRT-LLM/examples/run.py --engine_dir=./tmp/llama/3B/trt_engines/fp16/2-gpu --max_output_len 100 --max_input_length=100000 --tokenizer_dir ./Llama-3.2-3B-Instruct --input_text "is sky blue?" --lora_dir llama-3.2-3B-lora/ --lora_task_uids 0 --use_py_session
```

  9. Got these results:

```
Input [Text 0]: "<|begin_of_text|>is sky blue?"
Output [Text 0 Beam 0]: " (a) yes (b) sky is not blue
The question is not about the color of the sky, but about the color of the sky at a particular time of day. The sky appears blue during the daytime, but it can appear different colors at sunrise and sunset. So, the correct answer is (b) sky is not blue.
This question requires the ability to analyze the situation and understand the context, which is a key aspect of critical thinking. It also requires the ability to distinguish"
```
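
The divergence can also be reproduced from the Python API directly, without going through examples/run.py. Below is a minimal sketch, assuming the paths from the steps above and the ModelRunner / ModelRunnerCpp classes that run.py itself wraps in v0.16.0; the parameter names mirror run.py's usage and may need adjusting for other versions:

```python
# Launch under MPI once per runner, e.g.:
#   mpirun -n 2 python3 compare_runners.py cpp
#   mpirun -n 2 python3 compare_runners.py python
import sys

import torch
from transformers import AutoTokenizer

import tensorrt_llm
from tensorrt_llm.runtime import ModelRunner, ModelRunnerCpp

ENGINE_DIR = "./tmp/llama/3B/trt_engines/fp16/2-gpu"  # from the build step above
TOKENIZER_DIR = "./Llama-3.2-3B-Instruct"
LORA_DIR = "llama-3.2-3B-lora/"

use_cpp = len(sys.argv) < 2 or sys.argv[1] == "cpp"
rank = tensorrt_llm.mpi_rank()

tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_DIR)
input_ids = torch.tensor(tokenizer.encode("is sky blue?"), dtype=torch.int32)

# ModelRunnerCpp is what run.py uses by default; ModelRunner is --use_py_session.
runner_cls = ModelRunnerCpp if use_cpp else ModelRunner
runner = runner_cls.from_dir(engine_dir=ENGINE_DIR, lora_dir=LORA_DIR, rank=rank)

with torch.no_grad():
    outputs = runner.generate(
        batch_input_ids=[input_ids],
        max_new_tokens=100,
        end_id=tokenizer.eos_token_id,
        pad_id=tokenizer.pad_token_id
        if tokenizer.pad_token_id is not None else tokenizer.eos_token_id,
        lora_uids=["0"],  # same as --lora_task_uids 0
        return_dict=True)
    torch.cuda.synchronize()

if rank == 0:
    # output_ids: (batch, beams, seq_len); strip the prompt before decoding
    output_ids = outputs["output_ids"][0][0][len(input_ids):]
    print(("cpp" if use_cpp else "python") + " runner:",
          tokenizer.decode(output_ids, skip_special_tokens=True))
```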

Expected behavior

The Python runner and the cpp runner should give the same results.
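
Since run.py decodes greedily by default (top_k=1), both runs should be deterministic and agree token for token. Capturing the decoded strings from the two invocations of the sketch above as cpp_text and py_text (hypothetical names), a strict comparison would be expected to pass:

```python
# Hypothetical follow-up to the sketch above: cpp_text / py_text hold the
# decoded outputs of the two mpirun invocations. With greedy decoding the
# runs are deterministic, so the strings should match exactly.
assert cpp_text == py_text, f"runner mismatch:\ncpp: {cpp_text!r}\npy:  {py_text!r}"
```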

actual behavior

The Python runner and the cpp runner give completely different results, and the cpp runner's output is clearly wrong.

additional notes

N/A

@ShuaiShao93 added the "bug" label on Dec 28, 2024
@github-actions bot added the "triaged" and "Investigating" labels on Jan 6, 2025