Cannot process new request: [TensorRT-LLM][ERROR] Assertion failed: LoRA task 0 not found in cache. Please send LoRA weights with request (/home/jenkins/agent/workspace/LLM/main/L0_PostMerge/llm/cpp/tensorrt_llm/batch_manager/peftCacheManager.cpp:182) #1552

Closed
sleepwalker2017 opened this issue May 7, 2024 · 2 comments
Labels: bug (Something isn't working), stale, triaged (Issue has been triaged by maintainers)

Comments


sleepwalker2017 commented May 7, 2024

System Info

GPU: 2x A30; TRT-LLM branch: main; commit id: 66ef1df

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

MODEL_CHECKPOINT=/data/models/vicuna-7b-v1.5/
CONVERTED_CHECKPOINT=Llama-7b-hf-ckpt

DTYPE=float16
TP=2

echo "step 1: convert checkpoint"
# Convert the HF checkpoint to TRT-LLM format
python convert_checkpoint.py --model_dir ${MODEL_CHECKPOINT} \
                              --output_dir ${CONVERTED_CHECKPOINT} \
                              --dtype ${DTYPE} \
                              --tp_size ${TP} \
                              --pp_size 1

SOURCE_LORA=/data/Llama2-Chinese-7b-Chat-LoRA/
#SOURCE_LORA=/data/llama2-7b-lora.tar.gz
CPP_LORA=chinese-llama-2-lora-7b-cpp

EG_DIR=/tmp/lora-eg

PP=1
MAX_LEN=1024
MAX_BATCH=16
TOKENIZER=/data/models/vicuna-7b-v1.5/
LORA_ENGINE=Llama-2-7b-hf-engine
NUM_LORAS=(8)
NUM_REQUESTS=200

echo "step 2: trtllm-build"
trtllm-build \
    --checkpoint_dir ${CONVERTED_CHECKPOINT} \
    --output_dir ${LORA_ENGINE} \
    --max_batch_size ${MAX_BATCH} \
    --max_input_len $MAX_LEN \
    --max_output_len $MAX_LEN \
    --gpt_attention_plugin float16 \
    --paged_kv_cache enable \
    --remove_input_padding enable \
    --gemm_plugin float16 \
    --lora_plugin float16 \
    --use_paged_context_fmha enable \
    --use_custom_all_reduce disable \
    --lora_target_modules attn_qkv attn_dense mlp_h_to_4h mlp_gate mlp_4h_to_h
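
Aside (not in the original script): the cache sizing in the benchmark step below assumes a maximum adapter rank of 8 (MAX_LORA_RANK=8), while the engine build here leaves the engine's maximum LoRA rank at its default. If your trtllm-build revision exposes a --max_lora_rank option (verify with trtllm-build --help; this is an assumption about the flag set available on commit 66ef1df), pinning it to the adapter rank keeps the two consistent:

# Hedged sketch: add to the trtllm-build invocation above, if the flag exists on this commit.
#     --max_lora_rank 8 \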
echo "step 3: Convert LoRA to cpp format"
# Convert LoRA to cpp format
python ../hf_lora_convert.py \
    -i $SOURCE_LORA \
    --storage-type $DTYPE \
    -o $CPP_LORA
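
A quick sanity check after step 3 (a sketch; the file names below are from memory of what hf_lora_convert.py emits and may differ by version):

ls ${CPP_LORA}
# expected: the cpp-format tensors, e.g. model.lora_config.npy and model.lora_weights.npy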

echo "step 4: prepare dataset for non-lora requests"
mkdir -p $EG_DIR/data
python ../../benchmarks/cpp/prepare_dataset.py \
    --output ${EG_DIR}/data/token-norm-dist.json \
    --request-rate -1 \
    --time-delay-dist constant \
    --tokenizer $TOKENIZER \
    token-norm-dist \
    --num-requests $NUM_REQUESTS \
    --input-mean 256 --input-stdev 16 --output-mean 128 --output-stdev 24

echo "step 5: prepare dataset for lora requests"
for nloras in ${NUM_LORAS[@]}; do
    python ../../benchmarks/cpp/prepare_dataset.py \
        --output "${EG_DIR}/data/token-norm-dist-lora-${nloras}.json" \
        --request-rate -1 \
        --time-delay-dist constant \
        --rand-task-id 0 $(( $nloras - 1 )) \
        --tokenizer $TOKENIZER \
        token-norm-dist \
        --num-requests $NUM_REQUESTS \
        --input-mean 256 --input-stdev 16 --output-mean 128 --output-stdev 24
done
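
Note that --rand-task-id 0 $(( $nloras - 1 )) tags each request with a LoRA task ID in [0, nloras-1], and at runtime every task ID must resolve to adapter weights that are either already in the PEFT cache or attached to the request. The repo's LoRA benchmark example covers this by cloning the converted adapter into one randomized copy per task ID; a sketch, assuming benchmarks/cpp/utils/generate_rand_loras.py is present on this commit:

echo "step 5b: generate one adapter directory per task id (0..7)"
python ../../benchmarks/cpp/utils/generate_rand_loras.py ${CPP_LORA} ${EG_DIR}/loras 8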

mkdir -p ${EG_DIR}/log-base-lora

NUM_LAYERS=32
NUM_LORA_MODS=8
MAX_LORA_RANK=8
EOS_ID=-1
mpirun -n ${TP} --allow-run-as-root --output-filename ${EG_DIR}/log-base-lora \
    ../../cpp/build/benchmarks/gptManagerBenchmark \
    --engine_dir $LORA_ENGINE \
    --type IFB \
    --dataset "${EG_DIR}/data/token-norm-dist-lora-8.json" \
    --lora_host_cache_bytes 8589934592 \
    --lora_num_device_mod_layers $(( 8 * $NUM_LAYERS * $NUM_LORA_MODS * $MAX_LORA_RANK )) \
    --kv_cache_free_gpu_mem_fraction 0.80 \
    --log_level info \
    --eos_id ${EOS_ID}
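
This last step is the likely source of the assertion: step 3 converts the adapter, but nothing ever passes it to gptManagerBenchmark, so requests tagged with task IDs 0-7 reach the PEFT cache with no weights to load. A sketch of the corrected run, assuming the --lora_dir option used in the repo's LoRA benchmark example (confirm against gptManagerBenchmark --help on this commit) and the per-task directories generated in step 5b above:

mpirun -n ${TP} --allow-run-as-root --output-filename ${EG_DIR}/log-base-lora \
    ../../cpp/build/benchmarks/gptManagerBenchmark \
    --engine_dir $LORA_ENGINE \
    --type IFB \
    --dataset "${EG_DIR}/data/token-norm-dist-lora-8.json" \
    --lora_host_cache_bytes 8589934592 \
    --lora_num_device_mod_layers $(( 8 * $NUM_LAYERS * $NUM_LORA_MODS * $MAX_LORA_RANK )) \
    --kv_cache_free_gpu_mem_fraction 0.80 \
    --log_level info \
    --eos_id ${EOS_ID} \
    --lora_dir ${EG_DIR}/loras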

Expected behavior

The gptManagerBenchmark run with LoRA requests should complete successfully.

Actual behavior

[TensorRT-LLM][ERROR] Cannot process new request: [TensorRT-LLM][ERROR] Assertion failed: LoRA task 0 not found in cache. Please send LoRA weights with request (/home/jenkins/agent/workspace/LLM/main/L0_PostMerge/llm/cpp/tensorrt_llm/batch_manager/peftCacheManager.cpp:182)
1       0x5572c6dedde9 tensorrt_llm::common::throwRuntimeError(char const*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 100
2       0x7f56c6cd5378 /data/TensorRT-LLM/cpp/build/tensorrt_llm/libtensorrt_llm.so(+0x69c378) [0x7f56c6cd5378]
3       0x7f56c8c3f03f tensorrt_llm::batch_manager::TrtGptModelInflightBatching::updatePeftCache(std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> const&) + 127
4       0x7f56c8c03078 tensorrt_llm::batch_manager::GptManager::fetchNewRequests() + 1464
5       0x7f56c8c0342a tensorrt_llm::batch_manager::GptManager::decoupled_execution_loop() + 170
6       0x7f56c64dd253 /lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f56c64dd253]
7       0x7f56c624cac3 /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f56c624cac3]
8       0x7f56c62de850 /lib/x86_64-linux-gnu/libc.so.6(+0x126850) [0x7f56c62de850]

Additional notes

None

@VincentJing commented

For the LoRA cpp format, you can refer to this link; for the benchmark script, you can refer to this.

byshiue added the triaged label May 29, 2024
@nv-guomingz commented

Hi @sleepwalker2017, do you have any update on this ticket? If not, we'll close it soon.
