ERROR: Failed to create instance: unexpected error when creating modelInstanceState #328

ken2190 opened this issue Feb 5, 2024 · 5 comments
Labels: bug (Something isn't working), triaged (Issue has been triaged by maintainers)
ken2190 commented Feb 5, 2024

System Info

  • CPU architecture (x86_64)
  • CPU/Host memory size (64GB)
  • GPU properties
    • GPU name (1x NVIDIA V100)
    • GPU memory size (32GB)
  • Libraries
    • TensorRT-LLM branch or tag (v0.5.0)
    • Versions of TensorRT branch (v0.5.0)
  • OS (Ubuntu 22.04)
  • Docker image version
ubuntu@t4:~$ docker image ls
REPOSITORY                    TAG                       IMAGE ID       CREATED        SIZE
tensorrt_llm/release          latest                    657b380771fc   6 weeks ago    26.8GB
nvcr.io/nvidia/tritonserver   23.10-trtllm-python-py3   2af3013c737f   3 months ago   24.2GB

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Reproduction

TensorRT-LLM (cloned from https://github.com/NVIDIA/TensorRT-LLM, git checkout v0.5.0)
tensorrtllm_backend (cloned from https://github.com/triton-inference-server/tensorrtllm_backend, git checkout v0.5.0)
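
For reference, a minimal sketch of those checkout steps (clone destinations assumed):

git clone https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM && git checkout v0.5.0 && cd ..
git clone https://github.com/triton-inference-server/tensorrtllm_backend.git
cd tensorrtllm_backend && git checkout v0.5.0 && cd ..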

mkdir -p /home/ubuntu/DATA/tensorrtllm_backend/engines
docker run --rm -it --ipc=host --ulimit memlock=-1 --ulimit stack=67108864  \
                --gpus=all \
                --volume  /home/ubuntu/DATA/llama_hf/CodeLlama-7b-Instruct:/CodeLlama-7b-Instruct \
                --volume  /home/ubuntu/DATA/tensorrtllm_backend/engines:/engines \
                --volume /home/ubuntu/DATA:/code/tensorrt_llm \
                --workdir /code/tensorrt_llm/TensorRT-LLM \
                --hostname t4-release \
                --name tensorrt_llm-release-ubuntu \
                --tmpfs /tmp:exec \
                tensorrt_llm/release:latest

python3 examples/llama/build.py --model_dir /CodeLlama-7b-Instruct/ \
                --dtype float16 \
                --use_gpt_attention_plugin float16 \
                --use_inflight_batching \
                --paged_kv_cache \
                --remove_input_padding \
                --use_gemm_plugin float16 \
                --output_dir /engines/1-gpu/ \
                --world_size 1

The model built without problems.
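
A quick sanity check of the build output before wiring it into Triton (the engine file name below is an assumption based on build.py defaults; expect a config.json plus one .engine per rank):

ls /engines/1-gpu/
# e.g. config.json  llama_float16_tp1_rank0.engine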

docker run --rm -it --net host --shm-size=2g \
    --ulimit memlock=-1 --ulimit stack=67108864 --gpus all \
    -v /home/ubuntu/DATA/tensorrtllm_backend:/tensorrtllm_backend \
    -v /home/ubuntu/DATA/llama_hf/CodeLlama-7b-Instruct:/CodeLlama-7b-Instruct \
    -v /home/ubuntu/DATA/tensorrtllm_backend/engines:/engines \
    nvcr.io/nvidia/tritonserver:23.10-trtllm-python-py3

cp -R /tensorrtllm_backend/all_models/inflight_batcher_llm /opt/tritonserver/.
cp /engines/1-gpu/* inflight_batcher_llm/tensorrt_llm/1/
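
After those copies, the model repository should look roughly like this (layout taken from the all_models/inflight_batcher_llm template):

/opt/tritonserver/inflight_batcher_llm/
├── ensemble/
├── preprocessing/
├── postprocessing/
└── tensorrt_llm/
    ├── config.pbtxt
    └── 1/        <- engine files copied here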

HF_LLAMA_MODEL=/CodeLlama-7b-Instruct
ENGINE_DIR=/engines/1-gpu

python3 /tensorrtllm_backend/tools/fill_template.py -i inflight_batcher_llm/preprocessing/config.pbtxt tokenizer_dir:${HF_LLAMA_MODEL},tokenizer_type:llama,triton_max_batch_size:64,preprocessing_instance_count:1
python3 /tensorrtllm_backend/tools/fill_template.py -i inflight_batcher_llm/postprocessing/config.pbtxt tokenizer_dir:${HF_LLAMA_MODEL},tokenizer_type:llama,triton_max_batch_size:64,postprocessing_instance_count:1
python3 /tensorrtllm_backend/tools/fill_template.py -i inflight_batcher_llm/ensemble/config.pbtxt triton_max_batch_size:64
python3 /tensorrtllm_backend/tools/fill_template.py -i inflight_batcher_llm/tensorrt_llm/config.pbtxt triton_max_batch_size:64,decoupled_mode:False,max_beam_width:1,engine_dir:${ENGINE_DIR},max_tokens_in_paged_kv_cache:2560,max_attention_window_size:2560,kv_cache_free_gpu_mem_fraction:0.5,exclude_input_in_output:True,enable_kv_cache_reuse:False,batching_strategy:inflight_batching,max_queue_delay_microseconds:600,batch_scheduler_policy:guaranteed_no_evict
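
For context, fill_template.py only substitutes the ${...} placeholders in the config.pbtxt templates. After the last command, inflight_batcher_llm/tensorrt_llm/config.pbtxt should contain entries along these lines (parameter name taken from the v0.5.0 template; verify against your copy):

parameters: {
  key: "gpt_model_path"
  value: {
    string_value: "/engines/1-gpu"
  }
}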


root@t4:/opt/tritonserver# pip install sentencepiece
root@t4:/opt/tritonserver# pip install protobuf

root@t4:/opt/tritonserver# python3 /tensorrtllm_backend/scripts/launch_triton_server.py --world_size=1 --model_repo=/opt/tritonserver/inflight_batcher_llm/
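
For what it's worth, launch_triton_server.py is a thin wrapper that builds an mpirun command line with one tritonserver process per rank. The single-GPU invocation above is roughly equivalent to the following (flags reconstructed from the backend-config values visible in the log below, so treat it as an approximation):

mpirun --allow-run-as-root -n 1 \
    tritonserver --model-repository=/opt/tritonserver/inflight_batcher_llm/ \
        --disable-auto-complete-config \
        --backend-config=python,shm-region-prefix-name=prefix0_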


Expected behavior

I0922 23:28:40.351809 1 grpc_server.cc:2451] Started GRPCInferenceService at 0.0.0.0:8001
I0922 23:28:40.352017 1 http_server.cc:3558] Started HTTPService at 0.0.0.0:8000
I0922 23:28:40.395611 1 http_server.cc:187] Started Metrics Service at 0.0.0.0:8002

Actual behavior

root@t4:/opt/tritonserver# pip install sentencepiece
root@t4:/opt/tritonserver# pip install protobuf

root@t4:/opt/tritonserver# python3 /tensorrtllm_backend/scripts/launch_triton_server.py --world_size=1 --model_repo=/opt/tritonserver/inflight_batcher_llm/
root@t4:/opt/tritonserver# [TensorRT-LLM][WARNING] max_num_sequences is not specified, will be set to the TRT engine max_batch_size
[TensorRT-LLM][WARNING] enable_trt_overlap is not specified, will be set to true
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be number, but is null
[TensorRT-LLM][WARNING] Optional value for parameter max_num_tokens will not be set.
[TensorRT-LLM][INFO] Initializing MPI with thread mode 1
[TensorRT-LLM][INFO] MPI size: 1, rank: 0
[TensorRT-LLM][INFO] TRTGptModel maxNumSequences: 6
[TensorRT-LLM][INFO] TRTGptModel maxBatchSize: 3
[TensorRT-LLM][INFO] TRTGptModel enableTrtOverlap: 1
Cleaning up...
Cleaning up...
I0205 17:02:21.226812 416 pinned_memory_manager.cc:241] Pinned memory pool is created at '0x10018000000' with size 268435456
I0205 17:02:21.228151 416 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I0205 17:02:21.234199 416 model_lifecycle.cc:461] loading: tensorrt_llm:1
I0205 17:02:21.234321 416 model_lifecycle.cc:461] loading: postprocessing:1
I0205 17:02:21.234458 416 model_lifecycle.cc:461] loading: preprocessing:1
I0205 17:02:21.365042 416 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: postprocessing_0_0 (CPU device 0)
I0205 17:02:21.365285 416 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: preprocessing_0_0 (CPU device 0)
I0205 17:02:22.409739 416 model_lifecycle.cc:818] successfully loaded 'postprocessing'
I0205 17:02:25.260513 416 model_lifecycle.cc:818] successfully loaded 'preprocessing'
E0205 17:02:40.250190 416 backend_model.cc:634] ERROR: Failed to create instance: unexpected error when creating modelInstanceState: [TensorRT-LLM][ERROR] CUDA runtime error in cudaDeviceGetDefaultMemPool(&memPool, device): operation not supported (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/bufferManager.cpp:170)
1       0x7f83fe81e045 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x36045) [0x7f83fe81e045]
2       0x7f83fe873925 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x8b925) [0x7f83fe873925]
3       0x7f83fe8747bf /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x8c7bf) [0x7f83fe8747bf]
4       0x7f83fe8d5545 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0xed545) [0x7f83fe8d5545]
5       0x7f83fe84ef4e /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x66f4e) [0x7f83fe84ef4e]
6       0x7f83fe83ec0c /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x56c0c) [0x7f83fe83ec0c]
7       0x7f83fe8395f5 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x515f5) [0x7f83fe8395f5]
8       0x7f83fe8374db /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x4f4db) [0x7f83fe8374db]
9       0x7f83fe81b182 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x33182) [0x7f83fe81b182]
10      0x7f83fe81b235 TRITONBACKEND_ModelInstanceInitialize + 101
11      0x7f845419aa86 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1a4a86) [0x7f845419aa86]
12      0x7f845419bcc6 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1a5cc6) [0x7f845419bcc6]
13      0x7f845417ec15 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x188c15) [0x7f845417ec15]
14      0x7f845417f256 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x189256) [0x7f845417f256]
15      0x7f845418b27d /opt/tritonserver/bin/../lib/libtritonserver.so(+0x19527d) [0x7f845418b27d]
16      0x7f84537f9ee8 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x99ee8) [0x7f84537f9ee8]
17      0x7f845417597b /opt/tritonserver/bin/../lib/libtritonserver.so(+0x17f97b) [0x7f845417597b]
18      0x7f8454185695 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x18f695) [0x7f8454185695]
19      0x7f845418a50b /opt/tritonserver/bin/../lib/libtritonserver.so(+0x19450b) [0x7f845418a50b]
20      0x7f8454273610 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x27d610) [0x7f8454273610]
21      0x7f8454276d03 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x280d03) [0x7f8454276d03]
22      0x7f84543c38b2 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x3cd8b2) [0x7f84543c38b2]
23      0x7f8453a64253 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f8453a64253]
24      0x7f84537f4ac3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f84537f4ac3]
25      0x7f8453885bf4 clone + 68
E0205 17:02:40.250573 416 model_lifecycle.cc:621] failed to load 'tensorrt_llm' version 1: Internal: unexpected error when creating modelInstanceState: [TensorRT-LLM][ERROR] CUDA runtime error in cudaDeviceGetDefaultMemPool(&memPool, device): operation not supported (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/bufferManager.cpp:170)
[stack trace identical to the one above]
I0205 17:02:40.250673 416 model_lifecycle.cc:756] failed to load 'tensorrt_llm'
E0205 17:02:40.250868 416 model_repository_manager.cc:563] Invalid argument: ensemble 'ensemble' depends on 'tensorrt_llm' which has no loaded version. Model 'tensorrt_llm' loading failed with error: version 1 is at UNAVAILABLE state: Internal: unexpected error when creating modelInstanceState: [TensorRT-LLM][ERROR] CUDA runtime error in cudaDeviceGetDefaultMemPool(&memPool, device): operation not supported (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/bufferManager.cpp:170)
[stack trace identical to the one above]
I0205 17:02:40.251051 416 server.cc:592]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0205 17:02:40.251131 416 server.cc:619]
+-------------+-----------------------------------------------------------------+--------+
| Backend     | Path                                                            | Config |
+-------------+-----------------------------------------------------------------+--------+
| tensorrtllm | /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
| python      | /opt/tritonserver/backends/python/libtriton_python.so           | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","shm-region-prefix-name":"prefix0_","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+--------+

I0205 17:02:40.251331 416 server.cc:662]
+----------------+---------+---------------------------------------------------------------------------------+
| Model          | Version | Status                                                                          |
+----------------+---------+---------------------------------------------------------------------------------+
| postprocessing | 1       | READY                                                                           |
| preprocessing  | 1       | READY                                                                           |
| tensorrt_llm   | 1       | UNAVAILABLE: Internal: unexpected error when creating modelInstanceState:       |
|                |         | [TensorRT-LLM][ERROR] CUDA runtime error in                                     |
|                |         | cudaDeviceGetDefaultMemPool(&memPool, device): operation not supported         |
|                |         | (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/bufferManager.cpp:170)              |
|                |         | [stack trace identical to the one above]                                        |
+----------------+---------+---------------------------------------------------------------------------------+

I0205 17:02:40.271435 416 metrics.cc:817] Collecting metrics for GPU 0: GRID V100S-32Q
I0205 17:02:40.271838 416 metrics.cc:710] Collecting CPU metrics
I0205 17:02:40.272083 416 tritonserver.cc:2458]
+----------------------------------+------------------------------------------------------------------------------+
| Option                           | Value                                                                        |
+----------------------------------+------------------------------------------------------------------------------+
| server_id                        | triton                                                                       |
| server_version                   | 2.39.0                                                                       |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents)|
|                                  | schedule_policy model_configuration system_shared_memory cuda_shared_memory |
|                                  | binary_tensor_data parameters statistics trace logging                      |
| model_repository_path[0]         | /opt/tritonserver/inflight_batcher_llm/                                      |
| model_control_mode               | MODE_NONE                                                                    |
| strict_model_config              | 1                                                                            |
| rate_limit                       | OFF                                                                          |
| pinned_memory_pool_byte_size     | 268435456                                                                    |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                     |
| min_supported_compute_capability | 6.0                                                                          |
| strict_readiness                 | 1                                                                            |
| exit_timeout                     | 30                                                                           |
| cache_enabled                    | 0                                                                            |
+----------------------------------+------------------------------------------------------------------------------+

I0205 17:02:40.272122 416 server.cc:293] Waiting for in-flight requests to complete.
I0205 17:02:40.272151 416 server.cc:309] Timeout 30: Found 0 model versions that have in-flight inferences
I0205 17:02:40.272348 416 server.cc:324] All models are stopped, unloading models
I0205 17:02:40.272370 416 server.cc:331] Timeout 30: Found 2 live models and 0 in-flight non-inference requests
I0205 17:02:41.272520 416 server.cc:331] Timeout 29: Found 2 live models and 0 in-flight non-inference requests
W0205 17:02:41.276802 416 metrics.cc:582] Unable to get power limit for GPU 0. Status:Success, value:0.000000
W0205 17:02:41.276858 416 metrics.cc:600] Unable to get power usage for GPU 0. Status:Success, value:0.000000
W0205 17:02:41.276883 416 metrics.cc:624] Unable to get energy consumption for GPU 0. Status:Success, value:0
I0205 17:02:41.641130 416 model_lifecycle.cc:603] successfully unloaded 'postprocessing' version 1
I0205 17:02:42.046922 416 model_lifecycle.cc:603] successfully unloaded 'preprocessing' version 1
I0205 17:02:42.272760 416 server.cc:331] Timeout 28: Found 0 live models and 0 in-flight non-inference requests
W0205 17:02:42.277527 416 metrics.cc:582] Unable to get power limit for GPU 0. Status:Success, value:0.000000
W0205 17:02:42.277595 416 metrics.cc:600] Unable to get power usage for GPU 0. Status:Success, value:0.000000
W0205 17:02:42.277630 416 metrics.cc:624] Unable to get energy consumption for GPU 0. Status:Success, value:0
error: creating server: Internal - failed to load all models
W0205 17:02:43.279857 416 metrics.cc:582] Unable to get power limit for GPU 0. Status:Success, value:0.000000
W0205 17:02:43.279967 416 metrics.cc:600] Unable to get power usage for GPU 0. Status:Success, value:0.000000
W0205 17:02:43.279983 416 metrics.cc:624] Unable to get energy consumption for GPU 0. Status:Success, value:0
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[55476,1],0]
  Exit code:    1
--------------------------------------------------------------------------
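
Worth noting: the metrics line above reports the device as GRID V100S-32Q, i.e. a vGPU rather than a bare-metal V100. The failing call, cudaDeviceGetDefaultMemPool, only works on devices that report cudaDevAttrMemoryPoolsSupported, which vGPU instances typically do not unless unified memory is enabled. One way to check from inside the Triton container, sketched with ctypes (the attribute ID and library name are assumptions; verify against your CUDA headers):

python3 - <<'EOF'
# Hedged diagnostic sketch: query cudaDevAttrMemoryPoolsSupported (ID 115 in
# recent CUDA driver_types.h -- verify against your headers) for device 0.
# A value of 0 means cudaDeviceGetDefaultMemPool() will fail with
# "operation not supported", matching the error in this issue.
import ctypes
rt = ctypes.CDLL("libcudart.so")  # assumes the CUDA runtime is on the loader path
val = ctypes.c_int(-1)
rt.cudaDeviceGetAttribute(ctypes.byref(val), 115, 0)
print("memory pools supported:", val.value)
EOF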


Additional notes

@byshiue @schetlur-nv @juney-nvidia
Could you take a look?

ken2190 added the bug label Feb 5, 2024
byshiue (Collaborator) commented Feb 27, 2024

Could you try the latest main branch?

tobernat commented Mar 5, 2024

I get the exact same error when trying to replicate the following tutorial with version 0.5.0: https://developer.nvidia.com/blog/optimizing-inference-on-llms-with-tensorrt-llm-now-publicly-available/
I tried with the latest main branch, but still got the error [TensorRT-LLM][ERROR] CUDA runtime error in cudaDeviceGetDefaultMemPool(&memPool, device): operation not supported

Has the issue been resolved?

schetlur-nv (Collaborator) commented

@tobernat I see on #363 that you were able to resolve this. Is there anything more that needs to be done?
@ken2190 can you please refer to the fix posted in issue 363 linked above?

tobernat commented

In our case (see #363) it was solved by setting vGPU plugin parameters in VMware:
https://docs.nvidia.com/grid/13.0/grid-vgpu-user-guide/index.html#setting-vgpu-plugin-parameters-on-vmware-vsphere
See also: https://kb.vmware.com/s/article/2142307
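
For readers hitting the same error: the relevant vGPU plugin parameter is most likely the one that enables unified memory for the VM, since CUDA memory pools are unavailable on vGPU without it. On vSphere it is set as an advanced VM configuration entry; the exact key below is an assumption based on the linked NVIDIA guide, and the pciPassthru index depends on which vGPU device is assigned to the VM:

pciPassthru0.cfg.enable_uvm = "1"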

byshiue added the triaged label Mar 19, 2024

renareke commented Apr 9, 2024

> In our case (see #363) it was solved by setting vGPU plugin parameters in VMware: https://docs.nvidia.com/grid/13.0/grid-vgpu-user-guide/index.html#setting-vgpu-plugin-parameters-on-vmware-vsphere See also: https://kb.vmware.com/s/article/2142307

@tobernat Which parameters needed to be set to get this issue resolved? I have the same issue with an L40S card.

Thanks for the reply!
