[Bug] Triton Server memory leak in python backend decouple mode #1363
Comments
@lvhan028 @lzhangzz @zhulinJulia24 Do you have any suggestions? Thanks.
LMDeploy will use CUDA 12 by default after 0.2.6, so the base image in the Dockerfile and the build dependencies of the Triton repositories should also be updated. https://docs.nvidia.com/deeplearning/triton-inference-server/release-notes/rel-24-03.html
Cheers. Shall we update this to r24.03 accordingly?
lmdeploy/src/turbomind/triton_backend/CMakeLists.txt
Lines 52 to 54 in e9d2724
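For illustration only, here is a rough sketch of what such a bump might look like, assuming those lines pin the Triton dependency repositories with the conventional cache variables used by Triton backends (the variable names below are assumptions, not copied from the file):

```cmake
# Hypothetical sketch of the proposed change: pin the Triton dependency
# repositories to the r24.03 release branch so they match the CUDA 12 base
# image mentioned above. Variable names follow the common Triton backend
# convention and may differ from the actual CMakeLists.txt.
set(TRITON_COMMON_REPO_TAG  "r24.03" CACHE STRING "Tag for triton-inference-server/common repo")
set(TRITON_CORE_REPO_TAG    "r24.03" CACHE STRING "Tag for triton-inference-server/core repo")
set(TRITON_BACKEND_REPO_TAG "r24.03" CACHE STRING "Tag for triton-inference-server/backend repo")
```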
I will send a PR ASAP.
Checklist
Describe the bug
The version of Triton Server currently used in LMDeploy is r22.12:
lmdeploy/src/turbomind/triton_backend/CMakeLists.txt
Lines 52 to 54 in e9d2724
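The cited snippet is not reproduced in this report. As an unverified sketch, Triton backend builds typically pin their dependency repositories with cache variables like the following, so the r22.12 pin would look roughly like this:

```cmake
# Assumed shape of the cited lines: the Triton common/core/backend
# repositories pinned to the r22.12 release branch. The variable names are
# the usual Triton convention, not verified against the LMDeploy source.
set(TRITON_COMMON_REPO_TAG  "r22.12" CACHE STRING "Tag for triton-inference-server/common repo")
set(TRITON_CORE_REPO_TAG    "r22.12" CACHE STRING "Tag for triton-inference-server/core repo")
set(TRITON_BACKEND_REPO_TAG "r22.12" CACHE STRING "Tag for triton-inference-server/backend repo")
```

Bumping these tags to a release that contains the python backend fix (r23.10 or later) is one of the two remediation paths described below.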
To address this issue, we synchronized the fix commits on top of r23.02; please refer to the repository at https://github.com/zhyncs/python_backend.
Currently, #1329 has been merged into main. I think it is necessary to update the documentation to inform users of this known issue. There are two ways to fix it: either update r22.12 in LMDeploy directly to r23.10, or, as in our repository, synchronize only the fix commits.
@lvhan028 @lzhangzz @ispobock @zhulinJulia24
Reproduction
N/A
Environment
Error traceback
No response