[Bug] Triton Server memory leak in python backend decouple mode #1363

Closed

zhyncs opened this issue Mar 28, 2024 · 4 comments

zhyncs commented Mar 28, 2024

Checklist

1. I have searched related issues but could not find the expected help.
2. The bug has not been fixed in the latest version.

Describe the bug

The version of Triton Server currently used in LMDeploy is r22.12:

set(TRITON_BACKEND_REPO_TAG "r22.12" CACHE STRING "Tag for triton-inference-server/backend repo")
set(TRITON_CORE_REPO_TAG "r22.12" CACHE STRING "Tag for triton-inference-server/core repo")
set(TRITON_COMMON_REPO_TAG "r22.12" CACHE STRING "Tag for triton-inference-server/common repo")
We are also currently using r23.02. Both versions have a memory leak when the python backend runs in decoupled mode with streaming (triton-inference-server/server#6270). This issue has been fixed in r23.10 (triton-inference-server/python_backend#309).

To address this issue, we back-ported the fix commit onto r23.02; please refer to the repository at https://github.com/zhyncs/python_backend.

Currently, #1329 has been merged into main. I think it is necessary to update the documentation to inform users of this known issue. There are two ways to fix it: one is to update the r22.12 tags in LMDeploy directly to r23.10, and the other follows our repository's approach, back-porting only the fix commit.
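
For illustration, the first option would amount to bumping the Triton repo tags in the CMake configuration quoted above. A minimal sketch, assuming the same cache variables; the exact tag is an assumption and should be a release branch that contains the fix (r23.10 or later):

# Hypothetical sketch: bump the Triton dependency tags to a release that
# includes the python backend decoupled-mode fix (r23.10 or later).
set(TRITON_BACKEND_REPO_TAG "r23.10" CACHE STRING "Tag for triton-inference-server/backend repo")
set(TRITON_CORE_REPO_TAG "r23.10" CACHE STRING "Tag for triton-inference-server/core repo")
set(TRITON_COMMON_REPO_TAG "r23.10" CACHE STRING "Tag for triton-inference-server/common repo")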

@lvhan028 @lzhangzz @ispobock @zhulinJulia24

Reproduction

N/A

Environment

N/A

Error traceback

No response


zhyncs commented Mar 28, 2024

I think it is necessary to update the documentation to inform users of this known issue. There are two ways to fix it: one is to update the r22.12 tags in LMDeploy directly to r23.10, and the other follows our repository's approach, back-porting only the fix commit.

@lvhan028 @lzhangzz @zhulinJulia24 Do you have any suggestions? Thanks.


irexyc commented Mar 29, 2024

LMDeploy will use CUDA 12 by default after v0.2.6, so the base image in the Dockerfile and the build dependencies of the Triton repositories should also be updated.

https://docs.nvidia.com/deeplearning/triton-inference-server/release-notes/rel-24-03.html


zhyncs commented Mar 29, 2024

LMDeploy will use CUDA 12 by default after v0.2.6, so the base image in the Dockerfile and the build dependencies of the Triton repositories should also be updated.

https://docs.nvidia.com/deeplearning/triton-inference-server/release-notes/rel-24-03.html

Cheers. Shall we update these tags to r24.03 accordingly?

set(TRITON_BACKEND_REPO_TAG "r22.12" CACHE STRING "Tag for triton-inference-server/backend repo")
set(TRITON_CORE_REPO_TAG "r22.12" CACHE STRING "Tag for triton-inference-server/core repo")
set(TRITON_COMMON_REPO_TAG "r22.12" CACHE STRING "Tag for triton-inference-server/common repo")
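
If so, a minimal sketch of the change, assuming the same cache variables; r24.03 is my suggestion here and should match the Triton release branches that align with the CUDA 12 base images:

# Hypothetical sketch: move the Triton dependency tags to r24.03 so they
# match a CUDA 12 based Triton release.
set(TRITON_BACKEND_REPO_TAG "r24.03" CACHE STRING "Tag for triton-inference-server/backend repo")
set(TRITON_CORE_REPO_TAG "r24.03" CACHE STRING "Tag for triton-inference-server/core repo")
set(TRITON_COMMON_REPO_TAG "r24.03" CACHE STRING "Tag for triton-inference-server/common repo")

The Dockerfile base image would then also need to move to a matching 24.03 Triton image, as noted above.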


zhyncs commented Mar 29, 2024

I will send a PR asap.
