[Bug] Triton Server memory leak in python backend decouple mode #1363
Comments
@lvhan028 @lzhangzz @zhulinJulia24 Do you have any suggestions? Thanks.
LMDeploy will use CUDA 12 by default after 0.2.6, so the base image in the Dockerfile and the build dependencies of the Triton repositories should also be updated. https://docs.nvidia.com/deeplearning/triton-inference-server/release-notes/rel-24-03.html
Cheers. Shall we update this to r24.03 accordingly?
lmdeploy/src/turbomind/triton_backend/CMakeLists.txt
Lines 52 to 54 in e9d2724
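For illustration only, here is a rough sketch of what such a bump might look like, assuming those lines pin the Triton dependency repositories with the conventional cache variables used by Triton backends (the variable names below are assumptions, not copied from the file):

```cmake
# Hypothetical sketch of the proposed change: pin the Triton dependency
# repositories to the r24.03 release branch so they match the CUDA 12 base
# image mentioned above. Variable names follow the common Triton backend
# convention and may differ from the actual CMakeLists.txt.
set(TRITON_COMMON_REPO_TAG  "r24.03" CACHE STRING "Tag for triton-inference-server/common repo")
set(TRITON_CORE_REPO_TAG    "r24.03" CACHE STRING "Tag for triton-inference-server/core repo")
set(TRITON_BACKEND_REPO_TAG "r24.03" CACHE STRING "Tag for triton-inference-server/backend repo")
```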
I will send a PR ASAP.
Checklist
Describe the bug
The version of Triton Server currently used in LMDeploy is r22.12:
lmdeploy/src/turbomind/triton_backend/CMakeLists.txt
Lines 52 to 54 in e9d2724
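The cited snippet is not reproduced in this report. As an unverified sketch, Triton backend builds typically pin their dependency repositories with cache variables like the following, so the r22.12 pin would look roughly like this:

```cmake
# Assumed shape of the cited lines: the Triton common/core/backend
# repositories pinned to the r22.12 release branch. The variable names are
# the usual Triton convention, not verified against the LMDeploy source.
set(TRITON_COMMON_REPO_TAG  "r22.12" CACHE STRING "Tag for triton-inference-server/common repo")
set(TRITON_CORE_REPO_TAG    "r22.12" CACHE STRING "Tag for triton-inference-server/core repo")
set(TRITON_BACKEND_REPO_TAG "r22.12" CACHE STRING "Tag for triton-inference-server/backend repo")
```

Bumping these tags to a release that contains the python backend fix (r23.10 or later) is one of the two remediation paths described below.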
To address this issue, we synchronized the fix commits on top of r23.02; please refer to the repository at https://github.com/zhyncs/python_backend.
Currently, #1329 has been merged into main. I think it is necessary to update the documentation to inform users of this known issue. There are two ways to fix it: either update r22.12 in LMDeploy directly to r23.10, or, as in our repository, synchronize only the fix commits.
@lvhan028 @lzhangzz @ispobock @zhulinJulia24
Reproduction
N/A
Environment
Error traceback
No response