[Bugfix] Fix offline_inference_with_prefix.py #9505
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
I also thought that. If possible, I prefer it to be per …
To unblock CI, let's merge this first. We can fix the quantization tests and add docs in another PR.
Agreed with both comments @DarkLight1337
It might be better to solve this issue in this example completely by destroying the first LLM before creating the next one. I could help fix this in another PR. A minimal sketch of that idea is below.
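A minimal sketch of what "destroying the first LLM before creating the next one" could look like, assuming the example builds two `LLM` instances in the same process; the model name and parameters here are illustrative, not the PR's actual code:

```python
# Sketch only: free the first engine's GPU memory before creating the second,
# so the two LLM instances never coexist on the GPU.
import gc

import torch
from vllm import LLM, SamplingParams

prompts = ["Hello, my name is"]
sampling_params = SamplingParams(temperature=0.0)

# First engine: no prefix caching.
regular_llm = LLM(model="facebook/opt-125m")
regular_outputs = regular_llm.generate(prompts, sampling_params)

# Explicitly destroy the first engine and release its GPU memory.
del regular_llm
gc.collect()
torch.cuda.empty_cache()

# Second engine: prefix caching enabled, created only after the first is gone.
prefix_cached_llm = LLM(model="facebook/opt-125m", enable_prefix_caching=True)
cached_outputs = prefix_cached_llm.generate(prompts, sampling_params)
```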
Signed-off-by: charlifu <[email protected]>
Signed-off-by: Vinay Damodaran <[email protected]>
Signed-off-by: Alvant <[email protected]>
Signed-off-by: Amit Garg <[email protected]>
Signed-off-by: qishuai <[email protected]>
Signed-off-by: Sumit Dubey <[email protected]>
Signed-off-by: Maxime Fournioux <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
The example runs OOM on current main. This is the same as the fix to `fix_lazy_outlines.py` in #9352. Copying @joerunde's inline comment from that PR:
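For context, a hedged sketch of the kind of change this describes, assuming the fix mirrors #9352 by capping `gpu_memory_utilization` so that two engines can fit on one GPU (the model name and the 0.4 value are illustrative, not the actual diff):

```python
# Sketch only: limit each engine's share of GPU memory so that creating a
# second LLM in the same process does not run out of memory.
from vllm import LLM

# First engine without prefix caching, capped at ~40% of GPU memory.
regular_llm = LLM(model="facebook/opt-125m", gpu_memory_utilization=0.4)

# Second engine with prefix caching, also capped so both fit at once.
prefix_cached_llm = LLM(
    model="facebook/opt-125m",
    enable_prefix_caching=True,
    gpu_memory_utilization=0.4,
)
```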