Sorry for the late response. This is a known issue with how the shared memory size is computed when a sliding window is used. We will fix it soon.
The fix will be included in next week's update (Tuesday). Feel free to give it a try.
PerkzZheng
System Info
A800
trtllm v0.8
Who can help?
No response
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
Expected behavior
Inference should run normally when multi-block decoding is used.
actual behavior
additional notes
Some information that may help: