Prompt sequence length is greater than 4096 #375
Comments
Ah, I think this is due to the ...
@joshpopelka20, I think this should be fixed now. Can you please confirm that it works?
The spec expects ... Presumably this is something that could be patched on the file directly via a different tool? (UPDATE: Yes, here you go.) EDIT: This was historically ...
Yes, exactly. I think this was a good solution; it will allow loading some of those legacy GGUF files.
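For anyone who wants to check what their file actually declares, below is a minimal sketch of reading the GGUF metadata section directly. It assumes the little-endian GGUF v2/v3 layout from the published spec, and `model.gguf` is only a placeholder path; this is not mistral.rs code, just a way to inspect by hand how the context-length key is stored (and what a patching tool would have to rewrite).

```python
import struct

def read_str(f):
    # GGUF strings: u64 length followed by UTF-8 bytes.
    (n,) = struct.unpack("<Q", f.read(8))
    return f.read(n).decode("utf-8")

def read_value(f, vtype):
    # Scalar metadata value types, keyed by the GGUF type id.
    scalars = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
               6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
    if vtype in scalars:
        fmt = scalars[vtype]
        return struct.unpack(fmt, f.read(struct.calcsize(fmt)))[0]
    if vtype == 8:  # string
        return read_str(f)
    if vtype == 9:  # array: element type (u32), count (u64), then elements
        (etype,) = struct.unpack("<I", f.read(4))
        (count,) = struct.unpack("<Q", f.read(8))
        return [read_value(f, etype) for _ in range(count)]
    raise ValueError(f"unknown GGUF metadata value type: {vtype}")

with open("model.gguf", "rb") as f:  # placeholder path
    assert f.read(4) == b"GGUF", "not a GGUF file"
    (version,) = struct.unpack("<I", f.read(4))
    n_tensors, n_kv = struct.unpack("<QQ", f.read(16))
    print(f"GGUF version {version}, {n_kv} metadata keys")
    for _ in range(n_kv):
        key = read_str(f)
        (vtype,) = struct.unpack("<I", f.read(4))
        value = read_value(f, vtype)
        if key.endswith("context_length"):
            print(f"{key} = {value} (stored as type {vtype})")
```

Per the GGUF spec, the context window is declared under `{arch}.context_length` (e.g. `llama.context_length`), so that is the field to look at when a legacy file stores it with an unexpected type.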
I'm getting the same error message. Can you update the PyPI package as well? I'm using the Python API for mistral.rs.
Hi @joshpopelka20, I updated the PyPI package, so the latest (right now) is 0.1.14. I assume you got the error message with the old Python API? Please let me know if you get the error again.
Yes, I was getting the same error with the old Python package. Looks like I got a different error this time:
Ah, @joshpopelka20, thanks! I put up a new release, 0.1.15, which should fix this.
Yes, that worked for me 👍 Before I close the issue, I'm having a problem with CUDA Out of Memory. I tried using a SageMaker instance, ml.p4d.24xlarge, which has 8 A100s. It seems that only one GPU's memory is actually being used. Is there a way to spread the memory footprint out across the 8 GPUs? I looked at the Runner class and GGUF, and neither seemed to have an option to specify the number of GPUs.
Glad that it worked! No, we don't have cross-GPU device mapping, only GPU + CPU mapping. We also do not have tensor parallelism via NCCL yet. I will work on adding the cross-GPU device mapping feature. In the meantime, can you please open a new issue to track that after you close this? Thank you.
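As a side note, a quick way to confirm that only one device's memory is in use is to poll every GPU through NVML. This is a generic diagnostic sketch, not part of the mistral.rs API; it assumes the `pynvml` bindings (the `nvidia-ml-py` package) are installed on the CUDA host.

```python
import pynvml

# Report used vs. total memory for every visible GPU, so an uneven
# (single-device) placement shows up immediately.
pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"GPU {i}: {mem.used / 2**30:.1f} GiB used of {mem.total / 2**30:.1f} GiB")
finally:
    pynvml.nvmlShutdown()
```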
No problem. I'll close this issue and create the new feature request.
Describe the bug
I'm getting this error message:
Prompt sequence length is greater than 4096
These are the model details:
I see in #89 that the GGUF context length is supposed to come from the metadata. This is the metadata from the model I'm using:
Right now, I'm just testing. In the future, I plan to use one of the gradient.ai long-context models, so I'll probably go beyond 8192. Please take this into account with any necessary changes.
Latest commit
b6fffdd