Prompt sequence length is greater than 4096 #375

Closed
joshpopelka20 opened this issue Jun 3, 2024 · 11 comments

Labels
bug Something isn't working

@joshpopelka20 (Contributor)

Describe the bug
I'm getting the error message: Prompt sequence length is greater than 4096

These are the model details:

tok_model_id="NousResearch/Meta-Llama-3-8B-Instruct",
quantized_model_id="NousResearch/Meta-Llama-3-8B-Instruct-GGUF",
quantized_filename="Meta-Llama-3-8B-Instruct-Q4_K_M.gguf",
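
For reference, here is a minimal sketch of how a model configured like this is loaded through the Python API (the Runner / Which.GGUF / ChatCompletionRequest names below are from the mistralrs Python bindings; exact keyword arguments may differ between releases):

```python
from mistralrs import Runner, Which, ChatCompletionRequest

# Load the GGUF-quantized weights; the tokenizer comes from the unquantized repo.
runner = Runner(
    which=Which.GGUF(
        tok_model_id="NousResearch/Meta-Llama-3-8B-Instruct",
        quantized_model_id="NousResearch/Meta-Llama-3-8B-Instruct-GGUF",
        quantized_filename="Meta-Llama-3-8B-Instruct-Q4_K_M.gguf",
    )
)

# A prompt longer than 4096 tokens triggers the error described above,
# even though the GGUF metadata advertises an 8192-token context.
response = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="llama",
        messages=[{"role": "user", "content": "<prompt longer than 4096 tokens>"}],
        max_tokens=64,
    )
)
print(response.choices[0].message.content)
```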

I see in #89 that the gguf context length is supposed to come from the metadata. This is the metadata from the model I'm using:

llama.context_length: 8192

Right now, I'm just testing. In the future, I plan to use one of the gradient.ai long-context models, so I'll probably go beyond 8192. Please take this into account with any necessary changes.

Latest commit: b6fffdd

@EricLBuehler (Owner)

Ah, I think this is because the model you linked stores the context length as u32 while we expect u64. This should be an easy fix.
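
A quick way to check what the file actually stores (a sketch using the gguf Python package that ships with llama.cpp; the GGUFReader field layout is that package's and may change between versions):

```python
# pip install gguf
from gguf import GGUFReader

reader = GGUFReader("Meta-Llama-3-8B-Instruct-Q4_K_M.gguf")
field = reader.fields["llama.context_length"]

# A legacy file shows UINT32 here; the current GGUF spec stores countable
# values such as lengths as UINT64.
print("type:", [t.name for t in field.types])
print("value:", field.parts[field.data[0]][0])
```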

@EricLBuehler (Owner)

@joshpopelka20, I think this should be fixed now. Can you please confirm that it works?

@polarathene (Contributor) commented Jun 4, 2024

while we expect u64

The spec expects u64 too? That model publisher is already known to have other mistakes in their published models; I don't know if mistral.rs should implement a workaround for niche scenarios like that.

Presumably this is something that could be patched directly on the file via a different tool? (UPDATE: Yes, here you go)


EDIT: This was historically u32 in a prior version of GGUF:

Most countable values (lengths, etc) were changed from uint32 to uint64 to allow for larger models to be supported in the future.

@EricLBuehler (Owner)

Yes, exactly. I think this was a good solution; it will allow loading some of those legacy GGUF files.

@joshpopelka20 (Contributor, Author)

I'm getting the same error message.

Can you update the PyPI package as well? I'm using the Python API for mistral.rs.

@EricLBuehler (Owner)

Hi @joshpopelka20, I updated the PyPI package, so the latest (right now) is 0.1.14. I assume you got the error message with the old Python API? Please let me know if you get the error again.

@joshpopelka20 (Contributor, Author)

Yes, I was getting the same error with the old Python package.

Looks like I got a different error this time:

PanicException: called `Either::unwrap_right()` on a `Left` value: ["Meta-Llama-3-8B-Instruct-Q4_K_M.gguf"]

@EricLBuehler (Owner)

Ah, @joshpopelka20 thanks! I put up a new release 0.1.15 now which should fix this.

@joshpopelka20 (Contributor, Author) commented Jun 5, 2024

Yes, that worked for me 👍

Before I close the issue, I'm having a problem with CUDA running out of memory. I tried using a SageMaker instance, ml.p4d.24xlarge, which has 8 A100s. It seems that only one GPU's memory is actually being used. Is there a way to spread the memory footprint out across the 8 GPUs?

I looked at the Runner class and GGUF, and neither seemed to have an option to specify the number of GPUs.
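
As a quick check (a hypothetical monitoring snippet, not part of mistral.rs), polling nvidia-smi while the model loads shows that only one of the GPUs fills up:

```python
import subprocess
import time

# Print per-GPU memory usage every few seconds while the model loads/generates.
for _ in range(10):
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    print(out.strip())
    print("---")
    time.sleep(5)
```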

@EricLBuehler (Owner)

Glad that it worked!

No, we don't have cross-GPU device mapping, only GPU + CPU mapping. Additionally, we do not have tensor parallelism via NCCL yet.

I will work on adding the cross-GPU device mapping feature. In the meantime, can you please open a new issue to track that after you close this one? Thank you.

@joshpopelka20 (Contributor, Author)

No problem. I'll close this issue and create the new feature request.
