Prompt sequence length is greater than 4096 #375
Comments
Ah, I think this is due to the ...
@joshpopelka20, I think this should be fixed now. Can you please confirm that it works?
The spec expects ... Presumably this is something that could be patched on the file directly via a different tool? (UPDATE: Yes, here you go.) EDIT: This was historically ...
Yes, exactly. I think this was a good solution; it will allow loading some of those legacy GGUF files.
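For anyone who wants to check what their file actually declares, below is a minimal sketch of reading the GGUF metadata section directly. It assumes the little-endian GGUF v2/v3 layout from the published spec, and `model.gguf` is only a placeholder path; this is not mistral.rs code, just a way to inspect by hand how the context-length key is stored (and what a patching tool would have to rewrite).

```python
import struct

def read_str(f):
    # GGUF strings: u64 length followed by UTF-8 bytes.
    (n,) = struct.unpack("<Q", f.read(8))
    return f.read(n).decode("utf-8")

def read_value(f, vtype):
    # Scalar metadata value types, keyed by the GGUF type id.
    scalars = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
               6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
    if vtype in scalars:
        fmt = scalars[vtype]
        return struct.unpack(fmt, f.read(struct.calcsize(fmt)))[0]
    if vtype == 8:  # string
        return read_str(f)
    if vtype == 9:  # array: element type (u32), count (u64), then elements
        (etype,) = struct.unpack("<I", f.read(4))
        (count,) = struct.unpack("<Q", f.read(8))
        return [read_value(f, etype) for _ in range(count)]
    raise ValueError(f"unknown GGUF metadata value type: {vtype}")

with open("model.gguf", "rb") as f:  # placeholder path
    assert f.read(4) == b"GGUF", "not a GGUF file"
    (version,) = struct.unpack("<I", f.read(4))
    n_tensors, n_kv = struct.unpack("<QQ", f.read(16))
    print(f"GGUF version {version}, {n_kv} metadata keys")
    for _ in range(n_kv):
        key = read_str(f)
        (vtype,) = struct.unpack("<I", f.read(4))
        value = read_value(f, vtype)
        if key.endswith("context_length"):
            print(f"{key} = {value} (stored as type {vtype})")
```

Per the GGUF spec, the context window is declared under `{arch}.context_length` (e.g. `llama.context_length`), so that is the field to look at when a legacy file stores it with an unexpected type.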
I'm getting the same error message. Can you update the PyPI package as well? I'm using the Python API for mistral.rs.
Hi @joshpopelka20, I updated the PyPI package, so the latest (right now) is 0.1.14. I assume you got the error message with the old Python API? Please let me know if you get the error again.
Yes, I was getting the same error with the old Python package. Looks like I got a different error this time:
Ah, @joshpopelka20, thanks! I put up a new release, 0.1.15, which should fix this.
Yes, that worked for me 👍 Before I close the issue, I'm having a problem with CUDA Out of Memory. I tried using a SageMaker instance, ml.p4d.24xlarge, which has 8 A100s. It seems that only one GPU's memory is actually being used. Is there a way to spread the memory footprint out across the 8 GPUs? I looked at the Runner class and GGUF, and neither seemed to have an option to specify the number of GPUs.
Glad that it worked! No, we don't have cross-GPU device mapping, only GPU + CPU mapping. We also do not have tensor parallelism via NCCL yet. I will work on adding the cross-GPU device mapping feature. In the meantime, can you please open a new issue to track that after you close this? Thank you.
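As a side note, a quick way to confirm that only one device's memory is in use is to poll every GPU through NVML. This is a generic diagnostic sketch, not part of the mistral.rs API; it assumes the `pynvml` bindings (the `nvidia-ml-py` package) are installed on the CUDA host.

```python
import pynvml

# Report used vs. total memory for every visible GPU, so an uneven
# (single-device) placement shows up immediately.
pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"GPU {i}: {mem.used / 2**30:.1f} GiB used of {mem.total / 2**30:.1f} GiB")
finally:
    pynvml.nvmlShutdown()
```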
No problem. I'll close this issue and create the new feature request.
Describe the bug
I'm getting this error message:
Prompt sequence length is greater than 4096
These are the model details:
I see in #89 that the GGUF context length is supposed to come from the metadata. This is the metadata from the model I'm using:
Right now, I'm just testing. In the future, I plan to use one of the gradient.ai long-context models, so I'll probably go beyond 8192. Please take this into account with any necessary changes.
Latest commit
b6fffdd