
2x7b model gives error about MoE quant and context memory pool #1291

Open · cmdrscotty opened this issue Dec 29, 2024 · 6 comments
@cmdrscotty commented Dec 29, 2024

Describe the Issue
Any time I try to run a 2x7b model, KoboldCpp errors out with the following:

!!!!!! WARNING: Using extremely outdated MoE quant. Please update it! Attempting to apply hacky kcpp fallback, using last ctx:0x1fe24dc94a0
ggml_new_object: not enough space in the context's memory pool (needed 209520, available 209088)
ggml/src/ggml.c:1600: GGML_ASSERT(obj_new) failed

7b, 13b, and 20b models don't give any issue at all. Tried 4k, 6k, 8k, and 12k context; the 2x7b models hit the same error no matter what.

Additional Information:
Windows 11
AMD Ryzen 7 5700G
RX 7900 XTX (24GB) (Vulkan API)

KoboldCPP 1.80

Tested with InfinitiKuno-2x7B and Blue-Orchid-2x7B; both throw the same error.

KCPPS attached:
defaults3_win_amd_2x7B.zip

Of note: loading the model directly in LM Studio works just fine, no problem at all. I even tried manually setting KoboldCpp to MoE Experts 2, and it still errors out.

@LostRuins (Owner)

Can you try requantizing that model? The tools can be found here: https://kcpptools.concedo.workers.dev/

@cmdrscotty (Author)

Gave it a pass through the re-quantize tool, but still got the same error as before:

!!!!!! WARNING: Using extremely outdated MoE quant. Please update it! Attempting to apply hacky kcpp fallback, using last ctx:0x17532644e90
ggml_new_object: not enough space in the context's memory pool (needed 209520, available 209088)
ggml/src/ggml.c:1600: GGML_ASSERT(obj_new) failed

If it helps, this is the command I ran for the re-quantize pass (assuming I did it right; if not, let me know):

.\quantize_gguf.exe --allow-requantize .\Blue-Orchid-2x7b-Q6_K.gguf 18 8
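For reference, assuming `quantize_gguf.exe` follows llama.cpp's quantize argument order (input file, optional output file, quantization type, thread count — where ftype 18 corresponds to Q6_K), an invocation that names the output explicitly would look something like the sketch below; the output filename here is hypothetical, not from the thread:

```shell
# Hypothetical invocation, assuming llama.cpp's quantize argument order:
#   quantize [--allow-requantize] <input.gguf> [output.gguf] <ftype> [nthreads]
# ftype 18 = Q6_K; the output path is an illustrative name.
.\quantize_gguf.exe --allow-requantize .\Blue-Orchid-2x7b-Q6_K.gguf .\Blue-Orchid-2x7b-requant-Q6_K.gguf 18 8
```

Naming the output avoids any ambiguity about which file the tool produced.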

@LostRuins (Owner)

The output file will have a different name - make sure you pick the right one. By default it'll probably be named ggml-model or something. It does not overwrite the original model.

@cmdrscotty (Author)

OK, yup, went back and ran it again to be sure. Used the output ggml-model-Q6_K.gguf that it created after re-quantizing; same error message.

@wbruna commented Jan 4, 2025

FWIW, I just tested https://huggingface.co/tensorblock/Blue-Orchid-2x7b-GGUF Q3_K_S on 1.81, and it worked fine, with no warning.

@cmdrscotty (Author) commented Jan 5, 2025

Interesting. Yup, downloaded 1.81 and tried the one you linked; it worked just fine.

Looking in LM Studio, it seems the one I was downloading came from LoneStriker, which makes me wonder if something about how that quant was put together causes the MoE error.

But I ran the Q6_K quant through the benchmark, and it came back with great results on Vulkan (RX 7900 XTX):

Benchmark Completed - v1.81 Results:
======
Flags: NoAVX2=False Threads=15 HighPriority=True Cublas_Args=None Tensor_Split=None BlasThreads=15 BlasBatchSize=512 FlashAttention=False KvCache=0
Timestamp: 2025-01-05 01:36:57.140774+00:00
Backend: koboldcpp_vulkan.dll
Layers: 35
Model: Blue-Orchid-2x7b-Q6_K
MaxCtx: 12288
GenAmount: 100
-----
ProcessingTime: 23.252s
ProcessingSpeed: 524.17T/s
GenerationTime: 4.061s
GenerationSpeed: 24.62T/s
TotalTime: 27.313s
Output:  1 1 1 1
-----
===
Press ENTER key to exit.
