
Support for newer I-Quant formats #722

Closed
InferenceIllusionist opened this issue Mar 3, 2024 · 4 comments

Comments

@InferenceIllusionist

InferenceIllusionist commented Mar 3, 2024

Hi there, I'm trying to run inference on an IQ4_XS quant (see this PR for more info - ggerganov#5747).
Koboldcpp loads the model and then immediately crashes. Before the window closes, the error message mentions an unhandled exception in koboldcpp. I also noticed that the amount of CUDA memory being reserved is significantly higher than the size of the model (9.93 GB).

This issue also happens with other newer quant formats like IQ2_M. Any workarounds I could try? Thanks!
[Screenshot attached: screenshot-iq4_xs]

@LostRuins
Owner

The latest quant format supported in v1.59.1 is currently IQ3_S. Anything newer will require waiting for the next release, which should be out before the end of next week.
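
For anyone who wants to check a model file before that release lands, below is a minimal sketch (not from this thread) that inspects which tensor quantization types a GGUF file actually uses, via the gguf Python package from the llama.cpp repo. It assumes the package is installed (pip install gguf), and the filename is just a placeholder.

```python
# Minimal sketch: list the tensor quantization types used in a GGUF file.
# Assumes the gguf package is installed (pip install gguf); the path is a placeholder.
from gguf import GGUFReader

reader = GGUFReader("model-IQ4_XS.gguf")

# Each tensor records its own quantization type (an IntEnum), so collecting
# the distinct names shows which formats the loader must support.
quant_types = {tensor.tensor_type.name for tensor in reader.tensors}
print("Tensor quant types in this file:", sorted(quant_types))
```

If names like IQ4_XS or IQ2_M show up in that set, the file needs a koboldcpp build whose bundled llama.cpp already supports those formats.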

@InferenceIllusionist
Author

Understood, thanks for letting me know. Happy to help test once the next release is out, and I appreciate your work on this.

@LostRuins
Owner

Should be working in the latest version!

@InferenceIllusionist
Author

Wow, that was quick! Both IQ2_S and IQ4_XS are working, with no issues at all after testing. Appreciate the fast follow-up and all the other exciting new features in the latest release.
