Support for newer I-Quant formats #722
Hi there, I'm trying to run inference on an IQ4_XS quant (see this PR for more info: ggerganov#5747).
KoboldCpp loads the model and then immediately crashes. Before the window closes, the error message mentions an unhandled exception in koboldcpp. I also noticed that the amount of CUDA memory being reserved is significantly higher than the size of the model (9.93 GB).
This issue also occurs with other newer quant formats such as IQ2_M. Are there any workarounds I could try? Thanks!
Comments
The latest quant support in v1.59.1 currently is
Understood, thanks for letting me know. Happy to help test once the next release is out, and I appreciate your work on this.
Should be working in the latest version!
Wow, that was quick! Both IQ2_S and IQ4_XS are working; no issues at all after testing. Thanks for the fast follow-up and for all the other exciting new features in the latest release.
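For anyone who wants to double-check what an older build is failing on, here is a minimal sketch (not from this thread; the summarize_quant_types helper is illustrative only) that uses the gguf Python package published alongside llama.cpp to list the quantization types of every tensor in a GGUF file, so you can confirm whether a model actually contains IQ4_XS or IQ2_M tensors before loading it:

```python
# Sketch only: list the per-tensor quantization types in a GGUF file.
# Assumes the gguf package from llama.cpp is installed (pip install gguf).
from collections import Counter
import sys

from gguf import GGUFReader


def summarize_quant_types(path: str) -> None:
    """Print how many tensors use each quantization type, plus total data size."""
    reader = GGUFReader(path)
    # Each tensor records its own quantization type (e.g. IQ4_XS, Q6_K, F32).
    counts = Counter(t.tensor_type.name for t in reader.tensors)
    total_bytes = sum(int(t.n_bytes) for t in reader.tensors)
    for qtype, n in counts.most_common():
        print(f"{qtype}: {n} tensors")
    print(f"total tensor data: {total_bytes / 1024**3:.2f} GiB")


if __name__ == "__main__":
    summarize_quant_types(sys.argv[1])
```

If the output lists IQ4_XS or IQ2_M tensors and the loader still crashes, the build likely predates support for those formats, as discussed above; updating to the latest release should resolve it.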