
support minicpm3.0 #605

Merged · 4 commits merged into casper-hansen:main on Nov 14, 2024
Conversation

LDLINGLINGLING
Contributor

@LDLINGLINGLING LDLINGLINGLING commented Sep 6, 2024

This time, no files were added; the original class was simply inherited. The model can be quantized using the most basic workflow of the original AutoAWQ.
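For context, a minimal sketch of what the basic AutoAWQ quantization workflow looks like for this model. The model and output paths are placeholders, and the `quant_config` values shown are AutoAWQ's commonly used defaults, not settings confirmed in this PR:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Placeholder paths: substitute the actual MiniCPM3 checkpoint and output dir.
model_path = "openbmb/MiniCPM3-4B"
quant_path = "minicpm3-4b-awq"

# Typical AutoAWQ settings: 4-bit weights, group size 128, zero-point quantization.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run AWQ calibration and quantize the weights, then save the quantized model.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```

Because MiniCPM3 inherits from the existing quantizer class, no model-specific code is needed beyond this standard flow.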

@LDLINGLINGLING
Contributor Author

Hello, the following are the results of the perplexity test:

pretrained model: minicpm3, GPU usage: 8.67 GB, perplexity: 7.522 (170/170 batches)
awq model: minicpm3, GPU usage: 3.29 GB, perplexity: 8.195 (164 batches)
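As a reminder of what these numbers measure: perplexity is the exponential of the mean per-token negative log-likelihood, so the small rise from 7.522 to 8.195 reflects the expected accuracy cost of 4-bit quantization. A minimal sketch of the metric itself (the function name and example values are illustrative, not taken from the PR's test harness):

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token).

    token_nlls: per-token negative log-likelihoods (natural log)
    collected over the evaluation corpus.
    """
    return math.exp(sum(token_nlls) / len(token_nlls))

# Lower is better: a uniform NLL of 2.0 nats/token gives exp(2.0) ~= 7.389,
# close to the pretrained model's 7.522 above.
print(perplexity([2.0, 2.0, 2.0]))
```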

@casper-hansen
Owner

@LDLINGLINGLING Sorry for taking so long. I simplified the modeling and added your custom quantizer to the docs. We now use Triton kernels which work with smaller models like MiniCPM3 4B out of the box, so there are no more CUDA issues.

@casper-hansen casper-hansen merged commit b42e3c3 into casper-hansen:main Nov 14, 2024