
Does recurrentgemma support quantization? #2450

Open
daiwk opened this issue Nov 15, 2024 · 2 comments
Labels: question (Further information is requested), triaged (Issue has been triaged by maintainers)

Comments

daiwk commented Nov 15, 2024

https://developer.nvidia.com/zh-cn/blog/nvidia-tensorrt-llm-revs-up-inference-for-google-gemma/

This post says Gemma supports quantization, so does RecurrentGemma support quantization as well?

hello-11 added the question and triaged labels Nov 18, 2024
@byshiue
Collaborator

byshiue commented Nov 18, 2024

If its model architecture is the same as Gemma's, then it should be supported.

@daiwk (Author)

daiwk commented Nov 30, 2024

If its model architecture is the same as Gemma's, then it should be supported.

@byshiue its model architecture is different from Gemma's, because it uses a Mamba-like RNN block, the RG-LRU (real-gated linear recurrent unit).
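For context, the weight-only int8 quantization that the linked NVIDIA post describes for Gemma can be sketched roughly as follows. This is a plain NumPy illustration of symmetric per-channel quantization, not TensorRT-LLM code; whether TensorRT-LLM's quantization path covers the RG-LRU layers of RecurrentGemma is exactly the open question in this thread.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-channel (per-row) int8 weight quantization.

    Returns quantized weights and per-row scales such that
    w is approximately q * scale."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero on all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float32 weight matrix."""
    return q.astype(np.float32) * scale

# Toy weight matrix standing in for one layer's weights.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max reconstruction error:", np.max(np.abs(w - w_hat)))
```

The per-row rounding error is bounded by half a quantization step (scale / 2), which is why weight-only int8 typically preserves accuracy well for standard transformer layers; the question remains whether the recurrent RG-LRU state updates tolerate the same treatment.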
