GPTQModel v1.2.3
Stable release with all feature and model unit tests passing. Many model unit tests that failed, or passed incorrectly, in previous releases are now fixed.
HF GLM support added. GLM/ChatGLM has two distinct code forks: the original one that is not HF-integrated, and the latest one that is integrated into transformers. HF GLM and non-HF GLM are not weight compatible, and we support both variants.
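To make the distinction concrete, here is a minimal loading sketch. The model ids are illustrative examples, and the `from_pretrained`/`QuantizeConfig` call shape is an assumption based on the project README rather than a verified v1.2.3 API:

```python
from gptqmodel import GPTQModel, QuantizeConfig

quant_config = QuantizeConfig(bits=4, group_size=128)

# HF-integrated GLM: modeling code lives inside transformers itself.
# (Example model id, not taken from this release.)
hf_glm = GPTQModel.from_pretrained("THUDM/glm-4-9b-chat-hf", quant_config)

# Non-HF ChatGLM fork: modeling code ships with the checkpoint and needs
# trust_remote_code=True. Its weights are NOT interchangeable with the
# HF-integrated variant above, even though both variants are supported.
chatglm = GPTQModel.from_pretrained(
    "THUDM/chatglm3-6b", quant_config, trust_remote_code=True
)
```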
What's Changed
- Add GLM (HF-ied) support by @Qubitium in #581
- Unit tests: add USE_VLLM arg by @ZYC-ModelCloud in #582
- Quantize record info by @ZYC-ModelCloud in #583
- [MISC] add gptqmodel[eval] and remove sentencepiece by @PZS-ModelCloud in #602
- [MISC] requirements remove gekko, ninja, huggingface-hub, protobuf by @PZS-ModelCloud in #603
- Release GPU VRAM after layer.fwd by @LRL-ModelCloud in #616
- Delete unsupported model & skip gptneox by @CSY-ModelCloud in #617
- [FIX] Some models put hidden_states in kwargs instead of args. by @ZX-ModelCloud in #621
- lm_eval vLLM task: add max_model_len=4096 arg by @LRL-ModelCloud in #625 (see the sketch after this list)
- try/catch should only apply to lm_eval by @CSY-ModelCloud in #628
- set USE_VLLM = False by @LRL-ModelCloud in #629
- [FIX] Do not monkey_patch forward when loading a quantized model by @LRL-ModelCloud in #638
- Simplify ModelLoader/ModelWriter functions by @ZYC-ModelCloud in #637
- disable chat for test_mpt by @CSY-ModelCloud in #641
- Update unit_tests.yml by @Qubitium in #642
- Fix wrong tokenized[0] value when reading from BatchEncoding by @CSY-ModelCloud in #643
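For context on the vLLM-backed lm_eval runs touched by #582, #625, and #629, the sketch below shows what passing max_model_len=4096 through lm-evaluation-harness looks like. This illustrates the public lm_eval API, not the project's internal test harness; the checkpoint path is a placeholder:

```python
import lm_eval

# "pretrained" should point at a quantized checkpoint (placeholder path).
# max_model_len=4096 caps vLLM's context length, matching the default
# added in #625.
results = lm_eval.simple_evaluate(
    model="vllm",
    model_args="pretrained=/path/to/quantized-model,max_model_len=4096",
    tasks=["arc_challenge"],
)
print(results["results"])
```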
New Contributors
- @jiqing-feng made their first contribution in #527
Full Changelog: v1.2.1...v1.2.3