convert/quantize script doubles the size of the q8 decoder from model ViT-GPT2 #1051

cristianglezm · 2024-11-22T22:56:47Z

System Info

transformers.js v3.0.2
vue v3.5.13
vite v5.4.11

system info:

CPU: Intel(R) Core(TM) i5-8250U
GPU: Intel(R) UHD Graphics 620
RAM: 16GB

Environment/Platform

Description

I am trying to start using v3 but there are a few issues:

The model gives garbled output on GPU (Chrome, Edge); the newly converted model does too.

Garbled description on GPU (q4f16 encoder, q8 decoder)

Good enough description on CPU (q4f16 encoder, q8 decoder)

q4f16 throws an exception (decoder only; the encoder works).
The size of decoder_model_merged_quantized has gone up from 158MB to 297MB when converted from the PyTorch version; it gives the same size if quantized from the old model converted with 2.17.

Quantized from old converted ONNX model

Converted from PyTorch

python convert.py --quantize y --model_id "cristianglezm/ViT-GPT2-FlowerCaptioner" --task 'image-to-text-with-past' --opset 19
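
To compare the files produced by the command above, a minimal sketch along these lines could list the size of each ONNX file. It uses only the Python standard library; the function name `onnx_sizes_mb` and the directory passed to it are assumptions for illustration and are not part of convert.py.

```python
from pathlib import Path


def onnx_sizes_mb(onnx_dir):
    """Return {file name: size in MB} for every .onnx file in onnx_dir."""
    sizes = {}
    for path in sorted(Path(onnx_dir).glob("*.onnx")):
        # st_size is in bytes; 1 MB is taken as 1e6 bytes here.
        sizes[path.name] = path.stat().st_size / 1e6
    return sizes


def print_size_report(onnx_dir):
    """Print one line per .onnx file with its size in MB."""
    for name, size_mb in onnx_sizes_mb(onnx_dir).items():
        print(f"{name:50s} {size_mb:10.1f} MB")


# Usage (the path is hypothetical; use wherever the converted model was written):
# print_size_report("path/to/converted/model/onnx")
```

Running it on the output of both conversions (old ONNX model vs. PyTorch) would put the 158MB and 297MB figures side by side with the other quantized variants.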

Shouldn't q4 be smaller than q8?
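
As a rough sanity check on that expectation, the weight storage of a quantized model can be approximated as parameter count times bits per weight. The sketch below is only an estimate under assumptions: the count of 158 million weights is hypothetical (it is what the 158MB q8 file would imply at about one byte per weight), and it ignores quantization scales, zero points, and any tensors left unquantized.

```python
def approx_weight_size_mb(n_params, bits_per_weight):
    """Approximate storage for n_params weights, in MB (1 MB = 1e6 bytes)."""
    return n_params * bits_per_weight / 8 / 1e6


# Hypothetical parameter count, inferred from the 158MB q8 file (assumption).
N_PARAMS = 158_000_000

estimates = {
    "q4": approx_weight_size_mb(N_PARAMS, 4),
    "q8": approx_weight_size_mb(N_PARAMS, 8),
    "fp16": approx_weight_size_mb(N_PARAMS, 16),
    "fp32": approx_weight_size_mb(N_PARAMS, 32),
}

for name, size_mb in estimates.items():
    print(f"{name:5s} ~{size_mb:.0f} MB")
```

Under these assumptions q4 should come out at roughly half of q8, so a q4 file that is larger than the q8 one suggests that something other than the bit width of the weights dominates the file size.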

I tested xenova/vit-gpt2-image-captioning on GPU and CPU: it gives the same garbled description on GPU and a wrong description on CPU (which is why I fine-tuned it).

thanks

Reproduction

To test it, you can clone my project:

git clone https://github.com/cristianglezm/FlowerEvolver-frontend
cd FlowerEvolver-frontend
git checkout hf-transformers-v3
npm i
npm run dev

Go to Settings::ModelOptions and change from CPU to GPU.
Go to local and wait for the demo flowers to be imported.
Click on the menu arrow of flower 42 and click describe.

cristianglezm added the bug label Nov 22, 2024

cristianglezm changed the title ~~convert/quantize script doubles the size of the q8 model ViT-GPT2~~ convert/quantize script doubles the size of the q8 decoder from model ViT-GPT2 Nov 22, 2024
