convert/quantize script doubles the size of the q8 decoder from model ViT-GPT2 #1051
Labels: bug
System Info
- transformers.js v3.0.2
- vue v3.5.13
- vite v5.4.11
- CPU: Intel(R) Core(TM) i5-8250U
- GPU: Intel(R) UHD Graphics 620
- RAM: 16 GB
Environment/Platform
Description
I am trying to start using v3, but I have run into a few issues:

- Garbled description on GPU (q4f16 encoder, q8 decoder)
- Good-enough description on CPU (q4f16 encoder, q8 decoder)
- The q8 decoder quantized from my old converted ONNX model vs. the one converted from PyTorch: the PyTorch-converted one ends up roughly double the size
- Shouldn't q4 be smaller than q8?

I also tested xenova/vit-gpt2-image-captioning on GPU and CPU: it gives the same garbled description on the GPU and a wrong description on the CPU (which is why I fine-tuned it).

Thanks.
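For context, here is a minimal sketch of the kind of loading call the dtype combination above refers to (the submodule names `encoder_model` / `decoder_model_merged`, the device values, and the image path are assumptions based on the v3 docs, not the exact code in my project):

```js
// Minimal sketch, not the exact project code: load the captioner with a
// q4f16 encoder and a q8 decoder, on WebGPU or WASM.
import { pipeline } from '@huggingface/transformers';

const captioner = await pipeline(
  'image-to-text',
  'Xenova/vit-gpt2-image-captioning', // or the fine-tuned ONNX repo
  {
    device: 'webgpu', // 'wasm' for the CPU run
    dtype: {
      encoder_model: 'q4f16',     // quantized encoder
      decoder_model_merged: 'q8', // quantized decoder (the file whose size doubles)
    },
  },
);

const output = await captioner('flower.png'); // placeholder image path
console.log(output); // e.g. [{ generated_text: '...' }]
```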
Reproduction
If you want to test it, you can clone my project:

git clone https://github.com/cristianglezm/FlowerEvolver-frontend
cd FlowerEvolver-frontend
git checkout hf-transformers-v3
npm i
npm run dev
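For the size issue in the title, the decoder was produced with the transformers.js conversion scripts, roughly as in the library's custom-conversion docs (a hedged sketch; the exact flags and quantization modes exposed by the v3 scripts may differ):

```sh
git clone https://github.com/huggingface/transformers.js
cd transformers.js
pip install -r scripts/requirements.txt
python -m scripts.convert --quantize --model_id <model_name_or_path>
```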