RuntimeError: result type Float can't be cast to the desired output type Byte #26383

Vectorrent · 2023-09-25T12:34:30Z

System Info

transformers version: 4.33.2
Platform: Linux-6.5.3-arch1-1-x86_64-with-glibc2.35
Python version: 3.10.12
Huggingface_hub version: 0.17.2
Safetensors version: 0.3.3
Accelerate version: 0.23.0
Accelerate config: not found
PyTorch version (GPU?): 2.0.1+cu117 (True)
Tensorflow version (GPU?): not installed (NA)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Using GPU in script?: no
Using distributed or parallel set-up in script?: no

Who can help?

@gante @SunMarc

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

This problem occurs when trying to use RWKV with the "bnb_4bit_use_double_quant" argument in a BitsAndBytesConfig. You can fully reproduce the error with the following code:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_name = "RWKV/rwkv-4-169m-pile"

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    llm_int8_threshold=6.0,
    llm_int8_has_fp16_weight=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, quantization_config=quantization_config
)

inputs = tokenizer("A cat sat", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))

Expected behavior

I would expect this to work, fail gracefully, or perhaps revert to a supported setting. For now, simply disabling "bnb_4bit_use_double_quant" resolves the issue with RWKV, and I've not seen it happen elsewhere.
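The workaround described above can be sketched as follows. This is a minimal illustration, not an official fix: `rwkv_4bit_kwargs` is a hypothetical helper name, and it only assembles the 4-bit keyword arguments from the reproduction with double quantization switched off.

```python
# Sketch of the workaround: the 4-bit settings from the reproduction,
# with double quantization disabled by default.
# `rwkv_4bit_kwargs` is a hypothetical helper, not part of transformers.


def rwkv_4bit_kwargs(use_double_quant: bool = False) -> dict:
    """Return keyword arguments intended for transformers.BitsAndBytesConfig."""
    return {
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_use_double_quant": use_double_quant,
    }


if __name__ == "__main__":
    kwargs = rwkv_4bit_kwargs()
    print(kwargs)
    # With torch and transformers installed, the dict could be passed on as:
    #   import torch
    #   from transformers import BitsAndBytesConfig
    #   config = BitsAndBytesConfig(bnb_4bit_compute_dtype=torch.bfloat16, **kwargs)
```

The resulting config would then replace `quantization_config` in the `from_pretrained` call from the reproduction.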


LysandreJik · 2023-09-25T12:58:14Z

cc @younesbelkada as well

younesbelkada · 2023-09-25T13:07:23Z

Hi @LuciferianInk
Thanks for the issue. I recently made #26134, which should fix all issues related to RWKV and 4-bit. Please install transformers from source (pip install -U git+https://github.com/huggingface/transformers.git) and let me know if this fixes your issue.

Vectorrent · 2023-09-25T13:40:11Z

Thanks for the update! While this build does appear to fix my earlier problem, it breaks the custom CUDA kernel that's supposed to ship with RWKV. Because this results in 10x slower computations in RWKV, I'll have to revert back to 4.32.X for now.

For reference, I am installing from pip inside of the nvcr.io/nvidia/cuda:12.2.0-devel-ubuntu22.04 container. The kernel works great on the previous build. Feel free to close this issue if you'll track that one elsewhere.

Vectorrent · 2023-10-14T12:43:34Z

As of 4.34.0, this does appear to be resolved. Thanks for the update!
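For anyone who has to support both older and newer releases, one option is to gate double quantization on the installed version. The sketch below assumes only what this thread reports (failure on 4.33.2, working on 4.34.0). `FIXED_IN`, `parse_version`, and `double_quant_reported_fixed` are hypothetical names introduced for illustration.

```python
import re

# First release the reporter confirmed as working in this thread.
FIXED_IN = (4, 34, 0)


def parse_version(version: str) -> tuple:
    """Extract the leading major.minor[.patch] numbers from a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def double_quant_reported_fixed(transformers_version: str) -> bool:
    """True if the version is at or beyond the release reported as fixed."""
    return parse_version(transformers_version) >= FIXED_IN


if __name__ == "__main__":
    for candidate in ("4.33.2", "4.34.0"):
        print(candidate, double_quant_reported_fixed(candidate))
```

In practice the argument would be `transformers.__version__`, and the result could decide the value passed as `bnb_4bit_use_double_quant`.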

Vectorrent closed this as completed Oct 14, 2023
