ValueError: Cannot flatten integer dtype tensors #1432

Closed
fancyerii opened this issue Feb 4, 2024 · 14 comments

Comments

@fancyerii

System Info

peft 0.8.2
Python 3.9.18
accelerate 0.26.1
torch 2.2.0
transformers 4.37.2

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

I am following "Fine-tuning with 🤗 TRL" and run it with:

accelerate launch --config_file examples/accelerate_configs/multi_gpu.yaml --num_processes=4 \
    examples/scripts/sft.py \
    --model_name /nas/lili/models_hf/Mixtral-8x7B-Instruct-v0.1 \
    --dataset_name trl-lib/ultrachat_200k_chatml \
    --batch_size 2 \
    --gradient_accumulation_steps 1 \
    --learning_rate 2e-4 \
    --save_steps 200_000 \
    --use_peft \
    --peft_lora_r 16 --peft_lora_alpha 32 \
    --target_modules q_proj k_proj v_proj o_proj \
    --load_in_4bit \
    --output output \
    --use_auth_token false

It throws:

Traceback (most recent call last):
  File "/nas/lili/codes/pt/ft/trl/examples/scripts/sft.py", line 158, in <module>
    trainer.train()
  File "/home/ubuntu/.local/share/virtualenvs/ft-Zgps2Kz_/lib/python3.9/site-packages/trl/trainer/sft_trainer.py", line 315, in train
    output = super().train(*args, **kwargs)
  File "/home/ubuntu/.local/share/virtualenvs/ft-Zgps2Kz_/lib/python3.9/site-packages/transformers/trainer.py", line 1537, in train
    return inner_training_loop(
  File "/home/ubuntu/.local/share/virtualenvs/ft-Zgps2Kz_/lib/python3.9/site-packages/transformers/trainer.py", line 1821, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "/home/ubuntu/.local/share/virtualenvs/ft-Zgps2Kz_/lib/python3.9/site-packages/accelerate/data_loader.py", line 448, in __iter__
    current_batch = next(dataloader_iter)
  File "/home/ubuntu/miniconda3/envs/torchshare/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
  File "/home/ubuntu/miniconda3/envs/torchshare/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 674, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/ubuntu/miniconda3/envs/torchshare/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
    return self.collate_fn(data)
  File "/home/ubuntu/.local/share/virtualenvs/ft-Zgps2Kz_/lib/python3.9/site-packages/transformers/data/data_collator.py", line 45, in __call__  
    return self.torch_call(features)
  File "/home/ubuntu/.local/share/virtualenvs/ft-Zgps2Kz_/lib/python3.9/site-packages/transformers/data/data_collator.py", line 732, in torch_call
    batch = self.tokenizer.pad(examples, return_tensors="pt", pad_to_multiple_of=self.pad_to_multiple_of)
  File "/home/ubuntu/.local/share/virtualenvs/ft-Zgps2Kz_/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 3259, in pad
    padding_strategy, _, max_length, _ = self._get_padding_truncation_strategies(
  File "/home/ubuntu/.local/share/virtualenvs/ft-Zgps2Kz_/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2707, in _get_padding_truncation_strategies
    raise ValueError(
ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as `pad_token` `(tokenizer.pad_token = tokenizer.eos_token e.g.)` or add a new pad token via `tokenizer.add_special_tokens({'pad_token': '[PAD]'})`.

Besides this error, there are warnings like:

/home/ubuntu/.local/share/virtualenvs/ft-Zgps2Kz_/lib/python3.9/site-packages/trl/trainer/sft_trainer.py:282: UserWarning: You passed a tokenizer with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding `tokenizer.padding_side = 'right'` to your code.

my env:

torch                    2.1.2
transformers             4.36.2
trl                      0.7.8.dev0
accelerate               0.25.0
peft                     0.7.1
bitsandbytes             0.41.3.post2

I have added tokenizer.padding_side = 'right', and now it throws:

Traceback (most recent call last):
  File "/nas/lili/codes/pt/ft/trl/examples/scripts/sft.py", line 117, in <module>
    trainer.train()
  File "/nas/lili/codes/pt/ft/trl/trl/trainer/sft_trainer.py", line 326, in train
    output = super().train(*args, **kwargs)
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 1687, in _inner_training_loop
    model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer)
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/accelerate/accelerator.py", line 1227, in prepare
    result = tuple(
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/accelerate/accelerator.py", line 1228, in <genexpr>
    self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/accelerate/accelerator.py", line 1104, in _prepare_one
    return self.prepare_model(obj, device_placement=device_placement)
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/accelerate/accelerator.py", line 1386, in prepare_model
    model = FSDP(model, **kwargs)
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 477, in __init__
    _auto_wrap(
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/torch/distributed/fsdp/_wrap_utils.py", line 101, in _auto_wrap
    _recursive_wrap(**recursive_wrap_kwargs, **root_kwargs)  # type: ignore[arg-type]
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/torch/distributed/fsdp/wrap.py", line 543, in _recursive_wrap
    wrapped_child, num_wrapped_params = _recursive_wrap(
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/torch/distributed/fsdp/wrap.py", line 543, in _recursive_wrap
    wrapped_child, num_wrapped_params = _recursive_wrap(
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/torch/distributed/fsdp/wrap.py", line 543, in _recursive_wrap
    wrapped_child, num_wrapped_params = _recursive_wrap(
  [Previous line repeated 2 more times]
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/torch/distributed/fsdp/wrap.py", line 561, in _recursive_wrap
    return _wrap(module, wrapper_cls, **kwargs), nonwrapped_numel
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/torch/distributed/fsdp/wrap.py", line 490, in _wrap
    return wrapper_cls(module, **kwargs)
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 503, in __init__
    _init_param_handle_from_module(
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/torch/distributed/fsdp/_init_utils.py", line 594, in _init_param_handle_from_module
    _init_param_handle_from_params(state, managed_params, fully_sharded_module)
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/torch/distributed/fsdp/_init_utils.py", line 606, in _init_param_handle_from_params
    handle = FlatParamHandle(
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/torch/distributed/fsdp/_flat_param.py", line 573, in __init__
    self._init_flat_param_and_metadata(
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/torch/distributed/fsdp/_flat_param.py", line 623, in _init_flat_param_and_metadata
    ) = self._validate_tensors_to_flatten(params)
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/torch/distributed/fsdp/_flat_param.py", line 759, in _validate_tensors_to_flatten
    raise ValueError("Cannot flatten integer dtype tensors")

And I found this: meta-llama/llama-recipes#240. It says FSDP can't be used with load_in_4bit. When I turn off load_in_4bit, it works. (It has another OOM problem, you can see here, but it's not related to peft.)

Expected behavior

Fix the bug.

@younesbelkada
Contributor

Hi @fancyerii,
Thanks a lot for the issue!
For the first issue, you need to set tokenizer.pad_token = tokenizer.eos_token.
Indeed, FSDP + QLoRA are not compatible - yet! @Titus-von-Koeller @pacman100 & external collaborators (@warner-benjamin @weifengpy) recently worked on adding QLoRA + FSDP support: bitsandbytes-foundation/bitsandbytes#970. Can you try building bnb from source? (Installation guidelines here: https://huggingface.co/docs/bitsandbytes/main/en/installation.) Let us know how it goes!
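
For reference, a minimal sketch of the tokenizer fix suggested above, assuming a standard transformers AutoTokenizer (the checkpoint name is only a placeholder, not necessarily the path used in this thread):

from transformers import AutoTokenizer

# Placeholder checkpoint for illustration; substitute the model you are fine-tuning.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")

# Llama/Mixtral-style tokenizers ship without a pad token, so reuse EOS for padding,
# and pad on the right to avoid the half-precision overflow warning from SFTTrainer.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"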

@fancyerii
Author

@younesbelkada It seems my g++ version does not support C++20. What's the recommended gcc version?

/usr/local/cuda-12.1/bin/nvcc -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_89,code=sm_89 -gencode arch=compute_90,code=sm_90 -Xcompiler '-fPIC' -dlink /nas/lili/codes/pt/ft/bitsandbytes/build/ops.o /nas/lili/codes/pt/ft/bitsandbytes/build/kernels.o -o /nas/lili/codes/pt/ft/bitsandbytes/build/link.o
/usr/bin/g++ -std=c++20 -DBUILD_CUDA -shared -fPIC -I /usr/local/cuda-12.1/include -I /nas/lili/codes/pt/ft/bitsandbytes/csrc -I /home/ubuntu/miniconda3/include -I /nas/lili/codes/pt/ft/bitsandbytes/include /nas/lili/codes/pt/ft/bitsandbytes/build/ops.o /nas/lili/codes/pt/ft/bitsandbytes/build/kernels.o /nas/lili/codes/pt/ft/bitsandbytes/build/link.o /nas/lili/codes/pt/ft/bitsandbytes/csrc/common.cpp /nas/lili/codes/pt/ft/bitsandbytes/csrc/cpu_ops.cpp /nas/lili/codes/pt/ft/bitsandbytes/csrc/pythonInterface.c -o ./bitsandbytes/libbitsandbytes_cuda121.so -L /usr/local/cuda-12.1/lib64 -lcudart -lcublas -lcublasLt -lcusparse -L /home/ubuntu/miniconda3/lib
g++: error: unrecognized command line option ‘-std=c++20’; did you mean ‘-std=c++03’?

@fancyerii
Author

I installed gcc/g++ 8.4. It still doesn't support -std=c++20:

/usr/local/cuda-12.1/bin/nvcc -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_89,code=sm_89 -gencode arch=compute_90,code=sm_90 -Xcompiler '-fPIC' -dlink /nas/lili/codes/pt/ft/bitsandbytes/build/ops.o /nas/lili/codes/pt/ft/bitsandbytes/build/kernels.o -o /nas/lili/codes/pt/ft/bitsandbytes/build/link.o
/usr/bin/g++ -std=c++20 -DBUILD_CUDA -shared -fPIC -I /usr/local/cuda-12.1/include -I /nas/lili/codes/pt/ft/bitsandbytes/csrc -I /home/ubuntu/miniconda3/include -I /nas/lili/codes/pt/ft/bitsandbytes/include /nas/lili/codes/pt/ft/bitsandbytes/build/ops.o /nas/lili/codes/pt/ft/bitsandbytes/build/kernels.o /nas/lili/codes/pt/ft/bitsandbytes/build/link.o /nas/lili/codes/pt/ft/bitsandbytes/csrc/common.cpp /nas/lili/codes/pt/ft/bitsandbytes/csrc/cpu_ops.cpp /nas/lili/codes/pt/ft/bitsandbytes/csrc/pythonInterface.c -o ./bitsandbytes/libbitsandbytes_cuda121.so -L /usr/local/cuda-12.1/lib64 -lcudart -lcublas -lcublasLt -lcusparse -L /home/ubuntu/miniconda3/lib
g++: error: unrecognized command line option ‘-std=c++20’; did you mean ‘-std=c++2a’?

I searched this post but still don't know which version of gcc/g++ can compile bitsandbytes.

@fancyerii
Author

I installed gcc/g++-10 and built bitsandbytes successfully. Now I no longer get ValueError: Cannot flatten integer dtype tensors, but it still OOMs. So I enabled gradient_checkpointing and it runs correctly. It seems loading in 4-bit doesn't save that much GPU memory?
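
For anyone following along, here is a minimal sketch of enabling gradient checkpointing through transformers' TrainingArguments (the output_dir and batch size are placeholders, not the exact CLI flags of examples/scripts/sft.py):

from transformers import TrainingArguments

# Gradient checkpointing recomputes activations in the backward pass,
# trading extra compute for a large reduction in activation memory.
training_args = TrainingArguments(
    output_dir="output",                       # placeholder
    per_device_train_batch_size=2,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},  # variant that cooperates with PEFT wrappers
)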


github-actions bot commented Mar 5, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@younesbelkada
Contributor

FSDP + QLoRA should now be fully integrated in the HF ecosystem.
Can you try installing the latest accelerate & bitsandbytes from PyPI and transformers / trl / peft from source, then follow the script presented here: https://github.com/huggingface/peft/blob/main/examples/sft/run_peft_qlora_fsdp.sh
Please also have a look at the relevant PEFT documentation: https://huggingface.co/docs/peft/main/en/accelerate/deepspeed#deepspeed

@shatealaboxiaowang

I tried, but the error is still the same.

@fancyerii
Author

FSDP + QLoRA should now be fully integrated in the HF ecosystem [...]

Great job. I can run it with 8 GPUs on one node. Later I will try with two nodes.

@fancyerii
Author

I tried, but the error is still the same.

Here is my installation (I use PDM; this is the exported requirements.txt):
requirements.txt

@shatealaboxiaowang

shatealaboxiaowang commented Mar 15, 2024

Thank you very much for sharing. My pip-installed versions are exactly the same as yours: accelerate & bitsandbytes, with transformers, trl, and peft from source. But I still get the same error.

Maybe it's because I launch training with accelerate? I don't know why.


github-actions bot commented Apr 8, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@iMTimmyyy

@shatealaboxiaowang
Hi, I am experiencing the same issue as you. Have you managed to find a solution?

@Titus-von-Koeller
Contributor

Hey, the bitsandbytes lead maintainer here 🤗

For anyone still hitting this issue with FSDP + QLoRA, please make sure to read these docs about correctly configuring bnb_4bit_quant_storage to a float dtype (FSDP doesn't support int dtypes); that dtype should exactly match the rest of the model's tensors so that the FSDP wrapping logic works correctly.
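
As a concrete illustration, here is a minimal sketch of such a config with recent transformers/bitsandbytes releases; the checkpoint name is a placeholder, and bnb_4bit_quant_storage is the BitsAndBytesConfig parameter referenced above:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# For FSDP, the 4-bit quantized weights must be *stored* in a float dtype, and that
# dtype has to match the dtype of the rest of the model's tensors (torch_dtype below);
# otherwise FSDP's flat-parameter validation raises "Cannot flatten integer dtype tensors".
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-Instruct-v0.1",  # placeholder checkpoint
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)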

@mizoru

mizoru commented Jul 31, 2024

correctly configuring bnb_4bit_quant_storage to a float dtype (FSDP doesn't support int dtypes); that dtype should exactly match the rest of the model's tensors so that the FSDP wrapping logic works correctly.

Am I not supposed to get this error with bnb_4bit_compute_dtype='bfloat16', bnb_4bit_quant_storage_dtype='bfloat16', bnb_4bit_quant_type='nf4'? It's still happening to me.
