ValueError: Cannot flatten integer dtype tensors #1432

Closed
fancyerii opened this issue Feb 4, 2024 · 14 comments

Comments

@fancyerii

System Info

peft 0.8.2
Python 3.9.18
accelerate 0.26.1
torch 2.2.0
transformers 4.37.2

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

I am following "Fine-tuning with 🤗 TRL" and run it with:

accelerate launch --config_file examples/accelerate_configs/multi_gpu.yaml --num_processes=4 \
    examples/scripts/sft.py \
    --model_name /nas/lili/models_hf/Mixtral-8x7B-Instruct-v0.1 \
    --dataset_name trl-lib/ultrachat_200k_chatml \
    --batch_size 2 \
    --gradient_accumulation_steps 1 \
    --learning_rate 2e-4 \
    --save_steps 200_000 \
    --use_peft \
    --peft_lora_r 16 --peft_lora_alpha 32 \
    --target_modules q_proj k_proj v_proj o_proj \
    --load_in_4bit \
    --output output \
    --use_auth_token false

It throws:

Traceback (most recent call last):
  File "/nas/lili/codes/pt/ft/trl/examples/scripts/sft.py", line 158, in <module>
    trainer.train()
  File "/home/ubuntu/.local/share/virtualenvs/ft-Zgps2Kz_/lib/python3.9/site-packages/trl/trainer/sft_trainer.py", line 315, in train
    output = super().train(*args, **kwargs)
  File "/home/ubuntu/.local/share/virtualenvs/ft-Zgps2Kz_/lib/python3.9/site-packages/transformers/trainer.py", line 1537, in train
    return inner_training_loop(
  File "/home/ubuntu/.local/share/virtualenvs/ft-Zgps2Kz_/lib/python3.9/site-packages/transformers/trainer.py", line 1821, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "/home/ubuntu/.local/share/virtualenvs/ft-Zgps2Kz_/lib/python3.9/site-packages/accelerate/data_loader.py", line 448, in __iter__
    current_batch = next(dataloader_iter)
  File "/home/ubuntu/miniconda3/envs/torchshare/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
  File "/home/ubuntu/miniconda3/envs/torchshare/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 674, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/ubuntu/miniconda3/envs/torchshare/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
    return self.collate_fn(data)
  File "/home/ubuntu/.local/share/virtualenvs/ft-Zgps2Kz_/lib/python3.9/site-packages/transformers/data/data_collator.py", line 45, in __call__  
    return self.torch_call(features)
  File "/home/ubuntu/.local/share/virtualenvs/ft-Zgps2Kz_/lib/python3.9/site-packages/transformers/data/data_collator.py", line 732, in torch_call
    batch = self.tokenizer.pad(examples, return_tensors="pt", pad_to_multiple_of=self.pad_to_multiple_of)
  File "/home/ubuntu/.local/share/virtualenvs/ft-Zgps2Kz_/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 3259, in pad
    padding_strategy, _, max_length, _ = self._get_padding_truncation_strategies(
  File "/home/ubuntu/.local/share/virtualenvs/ft-Zgps2Kz_/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2707, in _get_padding_truncation_strategies
    raise ValueError(
ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as `pad_token` `(tokenizer.pad_token = tokenizer.eos_token e.g.)` or add a new pad token via `tokenizer.add_special_tokens({'pad_token': '[PAD]'})`.

Besides this error, there are warnings like:

/home/ubuntu/.local/share/virtualenvs/ft-Zgps2Kz_/lib/python3.9/site-packages/trl/trainer/sft_trainer.py:282: UserWarning: You passed a tokenizer with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding `tokenizer.padding_side = 'right'` to your code.

my env:

torch                    2.1.2
transformers             4.36.2
trl                      0.7.8.dev0
accelerate               0.25.0
peft                     0.7.1
bitsandbytes             0.41.3.post2

I have added tokenizer.padding_side = 'right', and now it throws:

Traceback (most recent call last):
  File "/nas/lili/codes/pt/ft/trl/examples/scripts/sft.py", line 117, in <module>
    trainer.train()
  File "/nas/lili/codes/pt/ft/trl/trl/trainer/sft_trainer.py", line 326, in train
    output = super().train(*args, **kwargs)
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 1687, in _inner_training_loop
    model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer)
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/accelerate/accelerator.py", line 1227, in prepare
    result = tuple(
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/accelerate/accelerator.py", line 1228, in <genexpr>
    self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/accelerate/accelerator.py", line 1104, in _prepare_one
    return self.prepare_model(obj, device_placement=device_placement)
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/accelerate/accelerator.py", line 1386, in prepare_model
    model = FSDP(model, **kwargs)
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 477, in __init__
    _auto_wrap(
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/torch/distributed/fsdp/_wrap_utils.py", line 101, in _auto_wrap
    _recursive_wrap(**recursive_wrap_kwargs, **root_kwargs)  # type: ignore[arg-type]
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/torch/distributed/fsdp/wrap.py", line 543, in _recursive_wrap
    wrapped_child, num_wrapped_params = _recursive_wrap(
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/torch/distributed/fsdp/wrap.py", line 543, in _recursive_wrap
    wrapped_child, num_wrapped_params = _recursive_wrap(
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/torch/distributed/fsdp/wrap.py", line 543, in _recursive_wrap
    wrapped_child, num_wrapped_params = _recursive_wrap(
  [Previous line repeated 2 more times]
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/torch/distributed/fsdp/wrap.py", line 561, in _recursive_wrap
    return _wrap(module, wrapper_cls, **kwargs), nonwrapped_numel
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/torch/distributed/fsdp/wrap.py", line 490, in _wrap
    return wrapper_cls(module, **kwargs)
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 503, in __init__
    _init_param_handle_from_module(
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/torch/distributed/fsdp/_init_utils.py", line 594, in _init_param_handle_from_module
    _init_param_handle_from_params(state, managed_params, fully_sharded_module)
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/torch/distributed/fsdp/_init_utils.py", line 606, in _init_param_handle_from_params
    handle = FlatParamHandle(
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/torch/distributed/fsdp/_flat_param.py", line 573, in __init__
    self._init_flat_param_and_metadata(
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/torch/distributed/fsdp/_flat_param.py", line 623, in _init_flat_param_and_metadata
    ) = self._validate_tensors_to_flatten(params)
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/ft-zSqjAXBp-py3.9/lib/python3.9/site-packages/torch/distributed/fsdp/_flat_param.py", line 759, in _validate_tensors_to_flatten
    raise ValueError("Cannot flatten integer dtype tensors")

And I found this: meta-llama/llama-recipes#240. It says FSDP can't be used with load_in_4bit. When I turn off load_in_4bit, it works. (It has another OOM problem, you can see here, but it's not related to peft.)

Expected behavior

Fix the bug.

@younesbelkada
Contributor

Hi @fancyerii,
Thanks a lot for the issue!
For the first issue, you need to set tokenizer.pad_token = tokenizer.eos_token.
Indeed, FSDP + QLoRA are not compatible - yet! @Titus-von-Koeller @pacman100 & external collaborators (@warner-benjamin @weifengpy) recently worked on adding QLoRA + FSDP support: bitsandbytes-foundation/bitsandbytes#970. Can you try building bnb from source? (Installation guidelines here: https://huggingface.co/docs/bitsandbytes/main/en/installation.) Let us know how it goes!
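
For reference, a minimal sketch of the tokenizer fix suggested above, assuming a standard transformers AutoTokenizer (the checkpoint name is only a placeholder, not necessarily the path used in this thread):

from transformers import AutoTokenizer

# Placeholder checkpoint for illustration; substitute the model you are fine-tuning.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")

# Llama/Mixtral-style tokenizers ship without a pad token, so reuse EOS for padding,
# and pad on the right to avoid the half-precision overflow warning from SFTTrainer.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"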

@fancyerii
Author

@younesbelkada It seems my g++ version does not support C++20. What's the recommended gcc version?

/usr/local/cuda-12.1/bin/nvcc -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_89,code=sm_89 -gencode arch=compute_90,code=sm_90 -Xcompiler '-fPIC' -dlink /nas/lili/codes/pt/ft/bitsandbytes/build/ops.o /nas/lili/codes/pt/ft/bitsandbytes/build/kernels.o -o /nas/lili/codes/pt/ft/bitsandbytes/build/link.o
/usr/bin/g++ -std=c++20 -DBUILD_CUDA -shared -fPIC -I /usr/local/cuda-12.1/include -I /nas/lili/codes/pt/ft/bitsandbytes/csrc -I /home/ubuntu/miniconda3/include -I /nas/lili/codes/pt/ft/bitsandbytes/include /nas/lili/codes/pt/ft/bitsandbytes/build/ops.o /nas/lili/codes/pt/ft/bitsandbytes/build/kernels.o /nas/lili/codes/pt/ft/bitsandbytes/build/link.o /nas/lili/codes/pt/ft/bitsandbytes/csrc/common.cpp /nas/lili/codes/pt/ft/bitsandbytes/csrc/cpu_ops.cpp /nas/lili/codes/pt/ft/bitsandbytes/csrc/pythonInterface.c -o ./bitsandbytes/libbitsandbytes_cuda121.so -L /usr/local/cuda-12.1/lib64 -lcudart -lcublas -lcublasLt -lcusparse -L /home/ubuntu/miniconda3/lib
g++: error: unrecognized command line option ‘-std=c++20’; did you mean ‘-std=c++03’?

@fancyerii
Author

I installed gcc/g++ 8.4. It still doesn't support -std=c++20:

/usr/local/cuda-12.1/bin/nvcc -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_89,code=sm_89 -gencode arch=compute_90,code=sm_90 -Xcompiler '-fPIC' -dlink /nas/lili/codes/pt/ft/bitsandbytes/build/ops.o /nas/lili/codes/pt/ft/bitsandbytes/build/kernels.o -o /nas/lili/codes/pt/ft/bitsandbytes/build/link.o
/usr/bin/g++ -std=c++20 -DBUILD_CUDA -shared -fPIC -I /usr/local/cuda-12.1/include -I /nas/lili/codes/pt/ft/bitsandbytes/csrc -I /home/ubuntu/miniconda3/include -I /nas/lili/codes/pt/ft/bitsandbytes/include /nas/lili/codes/pt/ft/bitsandbytes/build/ops.o /nas/lili/codes/pt/ft/bitsandbytes/build/kernels.o /nas/lili/codes/pt/ft/bitsandbytes/build/link.o /nas/lili/codes/pt/ft/bitsandbytes/csrc/common.cpp /nas/lili/codes/pt/ft/bitsandbytes/csrc/cpu_ops.cpp /nas/lili/codes/pt/ft/bitsandbytes/csrc/pythonInterface.c -o ./bitsandbytes/libbitsandbytes_cuda121.so -L /usr/local/cuda-12.1/lib64 -lcudart -lcublas -lcublasLt -lcusparse -L /home/ubuntu/miniconda3/lib
g++: error: unrecognized command line option ‘-std=c++20’; did you mean ‘-std=c++2a’?

I searched this post but still don't know which version of gcc/g++ can compile bitsandbytes.

@fancyerii
Author

I installed gcc/g++-10 and built bitsandbytes successfully. Now I no longer get ValueError: Cannot flatten integer dtype tensors, but it still OOMs. So I enabled gradient_checkpointing and it runs correctly. It seems loading in 4-bit doesn't save that much GPU memory?
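
For anyone following along, here is a minimal sketch of enabling gradient checkpointing through transformers' TrainingArguments (the output_dir and batch size are placeholders, not the exact CLI flags of examples/scripts/sft.py):

from transformers import TrainingArguments

# Gradient checkpointing recomputes activations in the backward pass,
# trading extra compute for a large reduction in activation memory.
training_args = TrainingArguments(
    output_dir="output",                       # placeholder
    per_device_train_batch_size=2,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},  # variant that cooperates with PEFT wrappers
)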


github-actions bot commented Mar 5, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@younesbelkada
Contributor

FSDP + QLoRA should now be fully integrated in the HF ecosystem.
Can you try installing the latest accelerate & bitsandbytes from PyPI and transformers / trl / peft from source, then follow the script presented here: https://github.com/huggingface/peft/blob/main/examples/sft/run_peft_qlora_fsdp.sh
Please also have a look at the relevant PEFT documentation: https://huggingface.co/docs/peft/main/en/accelerate/deepspeed#deepspeed

@shatealaboxiaowang

I tried, but the error is still the same.

@fancyerii
Author

FSDP + QLoRA should now be fully integrated in the HF ecosystem [...]

Great job. I can run it with 8 GPUs on one node. Later I will try with two nodes.

@fancyerii
Author

I tried, but the error is still the same.

Here is my installation (I use PDM; this is the exported requirements.txt):
requirements.txt

@shatealaboxiaowang

shatealaboxiaowang commented Mar 15, 2024

Thank you very much for sharing. My pip-installed versions are exactly the same as yours: accelerate & bitsandbytes, with transformers, trl, and peft from source. But I still get the same error.

Maybe it's because I launch training with accelerate? I don't know why.


github-actions bot commented Apr 8, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@iMTimmyyy

@shatealaboxiaowang
Hi, I am experiencing the same issue as you. Have you managed to find a solution?

@Titus-von-Koeller
Contributor

Hey, the bitsandbytes lead maintainer here 🤗

For anyone still hitting this issue with FSDP + QLoRA, please make sure to read these docs about correctly configuring bnb_4bit_quant_storage to a float dtype (FSDP doesn't support int dtypes); that dtype should exactly match the rest of the model's tensors so that the FSDP wrapping logic works correctly.
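
As a concrete illustration, here is a minimal sketch of such a config with recent transformers/bitsandbytes releases; the checkpoint name is a placeholder, and bnb_4bit_quant_storage is the BitsAndBytesConfig parameter referenced above:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# For FSDP, the 4-bit quantized weights must be *stored* in a float dtype, and that
# dtype has to match the dtype of the rest of the model's tensors (torch_dtype below);
# otherwise FSDP's flat-parameter validation raises "Cannot flatten integer dtype tensors".
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-Instruct-v0.1",  # placeholder checkpoint
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)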

@mizoru

mizoru commented Jul 31, 2024

correctly configuring bnb_4bit_quant_storage to a float dtype (FSDP doesn't support int dtypes); that dtype should exactly match the rest of the model's tensors so that the FSDP wrapping logic works correctly.

Am I not supposed to get this error with bnb_4bit_compute_dtype='bfloat16', bnb_4bit_quant_storage_dtype='bfloat16', bnb_4bit_quant_type='nf4'? It's still happening to me.
