ValueError: Cannot flatten integer dtype tensors #1432
Comments
Hi @fancyerii
@younesbelkada it seems my g++ version does not support C++20. What's the recommended gcc version?
/usr/local/cuda-12.1/bin/nvcc -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_89,code=sm_89 -gencode arch=compute_90,code=sm_90 -Xcompiler '-fPIC' -dlink /nas/lili/codes/pt/ft/bitsandbytes/build/ops.o /nas/lili/codes/pt/ft/bitsandbytes/build/kernels.o -o /nas/lili/codes/pt/ft/bitsandbytes/build/link.o
/usr/bin/g++ -std=c++20 -DBUILD_CUDA -shared -fPIC -I /usr/local/cuda-12.1/include -I /nas/lili/codes/pt/ft/bitsandbytes/csrc -I /home/ubuntu/miniconda3/include -I /nas/lili/codes/pt/ft/bitsandbytes/include /nas/lili/codes/pt/ft/bitsandbytes/build/ops.o /nas/lili/codes/pt/ft/bitsandbytes/build/kernels.o /nas/lili/codes/pt/ft/bitsandbytes/build/link.o /nas/lili/codes/pt/ft/bitsandbytes/csrc/common.cpp /nas/lili/codes/pt/ft/bitsandbytes/csrc/cpu_ops.cpp /nas/lili/codes/pt/ft/bitsandbytes/csrc/pythonInterface.c -o ./bitsandbytes/libbitsandbytes_cuda121.so -L /usr/local/cuda-12.1/lib64 -lcudart -lcublas -lcublasLt -lcusparse -L /home/ubuntu/miniconda3/lib
g++: error: unrecognized command line option ‘-std=c++20’; did you mean ‘-std=c++03’?
I installed gcc/g++ 8.4, but it still doesn't support -std=c++20:
I searched this post but still don't know which version of gcc/g++ can compile peft.
I installed gcc/g++-10 and built bitsandbytes successfully. Now I no longer hit ValueError: Cannot flatten integer dtype tensors, but it still OOMs. So I enabled gradient_checkpointing and it runs correctly. It seems load_in_4bit doesn't save that much GPU memory?
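(For reference, a minimal sketch of enabling gradient checkpointing with transformers' Trainer arguments; the values here are illustrative assumptions, not taken from the report above.)

```python
# Minimal sketch: enable gradient checkpointing to trade extra compute for
# lower activation memory. Values are illustrative placeholders.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",                      # placeholder output path
    per_device_train_batch_size=1,
    gradient_checkpointing=True,               # recompute activations in the backward pass
    gradient_checkpointing_kwargs={"use_reentrant": False},
    bf16=True,
)

# Alternatively, it can be switched on directly on the model:
# model.gradient_checkpointing_enable(gradient_checkpointing_kwargs={"use_reentrant": False})
```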
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
FSDP + QLoRA should now be fully integrated into the HF ecosystem.
I tried, but the error is still the same.
Great job. I can run it with 8 GPUs on one node. Later I will try it with two nodes.
Here is my installation (I use PDM; this is the exported requirements.txt):
Thank you very much for sharing. My pip-installed versions are exactly the same as yours: accelerate and bitsandbytes, with transformers, trl, and peft from source. But I get the same error. Maybe it's because I launch training with accelerate? I don't know why.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. |
@shatealaboxiaowang |
Hey, the bitsandbytes lead maintainer here 🤗 For anyone still hitting this issue with FSDP + QLoRA, please make sure to read these docs about correctly configuring the 4-bit quantization storage dtype.
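(A sketch of the kind of configuration those docs describe, assuming a recent transformers/bitsandbytes; the model id is just a placeholder.)

```python
# Sketch of an FSDP + QLoRA-friendly 4-bit setup: storing the packed 4-bit
# weights in bfloat16 lets FSDP flatten them alongside the other parameters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_storage=torch.bfloat16,   # key setting for FSDP + QLoRA
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",              # placeholder model id
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
```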
Shouldn't this error go away with bnb_4bit_compute_dtype='bfloat16', bnb_4bit_quant_storage_dtype='bfloat16', and bnb_4bit_quant_type='nf4'? It's still happening to me.
System Info
peft 0.8.2
Python 3.9.18
accelerate 0.26.1
torch 2.2.0
transformers 4.37.2
Who can help?
No response
Information
Tasks
examples folder
Reproduction
I am following the "Fine-tuning with 🤗 TRL" guide and run it with:
It throws:
Besides this error, there are warnings like:
My env:
I have added tokenizer.padding_side = 'right'. And it throws:
And I found meta-llama/llama-recipes#240, which says FSDP can't use load_in_4bit. When I turn off load_in_4bit, it works. (It has another OOM problem, which you can see here, but it's not related to peft.)
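(A minimal sketch of the workaround described above: right padding plus loading without 4-bit quantization. This assumes a standard transformers setup; the model id is a placeholder.)

```python
# Sketch of the workaround: set right padding and skip load_in_4bit so that
# FSDP only has to flatten floating-point parameters. Placeholder model id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"        # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.padding_side = "right"             # right padding, as described above

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,              # no quantization_config / load_in_4bit here
)
```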
Expected behavior
Fix the bug.