8-bit precision not working on Windows #20

Closed
minipasila opened this issue Jan 23, 2023 · 26 comments
Labels
bug Something isn't working

Comments

@minipasila
Contributor

It doesn't seem to work on Windows and is unable to detect my CUDA installation.

(textgen) C:\Users\pasil\text-generation-webui>python server.py --cai-chat --load-in-8bit
Warning: chat mode currently becomes a lot slower with text streaming on.
Consider starting the web UI with the --no-stream option.

Loading pygmalion-6b_dev...

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
================================================================================
C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\bitsandbytes\cuda_setup\main.py:134: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('C')}
  warn(msg)
C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\bitsandbytes\cuda_setup\main.py:134: UserWarning: C:\Users\pasil\anaconda3\envs\textgen did not contain libcudart.so as expected! Searching further paths...
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\bitsandbytes\cuda_setup\main.py:134: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')}
  warn(msg)
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\bitsandbytes\cuda_setup\main.py:134: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
  warn(msg)
C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\bitsandbytes\cuda_setup\main.py:134: UserWarning: WARNING: No GPU detected! Check your CUDA paths. Proceeding to load CPU-only library...
  warn(msg)
CUDA SETUP: Loading binary C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
CUDA SETUP: Loading binary C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
CUDA SETUP: Loading binary C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
CUDA SETUP: Problem: The main issue seems to be that the main CUDA library was not detected.
CUDA SETUP: Solution 1): Your paths are probably not up-to-date. You can update them via: sudo ldconfig.
CUDA SETUP: Solution 2): If you do not have sudo rights, you can do the following:
CUDA SETUP: Solution 2a): Find the cuda library via: find / -name libcuda.so 2>/dev/null
CUDA SETUP: Solution 2b): Once the library is found add it to the LD_LIBRARY_PATH: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:FOUND_PATH_FROM_2a
CUDA SETUP: Solution 2c): For a permanent solution add the export from 2b into your .bashrc file, located at ~/.bashrc
Traceback (most recent call last):
  File "C:\Users\pasil\text-generation-webui\server.py", line 235, in <module>
    model, tokenizer = load_model(model_name)
  File "C:\Users\pasil\text-generation-webui\server.py", line 109, in load_model
    model = eval(command)
  File "<string>", line 1, in <module>
  File "C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\transformers\models\auto\auto_factory.py", line 463, in from_pretrained
    return model_class.from_pretrained(
  File "C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\transformers\modeling_utils.py", line 2279, in from_pretrained
    from .utils.bitsandbytes import get_keys_to_not_convert, replace_8bit_linear
  File "C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\transformers\utils\bitsandbytes.py", line 10, in <module>
    import bitsandbytes as bnb
  File "C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\bitsandbytes\__init__.py", line 7, in <module>
    from .autograd._functions import (
  File "C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\bitsandbytes\autograd\_functions.py", line 8, in <module>
    import bitsandbytes.functional as F
  File "C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\bitsandbytes\functional.py", line 17, in <module>
    from .cextension import COMPILED_WITH_CUDA, lib
  File "C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\bitsandbytes\cextension.py", line 22, in <module>
    raise RuntimeError('''
RuntimeError:
        CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment!
        If you cannot find any issues and suspect a bug, please open an issue with detals about your environment:
        https://github.com/TimDettmers/bitsandbytes/issues
@Silver267
Contributor

bitsandbytes currently does not support Windows, but there are some workarounds.
This is one of them: bitsandbytes-foundation/bitsandbytes#30

@minipasila
Contributor Author

Thanks, I managed to somehow make it work.

@VenkatLohithDasari

Thanks, I managed to somehow make it work.

How did you manage to make it work? Can you share the method?

@oobabooga oobabooga reopened this Jan 30, 2023
@minipasila
Contributor Author

Basically you have to download these 2 DLL files from here, then move them into anaconda3\env\textgen\Lib\site-packages\bitsandbytes (assuming you're using conda). After that, edit main.py in anaconda3\env\textgen\Lib\site-packages\bitsandbytes\cuda_setup as follows:
Change ct.cdll.LoadLibrary(binary_path) to ct.cdll.LoadLibrary(str(binary_path)) in the two places it appears in the file.
Then replace
if not torch.cuda.is_available(): return 'libsbitsandbytes_cpu.so', None, None, None, None
with
if torch.cuda.is_available(): return 'libbitsandbytes_cuda116.dll', None, None, None, None
After that it should let you load models in 8-bit precision.
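
For reference, the patched region of main.py ends up looking roughly like this (a sketch only; names and line positions vary between bitsandbytes versions, and it assumes the two DLLs were already copied into the bitsandbytes folder):

```python
# Sketch of the two edits to bitsandbytes/cuda_setup/main.py described above.
# Illustrative only: the surrounding code differs by bitsandbytes version.
import ctypes as ct
from pathlib import Path

import torch

def evaluate_cuda_setup():
    # Original: if not torch.cuda.is_available(): return 'libsbitsandbytes_cpu.so', ...
    # Patched: return the prebuilt Windows DLL whenever a GPU is visible.
    if torch.cuda.is_available():
        return 'libbitsandbytes_cuda116.dll', None, None, None, None
    return 'libbitsandbytes_cpu.so', None, None, None, None

binary_name, *_ = evaluate_cuda_setup()
binary_path = Path(__file__).parent / binary_name

# ctypes rejects a bare WindowsPath, hence the str() wrapper; the real file
# calls LoadLibrary in two places, and both need the same treatment.
lib = ct.cdll.LoadLibrary(str(binary_path))
```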

@lolxdmainkaisemaanlu

lolxdmainkaisemaanlu commented Feb 1, 2023

EDIT: I celebrated too early, it gives me a cuBLAS error when trying to generate lol

@minipasila THANK YOU SO MUCH! Your instructions + the prebuilt bitsandbytes for older GPUs at https://github.com/james-things/bitsandbytes-prebuilt-all_arch are helping me run Pygmalion 2.7B on my GTX 1060 6GB, and it's taking only 3.8 GB VRAM (of which probably 0.4 is being used by the system, as I don't have integrated graphics)

@ghost

ghost commented Feb 1, 2023

@minipasila Thank you for sharing; your instructions worked perfectly for GPT-J-6B on a 3070 Ti

@Slug-Cat

Slug-Cat commented Feb 19, 2023

For future reference, the 8-bit Windows fix required me to navigate to my Python310 install folder instead of the env, as bitsandbytes was not installed in the conda env.

@VertexMachine

For anybody still having trouble, you can try using a newer library - https://github.com/james-things/bitsandbytes-prebuilt-all_arch
Using v37 did it for me finally :)

@iChristGit

For anybody still having trouble, you can try using a newer library - https://github.com/james-things/bitsandbytes-prebuilt-all_arch Using v37 did it for me finally :)

For future reference, the 8-bit Windows fix required me to navigate to my Python310 install folder instead of the env, as bitsandbytes was not installed in the conda env.

I still have the same issue. I tried everything linked except the v37 fix; I downloaded the DLL and put it in the bitsandbytes folder. What next?

@VertexMachine

I still have the same issue. I tried everything linked except the v37 fix; I downloaded the DLL and put it in the bitsandbytes folder. What next?

Just change libbitsandbytes_cuda116.dll to libbitsandbytes_cudaall.dll in anaconda3\env\textgen\Lib\site-packages\bitsandbytes\cuda_setup\main.py
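
In other words, the patched branch from earlier in this thread just points at the all-architecture DLL instead (a sketch):

```python
import torch

def evaluate_cuda_setup():
    # Same patched check as before, now returning the all-arch prebuilt DLL:
    if torch.cuda.is_available():
        return 'libbitsandbytes_cudaall.dll', None, None, None, None
    return 'libbitsandbytes_cpu.so', None, None, None, None
```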

@patrickmros

patrickmros commented Mar 12, 2023

I followed the suggestions above, but when I try to run it with 8-bit precision I get an error window pop-up: "Bad Image
python.exe libbitsandbytes_cuda116.dll is either not designed to run on Windows or it contains an error. Try installing the program again using the original installation media or contact your system administrator or the software vendor for support. Error status 0xc000012f."

EDIT:

Okay, this is weird. I copied the DLL from my Stable Diffusion bitsandbytes folder and it seems to work now.

@MikkoHaavisto

I got the same error before.
Now I copied the cudaall.dll from the Stable Diffusion bitsandbytes folder instead of cuda116.dll.
It started, even though nothing else had worked!

@BugMonkey42335

When attempting to generate with 8-bit using the new libraries suggested by VertexMachine, I get this error.

#20 (comment)

C:\oobabooga\installer_files\env\lib\site-packages\transformers\models\gpt_neo\modeling_gpt_neo.py:195: UserWarning: where received a uint8 condition tensor. This behavior is deprecated and will be removed in a future version of PyTorch. Use a boolean condition instead. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\TensorCompare.cpp:413.) attn_weights = torch.where(causal_mask, attn_weights, mask_value)
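
For context, that message is a deprecation warning rather than a failure: torch.where is being handed a uint8 mask where newer PyTorch expects bool. A minimal illustration (not the webui's actual code):

```python
import torch

causal_mask = torch.tril(torch.ones(4, 4, dtype=torch.uint8))  # uint8 mask
attn_weights = torch.randn(4, 4)
mask_value = torch.tensor(float("-inf"))

# Passing the uint8 mask directly triggers the UserWarning quoted above
# (and may become an error in future PyTorch releases); casting to bool
# produces the same result without the warning:
attn_weights = torch.where(causal_mask.bool(), attn_weights, mask_value)
```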

@kickook

kickook commented Mar 15, 2023

Loading llama-7b-hf...
Traceback (most recent call last):
  File "E:\anaen\textgen\text-generation-webui\server.py", line 191, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "E:\anaen\textgen\text-generation-webui\modules\models.py", line 130, in load_model
    model = eval(command)
  File "<string>", line 1, in <module>
  File "E:\anaen\textgen\lib\site-packages\transformers\models\auto\auto_factory.py", line 434, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "E:\anaen\textgen\lib\site-packages\transformers\models\auto\configuration_auto.py", line 873, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "E:\anaen\textgen\lib\site-packages\transformers\models\auto\configuration_auto.py", line 579, in __getitem__
    raise KeyError(key)
KeyError: 'llama'
How do I fix this?

@not6sleepy

I followed the suggestions above, but when I try to run it with 8-bit precision I get an error window pop-up: "Bad Image python.exe libbitsandbytes_cuda116.dll is either not designed to run on Windows or it contains an error. Try installing the program again using the original installation media or contact your system administrator or the software vendor for support. Error status 0xc000012f."

EDIT:

Okay, this is weird. I copied the DLL from my Stable Diffusion bitsandbytes folder and it seems to work now.

How do you find the Stable Diffusion bitsandbytes folder?

@patrickmros

I followed the suggestions above, but when I try to run it with 8-bit precision I get an error window pop-up: "Bad Image python.exe libbitsandbytes_cuda116.dll is either not designed to run on Windows or it contains an error. Try installing the program again using the original installation media or contact your system administrator or the software vendor for support. Error status 0xc000012f."
EDIT:
Okay, this is weird. I copied the DLL from my Stable Diffusion bitsandbytes folder and it seems to work now.

How do you find the Stable Diffusion bitsandbytes folder?

It's in this folder: stable-diffusion-webui\venv\Lib\site-packages\bitsandbytes

@oobabooga
Owner

oobabooga commented Mar 24, 2023

See

https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#step-1-installation

and

#530

@oobabooga
Owner

8-bit should work out of the box with the new one-click installer

https://github.com/oobabooga/text-generation-webui#one-click-installers

@Setmaster

Please review the response post by @PhyX-Meow re: yuk7/ArchWSL#248. As he points out, it really has nothing to do with your Linux install; it's a simple fix in Windows. The solution he posts is for Arch, but the fix is exactly the same for Ubuntu, etc. in a WSL2 install. The issue is that Windows delivers libcuda.so, libcuda.so.1, and libcuda.so.1.1 as fully separate copies of the same file. The fix is just to remove libcuda.so and libcuda.so.1 and make symlinks for each of them to libcuda.so.1.1.

Run a command-line shell as Administrator and type "cmd" to get a non-PowerShell command line.

Then type the following commands to replace the problematic files with symbolic links:

C:
cd \Windows\System32\lxss\lib
del libcuda.so
del libcuda.so.1
mklink libcuda.so libcuda.so.1.1
mklink libcuda.so.1 libcuda.so.1.1

When you're done, it will look like this:

C:\Windows\System32\lxss\lib> DIR
...
Directory of C:\Windows\System32\lxss\lib
03/15/2022  03:59 PM    <SYMLINK>       libcuda.so [libcuda.so.1.1]
03/15/2022  04:00 PM    <SYMLINK>       libcuda.so.1 [libcuda.so.1.1]

Then just finish whatever command you were running. In my case, the solution was to run "apt reinstall libc-bin", because libc-bin was hitting the errors when I ran "apt upgrade -y" (see below).

The error I received from my "apt upgrade -y" command was:

#> apt upgrade -y
... < stuff deleted > ...
Processing triggers for libc-bin (2.31-0ubuntu9.7) ...
/sbin/ldconfig.real: /usr/lib/wsl/lib/libcuda.so.1 is not a symbolic link
... < stuff deleted > ...

As per @PhyX-Meow:

"Actually this is not related to Arch, nor ArchWSL. It's caused by libcuda.so in your C:\Windows\System32\lxss\lib\ folder not being a symbolic link; it is installed by the NVIDIA driver. One solution to remove the warning is to delete libcuda.so and libcuda.so.1 and make symbolic links to libcuda.so.1.1 using mklink. Note the command does not work in PowerShell; you should use cmd.exe."

:)

This seems to be solving the issue for me; still working on it.
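
To confirm the fix took effect from inside WSL (where /usr/lib/wsl/lib is the mount of that Windows folder, per the ldconfig error above), a quick check along these lines should work:

```python
# Run inside the WSL distro after the mklink fix: the first two entries
# should now report as symlinks to libcuda.so.1.1 rather than regular files.
from pathlib import Path

for name in ("libcuda.so", "libcuda.so.1", "libcuda.so.1.1"):
    p = Path("/usr/lib/wsl/lib") / name
    kind = f"symlink -> {p.resolve()}" if p.is_symlink() else "regular file"
    print(f"{name}: {kind}")
```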

@lugangqi

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 160.00 MiB. GPU 0 has a total capacity of 16.00 GiB of which 0 bytes is free. Of the allocated memory 15.06 GiB is allocated by PyTorch, and 54.28 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

@minipasila
Contributor Author

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 160.00 MiB. GPU 0 has a total capacity of 16.00 GiB of which 0 bytes is free. Of the allocated memory 15.06 GiB is allocated by PyTorch, and 54.28 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

It means you ran out of memory (try 4-bit precision, another quant method like GPTQ/EXL2/GGUF, or a smaller model), but this error is unrelated to this issue. (Sorry for another notification.)
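
For what it's worth, the PYTORCH_CUDA_ALLOC_CONF setting that the error message suggests has to be in place before torch initializes its CUDA allocator, e.g. (a sketch; it only mitigates fragmentation and won't create more VRAM):

```python
import os

# Must be set before torch touches CUDA, per the error message quoted above.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # imported after setting the env var, on purpose
```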

@lugangqi

Is it not compatible? Even slower is faster than llama...

@lugangqi

RuntimeError: CUDA error: no kernel image is available for execution on the device
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

@minipasila
Contributor Author

RuntimeError: CUDA error: no kernel image is available for execution on the device. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

OK, this is a different error, so you should probably give a bit more information about your issue, like how you installed textgen and what your setup is. It might be best to open a new issue, since this is probably different from the one I originally reported here.

@lugangqi

exllamav2 loads it directly; it is also CUDA 12.4, but it does not support the M40 GPU's Maxwell architecture (compute capability 5.2).

@lugangqi

Running on local URL: http://127.0.0.1:7860/

01:55:11-441029 INFO Loading "14b-exl"
01:55:12-675028 ERROR Failed to load the model.
Traceback (most recent call last):
  File "D:\text-generation-webui\modules\ui_model_menu.py", line 249, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
  File "D:\text-generation-webui\modules\models.py", line 94, in load_model
    output = load_func_map[loader](model_name)
  File "D:\text-generation-webui\modules\models.py", line 366, in ExLlamav2_loader
    from modules.exllamav2 import Exllamav2Model
  File "D:\text-generation-webui\modules\exllamav2.py", line 5, in <module>
    from exllamav2 import (
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\__init__.py", line 3, in <module>
    from exllamav2.model import ExLlamaV2
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\model.py", line 25, in <module>
    from exllamav2.linear import ExLlamaV2Linear
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\linear.py", line 7, in <module>
    from exllamav2.module import ExLlamaV2Module
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\module.py", line 14, in <module>
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
NameError: name 'os' is not defined

01:55:54-858096 INFO Loading "14b-exl"
01:55:56-017617 ERROR Failed to load the model.
Traceback (most recent call last):
  File "D:\text-generation-webui\modules\ui_model_menu.py", line 249, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
  File "D:\text-generation-webui\modules\models.py", line 94, in load_model
    output = load_func_map[loader](model_name)
  File "D:\text-generation-webui\modules\models.py", line 368, in ExLlamav2_loader
    model, tokenizer = Exllamav2Model.from_pretrained(model_name)
  File "D:\text-generation-webui\modules\exllamav2.py", line 60, in from_pretrained
    model.load(split)
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\model.py", line 333, in load
    for item in f: x = item
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\model.py", line 356, in load_gen
    module.load()
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\attn.py", line 255, in load
    self.k_proj.load()
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\linear.py", line 92, in load
    if w is None: w = self.load_weight()
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\module.py", line 110, in load_weight
    qtensors = self.load_multi(key, ["q_weight", "q_invperm", "q_scale", "q_scale_max", "q_groups", "q_perm", "bias"])
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\module.py", line 90, in load_multi
    tensors[k] = stfile.get_tensor(key + "." + k, device=self.device())
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\fasttensors.py", line 204, in get_tensor
    tensor = f.get_tensor(key)
RuntimeError: CUDA error: no kernel image is available for execution on the device
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
