Let's test the wheels #3
Comments
@FurkanGozukara @shivshankar11 @osadchi @WingeD123 @frankyifei @jepjoo Please let me know if it works on your machine. You can use the simple test script here:
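The linked script is not reproduced at this point in the thread (the full vector-add test appears in a later comment). As a rough stand-in, not the actual script, a minimal smoke test could look like the following, assuming a CUDA build of PyTorch and the Triton wheel installed in the same environment:

```python
import torch

# A trivial compiled function; on a CUDA device TorchInductor generates and
# builds a Triton kernel for it, which exercises the whole MSVC/SDK/CUDA toolchain.
@torch.compile
def square(x):
    return x * x

x = torch.randn(8, device="cuda")
print(square(x))
```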
Awesome, I saw. They are very small compared to the older ones; any particular reason? I think I may test on CogVLM v2, what do you think?
I excluded some binaries unnecessary for end users.
Wow, thanks.
Yes, you can read the README again for more detailed instructions.
@woct0rdho test.py failed for me. Can't we make it include C++? It is super annoying and hard for people to install. I have Build Tools version LTSC 17.8 and tested with PyTorch 2.4.1, CUDA 12.4, and cuDNN 8.9.7. Here are my C++ tools and SDKs:
Even after setting all paths and requirements, Triton compilation fails.
@FurkanGozukara Please modify:

```diff
@@ -43,6 +43,11 @@
     # try to avoid setuptools if possible
     cc = os.environ.get("CC")
     if cc is None:
+        if os.name == "nt":
+            msvc_winsdk_inc_dirs, _ = find_msvc_winsdk()
+            if msvc_winsdk_inc_dirs:
+                cl_path = msvc_winsdk_inc_dirs[0].replace(r"\include", r"\bin\Hostx64\x64")
+                os.environ["PATH"] = cl_path + os.pathsep + os.environ["PATH"]
         # TODO: support more things here.
         cl = shutil.which("cl")
         gcc = shutil.which("gcc")
```
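For readers following along, here is a standalone sketch of what the patch above does: prepend the MSVC host-x64 bin directory to PATH so that shutil.which("cl") can find the compiler. The helper name and the directory layout are assumptions based on the diff, not code from the repository:

```python
import os
import shutil


def ensure_cl_on_path(msvc_include_dir):
    # msvc_include_dir is expected to look like
    # C:\...\VC\Tools\MSVC\<version>\include (layout assumed from the diff above).
    if os.name != "nt" or shutil.which("cl"):
        return
    cl_dir = msvc_include_dir.replace(r"\include", r"\bin\Hostx64\x64")
    os.environ["PATH"] = cl_dir + os.pathsep + os.environ["PATH"]
```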
Wait, I used the precompiled wheel, it didn't compile on my system.
Modify this file in:
@woct0rdho New error:
I see, you have the Windows SDK in your "Visual Studio Build Tools", but not in "Visual Studio Community". The easiest way is to also install the Windows SDK in your "Visual Studio Community", and I suggest completely uninstalling your "Visual Studio Build Tools".
Got this error:
Damn, I didn't show this in the video tutorial. Can't we fix it without installing? Visual Studio Build Tools is required to compile so many things. Installing the SDK right now to test.
@WingeD123 Please run the simple test script here:
@FurkanGozukara Modify:

```diff
@@ -33,6 +33,8 @@
         "*",
         "-requires",
         "Microsoft.VisualStudio.Component.VC.Tools.x86.x64",
+        "-requires",
+        "Microsoft.VisualStudio.Component.Windows10SDK",
         "-latest",
         "-property",
         "installationPath",
```
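To see what this change does outside of Triton, here is a hedged sketch of the same vswhere query as a standalone function. The vswhere path and the idea of requiring both component IDs mirror the diff above; treat the details as assumptions rather than the project's actual code:

```python
import subprocess

# Default install location of vswhere.exe (an assumption; adjust if yours differs).
VSWHERE = r"C:\Program Files (x86)\Microsoft Visual Studio\Installer\vswhere.exe"


def find_vs_with_sdk():
    # Ask for the newest installation that has both the C++ x64 toolset
    # and a Windows 10 SDK component, and return its installation path.
    cmd = [
        VSWHERE,
        "-products", "*",
        "-requires", "Microsoft.VisualStudio.Component.VC.Tools.x86.x64",
        "-requires", "Microsoft.VisualStudio.Component.Windows10SDK",
        "-latest",
        "-property", "installationPath",
    ]
    out = subprocess.check_output(cmd, text=True).strip()
    return out or None
```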
@woct0rdho I installed the Windows 11 SDK and made that change; it didn't fix it. Now installing the Windows 10 SDK too.
```
subprocess.CalledProcessError: Command '['C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.41.34120\bin\Hostx64\x64\cl.EXE', 'C:\Users\PLAY\AppData\Local\Temp\tmp525mv19u\main.c', '/nologo', '/O2', '/LD', '/wd4819', '/ID:\sd-ComfyUI\python_embeded\Lib\site-packages\triton\backends\nvidia\include', '/IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include', '/IC:\Users\PLAY\AppData\Local\Temp\tmp525mv19u', '/ID:\sd-ComfyUI\python_embeded\Include', '/IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.41.34120\include', '/IC:\Program Files (x86)\Windows Kits\10\Include\10.0.20348.0\shared', '/IC:\Program Files (x86)\Windows Kits\10\Include\10.0.20348.0\ucrt', '/IC:\Program Files (x86)\Windows Kits\10\Include\10.0.20348.0\um', '/link', '/LIBPATH:D:\sd-ComfyUI\python_embeded\Lib\site-packages\triton\backends\nvidia\lib', '/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\lib\x64', '/LIBPATH:D:\sd-ComfyUI\python_embeded\libs', '/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.41.34120\lib\x64', '/LIBPATH:C:\Program Files (x86)\Windows Kits\10\Lib\10.0.20348.0\ucrt\x64', '/LIBPATH:C:\Program Files (x86)\Windows Kits\10\Lib\10.0.20348.0\um\x64', 'cuda.lib', '/OUT:C:\Users\PLAY\AppData\Local\Temp\tmp525mv19u\cuda_utils.cp311-win_amd64.pyd']' returned non-zero exit status 2.
```
@woct0rdho Same error. I hate C++ tools. I only restarted the CMD, not the computer.
@FurkanGozukara Please run this in the same Python venv, and show the results:

```python
import sysconfig
print(sysconfig.get_paths())
```
By the way, with my current system, compiling InsightFace and XPose works.
Let me test on the C drive directly.
onediff>>> import sysconfig
@FurkanGozukara Do you have these two folders? Note that it's 'libs', not 'lib'.
@woct0rdho Only 'lib' is there; the Python folder has 'libs'.
@FurkanGozukara Please modify:

```diff
@@ -63,6 +68,9 @@
     include_dirs = include_dirs + [srcdir, py_include_dir]
     if os.name == "nt":
         library_dirs += [os.path.join(sysconfig.get_paths()["data"], "libs")]
+        library_dirs += [os.path.join(os.path.dirname(sys.executable), "libs")]
+        python_version = sysconfig.get_python_version().replace(".", "")
+        library_dirs += [fr"C:\Python{python_version}\libs"]
         msvc_winsdk_inc_dirs, msvc_winsdk_lib_dirs = find_msvc_winsdk()
         include_dirs += msvc_winsdk_inc_dirs
         library_dirs += msvc_winsdk_lib_dirs
```
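The third added line hard-codes C:\PythonXY\libs, which is what a follow-up comment below objects to. A small hedged sketch, not part of the patch, to check which of the candidate directories actually contains the CPython import library on a given machine:

```python
import os
import sys
import sysconfig

# pythonXY.lib lives in "libs" (not "lib") on Windows CPython installs.
version = sysconfig.get_python_version().replace(".", "")
candidates = [
    os.path.join(sysconfig.get_paths()["data"], "libs"),
    os.path.join(os.path.dirname(sys.executable), "libs"),
    rf"C:\Python{version}\libs",
]
for path in candidates:
    lib = os.path.join(path, f"python{version}.lib")
    print(path, "->", "found" if os.path.isfile(lib) else "missing")
```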
Great, the test script passed for you. You can try to do more things in ComfyUI. Now I'll make new wheels and let others try them.
Yes, but it is hard-coded :D I am waiting for the new wheel to test. I need this for the general public users who follow me.
It doesn't change anything about torch.compile, right? What is the point of having a whl for Windows?
@bghira Triton is the default backend of torch.compile. Actually, it's possible to make Triton work on Windows, and what I'm doing here is publishing wheels to make it easier for more people to use it.
@NeoAnthropocene In your case you can successfully import triton. If you'd like, you can help more with debugging. Install dlltracer, then run:

```python
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, output_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    block_start = pid * BLOCK_SIZE
    offsets = block_start + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    output = x + y
    tl.store(output_ptr + offsets, output, mask=mask)


def add(x: torch.Tensor, y: torch.Tensor):
    output = torch.empty_like(x)
    assert x.is_cuda and y.is_cuda and output.is_cuda
    n_elements = output.numel()
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, output, n_elements, BLOCK_SIZE=1024)
    return output


import sys
import dlltracer

with dlltracer.Trace(out=sys.stdout):
    a = torch.rand(3, device="cuda")
    b = a + a
    b_compiled = add(a, a)
    print(b_compiled - b)
```
If you see […]. The embedded Python already bundles […].
@woct0rdho You're a genius! Smashed the bug 🪲 Walkthrough of the debugging:
That worked 😄
I am getting a rather curious error when trying to run anything involving torch.compile in ComfyUI. After getting VS and the CUDA toolkit all set up, putting the libs and include folders in place, and installing the Triton wheels, the inductor backend tries to access a cache that already exists (FileExistsError: WinError 183) in the temp folder within AppData/Local.
Thanks, worked.
@Phosay Please paste the whole error log, not only the last line.
This sounds like the error I'm getting when using torch.compile with Flux and CFG higher than 1. CFG = 1.0 works fine. I think with CFG > 1 it tries to do the torch.compile twice, and because the file already exists from the first one, it fails on the second one. On WSL, CFG > 1 required the second compile as well, but it did still work. I don't have the error for this saved; I can get back to this tomorrow if @Phosay doesn't do it first (assuming this is the same issue).
From my test, it gives me a 10% speed increase on my 3060.
What tasks? ComfyUI or Forge? I ran test.py and it works directly from the Forge venv, but there is no difference when generating images.
@Phosay You need to modify the file:
Flux CFG at 1.0 works fine; with Flux CFG higher than 1, I get this:

```
got prompt
0%| | 0/30 [00:14 'C:\\Users\\USERNAME\\AppData\\Local\\Temp\\torchinductor_USERNAME\\cache\\84b8dc1bae2f40b2751b45cbdfa5721ab7a12992b441e7590a3e9f612e421f8a'
Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
You can suppress this exception and fall back to eager by setting:
Traceback (most recent call last):
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
You can suppress this exception and fall back to eager by setting:
Prompt executed in 14.89 seconds
```
@jepjoo You also need to modify this file, see pytorch/pytorch#138211
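For context, WinError 183 usually comes from an atomic cache write: a temp file is renamed onto a path that a second compilation has already created, and on Windows that rename refuses to overwrite. Below is a minimal sketch of the kind of workaround discussed in pytorch/pytorch#138211; the helper name and signature are illustrative, not PyTorch's actual code:

```python
import os
import tempfile


def write_atomic_tolerant(path, content):
    # Write to a temp file in the same directory, then move it into place.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "wb") as f:
        f.write(content)
    try:
        os.rename(tmp_path, path)  # raises FileExistsError on Windows if path exists
    except FileExistsError:
        # A concurrent or second compile already produced the same cache entry;
        # keep the existing file and discard the temporary one.
        os.remove(tmp_path)
```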
That fixed it, thanks!
Can you use batch size 2 with torch.compile? I get the same error if the batch size is not 1.
Just tried it and yes, batch size 2 works with torch.compile for me. Try the fix above; it's quite simple.
I am using a 4090; after the fix, batch size 2 is crashing ComfyUI.
Great work, any chance of a Python 3.9 wheel?
@NikosKont Sure, now the wheels for Python 3.8 and 3.9 are published. I did not fully test them, but they should work because the official Triton publishes wheels for Python 3.8 to 3.12.
I tested these Triton wheels to run PyTorch FlexAttention on Windows 10. It seems to work, thanks for your work! Sharing a note: while trying to compile a […]
Also faced this; hoping for a wheel package update soon.
Has anyone encountered the problem of colored cubes in CogVideoX?
I shake your hand.
Hi everyone, I don't understand what the error is. I tried to edit the file.

```
# ComfyUI Error Report

## Error Details
- **Node Type:** KSampler
- **Exception Type:** torch._dynamo.exc.BackendCompilerFailed
- **Exception Message:** backend='inductor' raised:
CompilationError: at 16:15:
    xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
    xmask = tl.full([XBLOCK, RBLOCK], True, tl.int1)
    rbase = tl.arange(0, RBLOCK)[None, :]
    x0 = xindex
    _tmp11 = tl.full([XBLOCK, RBLOCK], 0, tl.float32)
    _tmp18 = tl.full([XBLOCK, RBLOCK], 0, tl.float32)
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = rindex
        tmp0 = tl.load(in_ptr0 + (r1), rmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
        tmp6 = tl.load(in_ptr1 + (r1 + (3072*x0)), rmask, eviction_policy='evict_first', other=0.0)
        ^
Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
You can suppress this exception and fall back to eager by setting:

## Stack Trace
## System Information
## Devices
## Logs
## Attached Workflow
Please make sure that workflow does not contain any sensitive information such as API keys or passwords.
## Additional Context
(Please add any additional context or steps to reproduce the error here)
```
@Danteday It says:
It's a pity, I thought everything would work on my 3090 :C
If you didn't, first read the instructions for installation here: https://github.com/woct0rdho/triton-windows#install-from-wheel
When you see errors, paste the whole error log, not only the last line
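As a quick sanity check after installing from the wheel (assuming a CUDA-enabled PyTorch in the same environment), the following should run without errors before you try anything in ComfyUI or Forge:

```python
import torch
import triton

# Both packages should import, and CUDA should be visible to PyTorch.
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("triton:", triton.__version__)
```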