
Add oneAPI device selector for xpu and some other changes. #6112

Merged · 4 commits into comfyanonymous:master on Dec 23, 2024

Conversation

@simonlui (Contributor) commented Dec 19, 2024:

I expect that there might be some opinions about 1.) and 2.), so I am open to anyone arguing for detail changes or another way to implement this if need be. The list of changes:

1.) Add a --oneapi-device-selector argument that does something similar to --cuda-device, but for Intel oneAPI devices. It doesn't necessarily need to be limited to GPUs, but I expect that for the time being it will effectively only do that. Documentation on the selector syntax can be found at https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md#oneapi_device_selector (a sketch of the wiring follows this list).

2.) Per https://github.com/pytorch/pytorch/blob/v2.5.0/docs/source/notes/numerical_accuracy.rst#reduced-precision-reduction-for-fp16-and-bf16-in-scaled-dot-product-attention-sdpa, which pytorch/pytorch#135778 brought to my attention, the default SDPA behavior was changed in PyTorch 2.5 to upcast by default to avoid numerical errors. Since the old behavior has been working fine with ComfyUI, set torch.backends.cuda.allow_fp16_bf16_reduction_math_sdp to True by default when PyTorch 2.5 or later is detected.

3.) Documentation changes for IPEX, noting that one can install the mainline builds of PyTorch to get ComfyUI working on Intel GPUs, with the caveat that most optimizations aren't there yet; if anything, it is still a beta release. Deferring to #6069 for the documentation changes.
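
A minimal sketch of how change 1.) could be wired, modeled on how ComfyUI's existing --cuda-device flag exports CUDA_VISIBLE_DEVICES before torch is imported. The argparse setup and the print message here are illustrative assumptions, not quoted from the PR:

```python
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument("--oneapi-device-selector", type=str, default=None, metavar="SELECTOR",
                    help="Sets the oneAPI device(s) this instance will use.")
args = parser.parse_args()

# This must run before torch/IPEX is imported so the oneAPI runtime sees it.
if args.oneapi_device_selector is not None:
    # ONEAPI_DEVICE_SELECTOR is read by the oneAPI runtime; see the
    # EnvironmentVariables.md link above for the syntax, e.g. "level_zero:0"
    # to expose only the first Level Zero device.
    os.environ["ONEAPI_DEVICE_SELECTOR"] = args.oneapi_device_selector
    print("Set oneapi device selector to:", args.oneapi_device_selector)
```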

The relevant hunk from the diff:

```python
if ENABLE_PYTORCH_ATTENTION:
    torch.backends.cuda.enable_math_sdp(True)
    torch.backends.cuda.enable_flash_sdp(True)
    torch.backends.cuda.enable_mem_efficient_sdp(True)

    # PyTorch 2.5 upcasts math-backend SDPA by default; restore the pre-2.5
    # reduced-precision behavior. torch_version is a string like "2.5.1", so
    # this check reads the major digit and the first minor digit.
    if int(torch_version[0]) == 2 and int(torch_version[2]) >= 5:
        torch.backends.cuda.allow_fp16_bf16_reduction_math_sdp(True)
```
@comfyanonymous (Owner) commented:

There are no situations where the math backend is actually used by ComfyUI unless you force it.

@simonlui (Contributor, Author) replied:

I should've explained this better. For any non-Nvidia GPU, the math backend is what ends up being used when PyTorch attention is selected, since the Flash Attention and mem-efficient backends are CUDA-only; AMD's implementations of both only landed about 1-2 weeks ago in the nightly packages. I can try to gate this off a bit better if you want, but the PyTorch change mentioned does slow things down for GPUs stuck in that situation, like Intel right now (see the hypothetical gating sketch below).
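
A hypothetical sketch of the gating mentioned above, not the merged code: only opt back into reduced-precision math SDPA on devices where the math backend is the realistic fallback, and parse the version tuple instead of indexing into the version string. The helper name and the device-type test are assumptions for illustration:

```python
import torch

def allow_reduced_precision_math_sdpa(device: torch.device) -> None:
    # Hypothetical gating: CUDA devices get the flash/mem-efficient backends,
    # so leave them on the safer PyTorch 2.5 default; non-CUDA devices
    # (e.g. XPU) fall back to the math backend and regain the pre-2.5 speed.
    major, minor = (int(x) for x in torch.__version__.split(".")[:2])
    if device.type != "cuda" and (major, minor) >= (2, 5):
        torch.backends.cuda.allow_fp16_bf16_reduction_math_sdp(True)
```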

@comfyanonymous merged commit c6b9c11 into comfyanonymous:master on Dec 23, 2024
5 checks passed
@simonlui deleted the add_xpu_device branch on December 23, 2024 at 08:45