
Add oneAPI device selector for xpu and some other changes. #6112

Merged · 4 commits into comfyanonymous:master on Dec 23, 2024

Conversation

@simonlui (Contributor) commented Dec 19, 2024:

I expect that there might be some opinions about 1.) and 2.), so I am open to anyone arguing for detail changes or another way to implement this if need be. The list of changes:

1.) Add a --oneapi-device-selector argument that does something similar to --cuda-device, but for Intel oneAPI devices. It doesn't necessarily need to be limited to GPUs, but I expect that for the time being it will effectively only do that. Documentation on the selector syntax can be found at https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md#oneapi_device_selector (a sketch of the wiring follows this list).

2.) Per https://github.com/pytorch/pytorch/blob/v2.5.0/docs/source/notes/numerical_accuracy.rst#reduced-precision-reduction-for-fp16-and-bf16-in-scaled-dot-product-attention-sdpa, which pytorch/pytorch#135778 brought to my attention, the default SDPA behavior was changed in PyTorch 2.5 to upcast by default to avoid numerical errors. Since the old behavior has been working fine with ComfyUI, set torch.backends.cuda.allow_fp16_bf16_reduction_math_sdp to True by default when PyTorch 2.5 or later is detected.

3.) Documentation changes for IPEX, noting that one can install the mainline builds of PyTorch to get ComfyUI working on Intel GPUs, with the caveat that most optimizations aren't there yet; if anything, it is still a beta release. Deferring to #6069 for the documentation changes.
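
A minimal sketch of how change 1.) could be wired, modeled on how ComfyUI's existing --cuda-device flag exports CUDA_VISIBLE_DEVICES before torch is imported. The argparse setup and the print message here are illustrative assumptions, not quoted from the PR:

```python
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument("--oneapi-device-selector", type=str, default=None, metavar="SELECTOR",
                    help="Sets the oneAPI device(s) this instance will use.")
args = parser.parse_args()

# This must run before torch/IPEX is imported so the oneAPI runtime sees it.
if args.oneapi_device_selector is not None:
    # ONEAPI_DEVICE_SELECTOR is read by the oneAPI runtime; see the
    # EnvironmentVariables.md link above for the syntax, e.g. "level_zero:0"
    # to expose only the first Level Zero device.
    os.environ["ONEAPI_DEVICE_SELECTOR"] = args.oneapi_device_selector
    print("Set oneapi device selector to:", args.oneapi_device_selector)
```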

The relevant hunk from the diff:

```python
if ENABLE_PYTORCH_ATTENTION:
    torch.backends.cuda.enable_math_sdp(True)
    torch.backends.cuda.enable_flash_sdp(True)
    torch.backends.cuda.enable_mem_efficient_sdp(True)

    # PyTorch 2.5 upcasts math-backend SDPA by default; restore the pre-2.5
    # reduced-precision behavior. torch_version is a string like "2.5.1", so
    # this check reads the major digit and the first minor digit.
    if int(torch_version[0]) == 2 and int(torch_version[2]) >= 5:
        torch.backends.cuda.allow_fp16_bf16_reduction_math_sdp(True)
```
@comfyanonymous (Owner) commented:

There are no situations where the math backend is actually used by ComfyUI unless you force it.

@simonlui (Contributor, Author) replied:

I should've explained this better. For any non-Nvidia GPU, the math backend is what ends up being used when PyTorch attention is selected, since the Flash Attention and mem-efficient backends are CUDA-only; AMD's implementations of both only landed about 1-2 weeks ago in the nightly packages. I can try to gate this off a bit better if you want, but the PyTorch change mentioned does slow things down for GPUs stuck in that situation, like Intel right now (see the hypothetical gating sketch below).
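
A hypothetical sketch of the gating mentioned above, not the merged code: only opt back into reduced-precision math SDPA on devices where the math backend is the realistic fallback, and parse the version tuple instead of indexing into the version string. The helper name and the device-type test are assumptions for illustration:

```python
import torch

def allow_reduced_precision_math_sdpa(device: torch.device) -> None:
    # Hypothetical gating: CUDA devices get the flash/mem-efficient backends,
    # so leave them on the safer PyTorch 2.5 default; non-CUDA devices
    # (e.g. XPU) fall back to the math backend and regain the pre-2.5 speed.
    major, minor = (int(x) for x in torch.__version__.split(".")[:2])
    if device.type != "cuda" and (major, minor) >= (2, 5):
        torch.backends.cuda.allow_fp16_bf16_reduction_math_sdp(True)
```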

@comfyanonymous merged commit c6b9c11 into comfyanonymous:master on Dec 23, 2024
5 checks passed
@simonlui deleted the add_xpu_device branch on December 23, 2024 at 08:45