Flex xpu bug fix #26135
Conversation
Thanks for fixing!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
@muellerzr Could you confirm if this fix is OK and if there's anywhere else that needs to be updated?
            and (self.device.type != "xpu")
            and (get_xla_device_type(self.device) != "GPU")
            and (self.fp16 or self.fp16_full_eval)
        ):
            raise ValueError(
                "FP16 Mixed precision training with AMP or APEX (`--fp16`) and FP16 half precision evaluation"
-               " (`--fp16_full_eval`) can only be used on CUDA or NPU devices."
+               " (`--fp16_full_eval`) can only be used on CUDA or NPU devices or certain XPU devices (with IPEX)."
I do not see an equivalent of this in accelerate, and it sounds like we need it there as well. Specifically, I'm looking at this chunk of code: https://github.com/huggingface/accelerate/blob/main/src/accelerate/accelerator.py#L427-L428. Can we include a follow-up PR for this in accelerate? Otherwise the trainer section looks fine to me.
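For reference, a minimal sketch of the kind of guard being requested on the accelerate side. This is an assumption about the shape of the check only, not accelerate's actual implementation at the linked lines; `Accelerator.device` and `Accelerator.mixed_precision` are real attributes, but the helper name `_validate_fp16_device` is hypothetical:

```python
# Hedged sketch of the follow-up check suggested for accelerate; not the
# library's actual code. The guard mirrors the transformers change above:
# fp16 is allowed on CUDA, NPU, and (with IPEX) XPU devices, rejected elsewhere.
from accelerate import Accelerator


def _validate_fp16_device(accelerator: Accelerator) -> None:  # hypothetical helper
    supported = {"cuda", "npu", "xpu"}
    if accelerator.mixed_precision == "fp16" and accelerator.device.type not in supported:
        raise ValueError(
            "fp16 mixed precision can only be used on CUDA or NPU devices,"
            " or certain XPU devices (with IPEX)."
        )
```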
@abhilash1910 Would you like to open this PR in accelerate?
@muellerzr Do we need to coordinate these changes at all?
@amyeroberts no, it doesn't need coordination in this case, since training_args is the one raising the error, but it's something we want to make sure is checked in accelerate too.
OK. I'll merge this then 👍
Thanks @muellerzr, yes this is needed in accelerate as well. I will open the PR there.
Thanks @amyeroberts for the suggestions.
flex gpu bug fix
flex gpu bug fix
flex gpu bug fix
What does this PR do?
Some Intel Flex XPUs support fp16 mixed precision, so this exception should not be raised when fp16 is provided as the mixed-precision dtype.
cc @muellerzr @amyeroberts
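For illustration, here is a minimal, self-contained sketch of the relaxed device check. The function name `check_fp16_device` is made up for the example; the real logic lives inside `TrainingArguments` and also consults `get_xla_device_type`, which is omitted here:

```python
# Standalone sketch of the guard after this fix: fp16 mixed precision is
# accepted on CUDA, NPU, and XPU devices and rejected on anything else.
def check_fp16_device(device_type: str, fp16: bool, fp16_full_eval: bool) -> None:
    supported = {"cuda", "npu", "xpu"}
    if (fp16 or fp16_full_eval) and device_type not in supported:
        raise ValueError(
            "FP16 Mixed precision training with AMP or APEX (`--fp16`) and FP16 half precision"
            " evaluation (`--fp16_full_eval`) can only be used on CUDA or NPU devices"
            " or certain XPU devices (with IPEX)."
        )


check_fp16_device("xpu", fp16=True, fp16_full_eval=False)  # no longer raises after this PR
# check_fp16_device("cpu", fp16=True, fp16_full_eval=False)  # would still raise ValueError
```

In practice this means constructing `TrainingArguments` with `fp16=True` should succeed on a machine where the resolved device is an Intel Flex XPU (via IPEX), instead of raising the error quoted in the diff above.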