webgl: Fix NaN issue #6828
a18338d
to
8a6e23c
Compare
@vladmandic @shurshilov
The solution in this PR:
I think the second one should also be workable. But if not, I need to revert this change: "Use NAN instead of nanValue.xxx".
@qjia7 Thank you for fixing the NaN check. It seems there is still some discussion about the validity of the fix? Can you add some comments on the snippets to explain why the check needs to be broken up? Thanks!
Reviewed 9 of 9 files at r1, 1 of 1 files at r2, 5 of 5 files at r3, 7 of 7 files at r4, 6 of 6 files at r5, 9 of 9 files at r6.
Reviewable status: 0 of 1 approvals obtained (waiting on @lina128)
9b90c79
to
402f76e
Compare
@pyu10055 Breaking up the checks won't resolve the problem. Our previously verified solution was misleading: by accident it used the same test URL, which removed the NaN checking entirely. So it is reasonable that the result is similar whether or not the checks are broken up.
After further debugging, it turns out that `isnan_custom` doesn't work well on the problem GPU. But if we switch back to the shader builtin `isnan`, everything works well. So the current fix is to allow the user to specify which `isnan` to use. Please take another look. Thanks.
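For readers unfamiliar with what a hand-written NaN test looks like, here is a minimal CPU-side sketch in TypeScript. It is not the `isnan_custom` shader from this PR (its body is not shown in this thread); both helper names are hypothetical and only illustrate two common ways to detect NaN without a builtin.

```typescript
// Hypothetical CPU-side helpers, for illustration only (not tfjs code).

// Bit-pattern test: an IEEE 754 float32 is NaN when its exponent bits are all
// ones and its mantissa is non-zero, i.e. (bits & 0x7fffffff) > 0x7f800000.
function isNaNByBits(value: number): boolean {
  const f32 = new Float32Array(1);
  const u32 = new Uint32Array(f32.buffer);
  f32[0] = value;
  return (u32[0] & 0x7fffffff) > 0x7f800000;
}

// Comparison test: NaN is the only value that is neither greater than,
// less than, nor equal to zero.
function isNaNByCompare(value: number): boolean {
  return value > 0 || value < 0 ? false : value !== 0;
}
```

Both helpers agree with `Number.isNaN` on the CPU. The thread above suggests that a comparable shader-side workaround can still misbehave on a particular GPU, which is why falling back to the builtin is being offered as an option.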
Reviewable status: 0 of 1 approvals obtained (waiting on @lina128)
Sorry, please hold off on the review. I found another issue. I will ping you when it's ready.
Reviewable status: 0 of 1 approvals obtained (waiting on @lina128)
It's ready for review. It turns out the previous issue was a problem in my test case.
Reviewable status: 0 of 1 approvals obtained (waiting on @lina128)
A gentle ping. Also adding @Linchenn in case you are on vacation.
thanks
Reviewable status: 0 of 1 approvals obtained (waiting on @lina128, @Linchenn, and @qjia7)
tfjs-backend-webgl/src/flags_webgl.ts
line 276 at r8 (raw file):
* doesn't have the builtin isnan. */ ENV.registerFlag('WEBGL_ISNAN_CUSTOM', () => false);
should this default to true to match the previous behavior?
tfjs-backend-webgl/src/glsl_version.ts
line 78 at r8 (raw file):
#define isnan(value) isnan_custom(value) ` : '';
The formatting seems to be weird, is it auto-formatted?
if previous behavior is proven to be broken on some gpus, why? afaik, only reason for |
Thank you Jiajia!
Reviewable status: 0 of 1 approvals obtained (waiting on @Linchenn, @pyu10055, and @qjia7)
tfjs-backend-webgl/src/flags_webgl.ts
line 276 at r8 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
should this default to true to match the previous behavior?
If it's only useful for webgl2, should this flag be set to true if webgl1 is true?
Reviewable status: 0 of 1 approvals obtained (waiting on @Linchenn, @pyu10055, and @vladmandic)
tfjs-backend-webgl/src/flags_webgl.ts
line 276 at r8 (raw file):
Previously, lina128 (Na Li) wrote…
If it's only useful for webgl2, should this flag be set to true if webgl1 is true?
webgl1 is not controlled by this flag. Since webgl1 doesn't have a builtin `isnan`, we always use a custom `isnan` there. For webgl2, we previously used a customized `isnan`, which has proved to be a problem on some GPUs. So I added this flag to use the builtin `isnan` for webgl2 by default. Meanwhile, I keep the original customized `isnan` for testing if necessary. I agree with @vladmandic's opinion. Maybe I can rename it to `WEBGL2_ISNAN_CUSTOM` to better reflect the meaning.
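The selection rule described in this reply can be sketched roughly as follows. This is only an illustration: the function name, the `WebGLVersion` type, and the parameter names are hypothetical and are not the code in `glsl_version.ts`; only the `#define` line is taken from the snippet quoted in this review.

```typescript
// Hypothetical sketch of the rule described above (not the tfjs implementation).
type WebGLVersion = 1 | 2;

function isnanShaderPrefix(
    version: WebGLVersion, useCustomOnWebGL2: boolean): string {
  // webgl1 has no builtin isnan, so the custom version is always used.
  // webgl2 uses the builtin unless the flag explicitly asks for the custom one.
  const useCustom = version === 1 || useCustomOnWebGL2;
  return useCustom ? '#define isnan(value) isnan_custom(value)' : '';
}
```

With this shape, leaving the flag at `false` gives webgl2 the builtin, while webgl1 is unaffected by the flag either way.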
tfjs-backend-webgl/src/glsl_version.ts
line 78 at r8 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
The formatting seems to be weird, is it auto-formatted?
Yes, it's auto-formatted.
Reviewable status: 0 of 1 approvals obtained (waiting on @Linchenn, @pyu10055, and @vladmandic)
tfjs-backend-webgl/src/flags_webgl.ts
line 276 at r8 (raw file):
Previously, qjia7 (Jiajia Qin) wrote…
webgl1 is not controlled by this flag. Since webgl1 doesn't have a builtin `isnan`, we always use a custom `isnan` there. For webgl2, we previously used a customized `isnan`, which has proved to be a problem on some GPUs. So I added this flag to use the builtin `isnan` for webgl2 by default. Meanwhile, I keep the original customized `isnan` for testing if necessary. I agree with @vladmandic's opinion. Maybe I can rename it to `WEBGL2_ISNAN_CUSTOM` to better reflect the meaning.
Or I can totally remove the customized isnan for webgl2 since we have the builtin isnan ?
Reviewable status:
complete! 1 of 1 approvals obtained (waiting on @lina128, @Linchenn, @pyu10055, and @vladmandic)
tfjs-backend-webgl/src/flags_webgl.ts
line 276 at r8 (raw file):
Previously, qjia7 (Jiajia Qin) wrote…
Or I can totally remove the customized isnan for webgl2 since we have the builtin isnan ?
Got it. If we believe using the builtin isnan will fix the error without creating any new issues, we can remove this flag.
Reviewable status:
complete! 1 of 1 approvals obtained (waiting on @lina128, @Linchenn, @pyu10055, and @vladmandic)
tfjs-backend-webgl/src/flags_webgl.ts
line 276 at r8 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
Got it. If we believe using the builtin isnan will fix the error without creating any new issues, we can remove this flag.
Thanks Ping. Let's keep it for a while to see if any bug is reported with the builtin `isnan`. If everything works well, I will submit another PR to remove the customized one entirely.
@qjia7 It sounds the correctness of the builtin/polyfilled |
In fact, It's hard to construct a small case to reflect the |
* Customize setTimeout (tensorflow#6694) If the setTimeout nesting level is greater than 5 and timeout is less than 4ms, timeout will be clamped to 4ms, which hurts the perf. A custom setTimeout is provided to mitigate the perf impact. BUG: tensorflow#6687 Co-authored-by: Na Li <[email protected]> * Upgrade windows BrowserStack chrome to 104 (tensorflow#6866) * webgpu: Disable importExternalTexture (tensorflow#6868) WebGPU Working Group recently found some problem with importExtenalTexture in spec, so we have to disable it temporarily. * Refactored Resizing Layer Unit Tests (#38) * Rescaling Preprocessing Layer Co-authored-by: David Kim (@koyykdy) <[email protected]> Brian Zheng (@Brianzheng123) <[email protected]> * PR issues resolved * linting and PR issues resolved Co-authored-by: Adam Lang (@AdamLang96) <[email protected]> Co-authored-by: (@Brianzheng123) <[email protected]> * initial implementation of image preprocessing: resizing layer, and associated unit tests. Comments and refactoring for image scaling layer * refactoring in computeOutputShape for image resizing layer * Unit tests for image resizing preprocessing layer expanded and refactored * refactored unit tests for resizing layer * Preprocessing-Resizing layer unit test expansion and refactoring. 
Co-authored-by: Adam Lang <@AdamLang96> ([email protected]) * cleaning up commit diffs * cleaning up commit diffs * PR commit suggestions accepted - code refactored to reflect changes * resizing layer unit test refactoring Co-authored-by: AdamLang96 <[email protected]> * Linting issue resolved: unused import statement culled (#39) * Rescaling Preprocessing Layer Co-authored-by: David Kim (@koyykdy) <[email protected]> Brian Zheng (@Brianzheng123) <[email protected]> * PR issues resolved * linting and PR issues resolved Co-authored-by: Adam Lang (@AdamLang96) <[email protected]> Co-authored-by: (@Brianzheng123) <[email protected]> * initial implementation of image preprocessing: resizing layer, and associated unit tests. Comments and refactoring for image scaling layer * refactoring in computeOutputShape for image resizing layer * Unit tests for image resizing preprocessing layer expanded and refactored * refactored unit tests for resizing layer * Preprocessing-Resizing layer unit test expansion and refactoring. Co-authored-by: Adam Lang <@AdamLang96> ([email protected]) * cleaning up commit diffs * cleaning up commit diffs * PR commit suggestions accepted - code refactored to reflect changes * resizing layer unit test refactoring * linting issues resolved: unusued import statement culled Co-authored-by: AdamLang96 <[email protected]> * Update jasmine_util.ts (tensorflow#6872) FIX * webgl: Fix NaN issue (tensorflow#6828) Fix tensorflow#6822 Problem 1: On some GPUs, even if a and b are both non-NaN, the value of isNaN in vec4 isNaN = min(vec4(isnan(a)) + vec4(isnan(b)), vec4(1.0)); are still larger than 0., which misleads all values become NAN. 2: After resolving NAN issue, the result is still incorrect. It seems that the isnan_custom is not well supported on the problem GPU. After switching back to builtin isnan, everything works well. 
Solution: Use the bool type bvec4 instead of float type vec4 to calculate isNaN to avoid the the float precision issue when comparing with zero. Meanwhile, add an env flag WEBGL2_ISNAN_CUSTOM to allow user to specify which isnan to use. * Upgrade nodejs to 18.7.0 (tensorflow#6863) * Upgrade nodejs to 18.7.0 * Fix hash table test string not passed as base64 * fixed prelu fusing code that pre-maturely neg the const on multiply (tensorflow#6876) Co-authored-by: RajeshT <[email protected]> * Update tfjs-layers/src/layers/preprocessing/image_resizing.ts Co-authored-by: Matthew Soulanille <[email protected]> Co-authored-by: Yang Gu <[email protected]> Co-authored-by: Na Li <[email protected]> Co-authored-by: Matthew Soulanille <[email protected]> Co-authored-by: AdamLang96 <[email protected]> Co-authored-by: Linchenn <[email protected]> Co-authored-by: Jiajia Qin <[email protected]> Co-authored-by: Ping Yu <[email protected]> Co-authored-by: RajeshT <[email protected]> Co-authored-by: Matthew Soulanille <[email protected]> Co-authored-by: Yang Gu <[email protected]> Co-authored-by: Na Li <[email protected]> Co-authored-by: Matthew Soulanille <[email protected]> Co-authored-by: AdamLang96 <[email protected]> Co-authored-by: Linchenn <[email protected]> Co-authored-by: Jiajia Qin <[email protected]> Co-authored-by: Ping Yu <[email protected]> Co-authored-by: RajeshT <[email protected]> Co-authored-by: Matthew Soulanille <[email protected]>
* Update jasmine_util.ts (tensorflow#6872) FIX * webgl: Fix NaN issue (tensorflow#6828) Fix tensorflow#6822 Problem 1: On some GPUs, even if a and b are both non-NaN, the value of isNaN in vec4 isNaN = min(vec4(isnan(a)) + vec4(isnan(b)), vec4(1.0)); are still larger than 0., which misleads all values become NAN. 2: After resolving NAN issue, the result is still incorrect. It seems that the isnan_custom is not well supported on the problem GPU. After switching back to builtin isnan, everything works well. Solution: Use the bool type bvec4 instead of float type vec4 to calculate isNaN to avoid the the float precision issue when comparing with zero. Meanwhile, add an env flag WEBGL2_ISNAN_CUSTOM to allow user to specify which isnan to use. * Upgrade nodejs to 18.7.0 (tensorflow#6863) * Upgrade nodejs to 18.7.0 * Fix hash table test string not passed as base64 * fixed prelu fusing code that pre-maturely neg the const on multiply (tensorflow#6876) Co-authored-by: RajeshT <[email protected]> Co-authored-by: Linchenn <[email protected]> Co-authored-by: Jiajia Qin <[email protected]> Co-authored-by: Matthew Soulanille <[email protected]> Co-authored-by: Ping Yu <[email protected]> Co-authored-by: RajeshT <[email protected]>
Fix #6822
Problem
1: On some GPUs, even if `a` and `b` are both non-NaN, the values of `isNaN` in `vec4 isNaN = min(vec4(isnan(a)) + vec4(isnan(b)), vec4(1.0));` are still larger than `0.`, which causes all values to become `NAN`.
2: After resolving the `NAN` issue, the result is still incorrect. It seems that `isnan_custom` is not well supported on the problem GPU. After switching back to the builtin `isnan`, everything works well.
Solution:
Use the bool type `bvec4` instead of the float type `vec4` to calculate `isNaN`, to avoid the float precision issue when comparing with zero.
Meanwhile, add an env flag `WEBGL_ISNAN_CUSTOM` to allow the user to specify which `isnan` to use.
To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.
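To make the difference between a float mask and a bool mask concrete, here is a small CPU-side simulation in TypeScript. It is a sketch under stated assumptions: the `leaky` conversion is an invented failure mode used only to show why a `> 0.0` test on a float mask is fragile, and it is not a measurement of what the problem GPU actually does. All names are hypothetical.

```typescript
type Vec4 = [number, number, number, number];
type BVec4 = [boolean, boolean, boolean, boolean];

// Float-based mask, mirroring min(vec4(isnan(a)) + vec4(isnan(b)), vec4(1.0)).
// `toFloat` models the bool -> float conversion done by vec4(...).
function floatMask(a: Vec4, b: Vec4, toFloat: (flag: boolean) => number): Vec4 {
  return a.map((_, i) =>
    Math.min(toFloat(Number.isNaN(a[i])) + toFloat(Number.isNaN(b[i])), 1.0)) as Vec4;
}

// Bool-based mask: a per-component logical OR with no float arithmetic.
function boolMask(a: Vec4, b: Vec4): BVec4 {
  return a.map((_, i) => Number.isNaN(a[i]) || Number.isNaN(b[i])) as BVec4;
}

// Exact conversion: false -> 0.0, true -> 1.0.
const exact = (flag: boolean): number => (flag ? 1.0 : 0.0);
// ASSUMPTION for illustration: false converts to a tiny positive value.
const leaky = (flag: boolean): number => (flag ? 1.0 : 1e-7);

const a: Vec4 = [1, 2, 3, 4];
const b: Vec4 = [5, 6, NaN, 8];

// Which components would be treated as NaN by a `> 0.0` test on the mask.
const flaggedExact = floatMask(a, b, exact).map((m) => m > 0.0);
const flaggedLeaky = floatMask(a, b, leaky).map((m) => m > 0.0);
const flaggedBool = boolMask(a, b);
```

With the exact conversion only the third component is flagged, and the bool mask gives the same answer. With the assumed leaky conversion every component is flagged, which matches the symptom in Problem 1 where all values become `NAN`.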