-
Notifications
You must be signed in to change notification settings - Fork 6.8k
Fix backward_clip num inputs and type of clip params #15688
Conversation
@ptrendx Can you also add a unit test for the fix you made ? @mxnet-label-bot Add [Operator, Backend, pr-awaiting-review] |
@piyushghai Sure, I should be able to do it tomorrow. |
Last commit adds the test that uncovered the initial problem and also fixes #12901 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to recommend some improvements to the mx.sym.clip documentation:
- correct typo 'Clipping x between a_min and a_x would be` -> 'Clipping x between a_min and a_max would be'
- the description 'clip(x, a_min, a_max) = max(min(x, a_max), a_min))' suggests the definition of MXNet's clip in terms of MXNet's min and max, rather than a mathematical definition. Better to switch to italics as is used in the description of hard_sigmoid.
- the treatment of a_min and a_max is a bit loose. Users may be surprised to learn that values can be passed that exceed the range of a_min and a_max specified in the python code, due to rounding effects. Fixing this won't be particularly concise, so I recommend this for a followup PR if it seems critical to anyone. Best would be to document the behavior with wording like:
The values passed can be as as large as cast(cast(a_max, dtype=float64), dtype=x.dtype),
or as small as cast(cast(a_min, dtype=float64), dtype=x.dtype), where the casts are round-to-nearest.
I'm not sure how to not overload the user with information with all those casts there. I made the other changes to docs. |
My thought with the comment about a_min and a_max casting was to describe precisely the behavior of the op. While my first suggestion might be too detailed, I still think it important to alert the user to the rounding of the limits, in case they have some home-grown bucketing of the clipped outputs. How about:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The underlying problem exposed here by clip() with monitor use and exected-vs-actual input counts has now spawned follow-up PR #15834. Given that and the documentation improvements, this PR LGTM.
* Fix backward_clip num inputs and type of clip params * Clip test * Trigger CI * Changes to clip docs * Fix docstring * Trigger CI
Description
Fix backward of clip to correctly report 2 inputs instead of 1. Changed the type of a_min and a_max used inside the kernel to avoid numerical inconsistencies in fp16 (even though the final comparison was done in fp32 and the limits were originally in fp32, they were casted to and from fp16 entering the function).
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.