Fixing a bug in row scaling #570
Conversation
```julia
    d[i] = 1.0
else
    d[i] = exp2(-e)
end
```
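For context, here is a minimal sketch of the scaling logic in the branch above, written in Python (the project code itself is Julia); the function name, the input format, and the threshold value are assumptions for illustration only:

```python
import math

def row_scaling_factors(rows, scaling_threshold=-60):
    """Hypothetical sketch: scale each row by d[i] = 2**(-e), where e is
    the binary exponent of the row's largest entry; rows that are
    (approximately) zero get d[i] = 1.0, i.e. they are left unscaled."""
    d = []
    for row in rows:
        m = max(abs(x) for x in row)
        if m == 0.0:
            d.append(1.0)              # exactly zero row: do not scale
            continue
        e = math.frexp(m)[1]           # binary exponent of the row maximum
        if e < scaling_threshold:
            d.append(1.0)              # (approximately) zero row: do not scale
        else:
            d.append(2.0 ** -e)        # mirrors d[i] = exp2(-e) in the PR
    return d
```

The point under discussion below is how the "(approximately) zero" test should be phrased so that it behaves well under a global rescaling of the system.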
---
I think I would have tried to somehow capture the case that all equations are just very small by looking at `min(maximum(d), 1) * scaling_threshold` or so (with the maximum computed before the `for` loop).
---
But we want to catch the case when one equation is very small, not just all. Or what am I missing?
---
My comment was basically that it would be nice if the system behaved the same for `a * F` as for `F`, for any scalar `a`. If I multiply my system by a scalar `a >> 1`, then your check would no longer work.
---
This is intended: Skeel's paper assumes that rows are nonzero, so we can only scale those rows which are (approximately) nonzero. Therefore, we should only expect the same behavior of `G` and `a * G`, where `G` is the subsystem of nonzero rows. This leaves us with two choices: (a) if there is a zero row, don't scale the zero row, or (b) if there is a zero row, don't scale at all. The PR chooses (a). Do you prefer (b)?
---
No, (a) is totally fine; my point was more about how to detect whether a row is a zero row. But anyway, fine with me this way :)
---
What about `e - m < scaling_threshold`, where `m = maximum(d)`? This would be scaling-invariant (and account for relative errors).
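To illustrate the scale-invariance argument: multiplying the whole system by a scalar `a` shifts every row exponent `e`, and therefore also the maximum `m`, by the same amount `log2(a)`, so the difference `e - m` is unchanged. A small Python sketch (function name and threshold are hypothetical):

```python
def is_near_zero_row(e, m, scaling_threshold=-60):
    # Log-scale test: exponent e is small *relative to* the largest
    # row exponent m, so the result is invariant under multiplying
    # the system by any scalar (which shifts e and m equally).
    return e - m < scaling_threshold

# Multiplying the system by a = 2**40 shifts every exponent by 40;
# e - m, and hence the test result, is unchanged.
shift = 40
for e, m in [(-120, 0), (-30, 0), (-120, -100)]:
    assert is_near_zero_row(e, m) == is_near_zero_row(e + shift, m + shift)
```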
---
`e < scaling_threshold * m`?
---
No, because `e`, `m`, and `scaling_threshold` are on a log scale. I do not convert them to `2^e` etc., because addition is cheaper than multiplication.
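Concretely, since `e`, `m`, and `scaling_threshold` are base-2 exponents, the subtraction `e - m < scaling_threshold` is equivalent to the multiplicative check `2^e < 2^scaling_threshold * 2^m`, while avoiding the exponentiations and the multiplication. A quick Python check with some assumed values:

```python
# e, m, and the threshold t live on a log (base-2) scale, so one
# subtraction in the log domain replaces two exponentiations and a
# multiplication in the linear domain.
for e, m, t in [(-70, -5, -60), (-40, -5, -60), (-65, -5, -60)]:
    log_scale = (e - m) < t
    linear = 2.0 ** e < (2.0 ** t) * (2.0 ** m)
    assert log_scale == linear
```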
Row scaling is currently also applied when a row is (almost) zero, which causes some unintended behavior. This PR attempts to fix that.