faster and simpler generic_norm2 #43256
base: master
I'm not 100% sure if this works. Probably needs a PkgEval.
stdlib/LinearAlgebra/src/generic.jl

    @@ -462,27 +462,12 @@ norm_sqr(x::Union{T,Complex{T},Rational{T}}) where {T<:Integer} = abs2(float(x))

    function generic_norm2(x)
        maxabs = normInf(x)
        (maxabs == 0 || isinf(maxabs)) && return maxabs
        (v, s) = iterate(x)::Tuple
        (ismissing(maxabs) || maxabs == 0 || isinf(maxabs)) && return maxabs

I'm sad if we need to add a bunch of special-cased `missing` code to LinearAlgebra.
I know. We can leave out that test. It's just that the transition to `mapreduce` made it possible for `norm` to handle `missing` for certain `p`s, without pulling `*missing` to the surface.
Would replacing

    (ismissing(maxabs) || maxabs == 0 || isinf(maxabs)) && return maxabs

with

    (isinf(maxabs) !== false || maxabs == 0) && return maxabs

be preferable? (I think they should be equivalent, but please do double-check.)
EDIT by @dkarrasch: one needs to avoid evaluating `maxabs == 0` first, because when `maxabs` is `missing` that throws `TypeError: non-boolean (Missing) used in boolean context`. Otherwise this works, because of short-circuiting.
So that you get a notification: I modified @martinholters's suggestion, which I tested for `maxabs = missing`, and it works.
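A minimal Base-only check of the guard discussed above. `early_return` is a hypothetical name used only for illustration, not a LinearAlgebra function; it shows why checking `isinf` first lets `||` short-circuit on `missing`, while the other order throws.

```julia
# Hypothetical helper: the early-return condition from the suggestion above.
# `isinf(missing)` returns `missing`, and `missing !== false` is `true`, so the
# `||` short-circuits before `maxabs == 0` is ever evaluated.
early_return(maxabs) = isinf(maxabs) !== false || maxabs == 0

@assert early_return(missing)
@assert early_return(Inf)
@assert early_return(0.0)
@assert !early_return(2.5)

# The other order fails: `missing == 0` is `missing`, which `||` rejects.
threw_typeerror = try
    missing == 0 || isinf(missing)
    false
catch err
    err isa TypeError
end
@assert threw_typeerror
```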
Oh, and that same trick should make it possible to apply the same steps to `generic_normp`!
If tests pass without the `init` keyword, this LGTM, up to the controversial `ismissing(maxabs)`, which we could just as well remove here and keep for discussion. I acknowledge the sentiment against explicit `missing` handling; it's just that `normInf(x)` succeeds and returns `missing` whenever there's a `missing` entry, so it's the logic in `(maxabs == 0 || isinf(maxabs))` that fails. The `mapreduce` step handles everything correctly again.
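A Base-only illustration of that last point, as a sketch: a reduction such as `mapreduce(abs2, +, x)` propagates `missing` on its own, just like `maximum(abs, x)` (used here as a stand-in for the internal `normInf(x)`), so no explicit `ismissing` branch is needed for the result to come out `missing`.

```julia
# Base only. `missing` propagates through `abs`, `abs2`, `max` and `+`,
# so a reduction over the entries returns `missing` without any explicit branch.
x = [3.0, missing, 4.0]

@assert ismissing(maximum(abs, x))          # analogous to normInf(x) returning missing
@assert ismissing(mapreduce(abs2, +, x))    # the sum of squares also becomes missing
@assert sqrt(mapreduce(abs2, +, [3.0, 4.0])) == 5.0
```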
@nanosoldier
Why isn't nanosoldier running on this?
Not sure, let's try again: @nanosoldier
Your package evaluation job has completed - possible new issues were detected. A full report can be found here.
@maleadt any idea why the report is broken?
Just click
The reason the report is so big is because so many packages failed. But the results look quite strange. Perhaps rerun it?
@nanosoldier
Your package evaluation job has completed - possible new issues were detected. A full report can be found here.
I've decided that while I have this PR, I might as well also fix `generic_normp`, so this needs review again. Also, I fixed a bug where we weren't accumulating results in enough precision.
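The specific bug isn't shown in this thread, so here is only a generic illustration of why accumulator precision matters: the same Float32 squares summed sequentially in a Float32 and in a Float64 accumulator. The code under review accumulates in `sT = promote_type(T, Float64)`; this sketch isolates the effect of that choice. `sumsq_acc` is a hypothetical name.

```julia
# Generic illustration (not the PR's code): summing many squares sequentially in a
# Float32 accumulator drifts, while a Float64 accumulator stays close to the reference.
function sumsq_acc(::Type{S}, x) where {S<:AbstractFloat}
    acc = zero(S)
    for v in x
        acc += S(v)^2          # square and accumulate in type S
    end
    return acc
end

x = fill(0.1f0, 10^6)
reference = length(x) * Float64(0.1f0)^2

err32 = abs(sumsq_acc(Float32, x) - reference)
err64 = abs(sumsq_acc(Float64, x) - reference)
@assert err64 < err32
```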
Co-authored-by: Daniel Karrasch <[email protected]>
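The problem seems to be that

    using LinearAlgebra: norm, normInf, norm_sqr  # normInf and norm_sqr are LinearAlgebra internals

    function mygeneric_norm2(x)
        maxabs = normInf(x)
        # `isinf(missing)` is `missing`, so this also returns early for missing entries
        (isinf(maxabs) !== false || maxabs == 0) && return maxabs
        T = typeof(maxabs)
        sum = zero(promote_type(Float64, T))  # accumulate in at least Float64
        if isfinite(length(x)*maxabs*maxabs) && maxabs*maxabs != 0 # Scaling not necessary
            for v in x
                sum += norm_sqr(v)
            end
            return convert(T, sqrt(sum))
        else
            # Rescale by maxabs to avoid overflow/underflow; multiply by the reciprocal when it is finite
            invmaxabs = inv(maxabs)
            if isfinite(invmaxabs)
                for v in x
                    sum += (norm(v) * invmaxabs)^2
                end
            else
                for v in x
                    sum += (norm(v) / maxabs)^2
                end
            end
            return convert(T, maxabs*sqrt(sum))
        end
    end

It uses ideas that have come up in the discussion of this PR.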
Is this PR still "simpler"? It doesn't really feel like that.
I agree, it seems hard/impossible to simplify things via
I think I'll let this sit until captured variables in closures improve.

    T = typeof(float(norm(first(x))))
    sT = promote_type(T, Float64)
    ans = mapreduce(norm_sqr, +, x)
    ans in (0, Inf) || return convert(T, sqrt(ans))

I'd have to check whether `in` uses `==`. But it does, so perhaps that's OK.
I'd also rather not use `ans` as a variable name, and would prefer a different name for the second path's output, especially as it has a different type.
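A quick Base-only check of the `in` question: for a tuple, `in` compares with `==`, so `-0.0` matches `0` and `NaN` never matches anything.

```julia
# Base only: `in` on a tuple compares with `==`, so both signed zeros match
# and NaN never does (NaN == y is false for every y).
@assert 0.0 in (0, Inf)
@assert -0.0 in (0, Inf)
@assert Inf in (0, Inf)
@assert !(NaN in (0, Inf))
@assert !(1.5 in (0, Inf))
```

So on the fast path, a `NaN` sum falls through to `convert(T, sqrt(ans))` and is returned as `NaN`.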

    for v in x
        ans += sT(norm(v))^p

Is it worth special-casing `p==3` and `p==0.5`, for which we can replace `^p` with faster functions?
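One way the special-casing could look, as a sketch only: `pow_special` is a hypothetical helper, not part of LinearAlgebra, and whether it is actually faster has not been measured here.

```julia
# Hypothetical helper (not from the PR): cheaper replacements for `a^p`
# for the exponents mentioned above, falling back to the generic power otherwise.
function pow_special(a::Float64, p::Real)
    p == 3   && return a * a * a
    p == 0.5 && return sqrt(a)
    return a^p
end

@assert pow_special(2.0, 3) == 8.0
@assert pow_special(9.0, 0.5) == 3.0
@assert pow_special(2.0, 1.5) == 2.0^1.5
```

In `generic_normp` the check on `p` would ideally be made once, outside the loop, rather than per element. Also, `a * a * a` is not guaranteed to match `a^3` bit for bit.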

    for v in x
        ans += (norm(v)/maxabs)^2

I can't measure any change from pulling out the division and multiplying by `invmaxabs`.
But adding `@simd for v in x` seems to help quite a bit. Is it safe to do so? (Or maybe this method won't get called on the sort of arrays for which it is beneficial anyway.)
The second. I don't want to deal with `invmaxabs` here since we're already in a slow path.
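A hedged sketch of the two loop variants discussed in this exchange, specialized to `Float64` vectors with `abs` standing in for `norm`; both function names are hypothetical. As far as I know, `@simd` permits the compiler to reorder the `+` reduction, so the results can differ in the last bits, and it needs an indexable loop (here `eachindex(x)`), whereas the generic fallback iterates arbitrary `x`.

```julia
# Plain loop: additions happen strictly in order.
function scaled_sumsq(x::AbstractVector{Float64}, maxabs::Float64)
    out = 0.0
    for v in x
        out += (abs(v) / maxabs)^2
    end
    return out
end

# Same computation, but `@simd` allows the `+` reduction to be reassociated.
function scaled_sumsq_simd(x::AbstractVector{Float64}, maxabs::Float64)
    out = 0.0
    @simd for i in eachindex(x)
        @inbounds out += (abs(x[i]) / maxabs)^2
    end
    return out
end

x = rand(1000)
m = maximum(abs, x)
@assert isapprox(scaled_sumsq(x, m), scaled_sumsq_simd(x, m); rtol = 1e-12)
```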
FWIW, times for me with `rand(1000)` are:
- `for v in x; out += (norm(v)/maxabs)^2` takes 2.352 μs
- with `@simd`, 1.733 μs
- both after `normInf(x)`, which takes 1.404 μs, the same as `maximum`

vs:
- `BLAS.nrm2` takes 1.196 μs
- `mapreduce(norm_sqr, +, x)` takes 229.763 ns

So I guess `LinearAlgebra.NRM2_CUTOFF` should be higher; it is currently 32.
But also, why is `maximum` so slow? Can this be done less carefully here, since we don't care about `-0.0` and `NaN`?
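As a hedged answer to the `maximum` question, a deliberately looser max-abs loop. `maxabs_loose` is hypothetical and has not been benchmarked here. Because `abs` already maps `-0.0` to `0.0`, the behavioral differences from `maximum(abs, x)` are that NaN entries are skipped instead of propagated, and that an empty vector gives `0.0` instead of an error.

```julia
# Hypothetical, deliberately less careful max-abs: no NaN propagation.
# A NaN entry is skipped, because `NaN > m` is always false.
function maxabs_loose(x::AbstractVector{Float64})
    m = 0.0
    for i in eachindex(x)
        a = abs(x[i])
        m = ifelse(a > m, a, m)
    end
    return m
end

@assert maxabs_loose([1.0, -3.0, 2.0]) == 3.0
@assert maxabs_loose([NaN, 1.0]) == 1.0        # `maximum(abs, ...)` would give NaN
@assert isnan(maximum(abs, [NaN, 1.0]))
```

Skipping NaN would only be acceptable if NaN inputs are handled earlier. With the `mapreduce`-first ordering proposed in this PR, any NaN entry makes the sum of squares NaN, which is returned before max-abs is computed, so the fallback would only see NaN-free input.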

    sT = promote_type(T, Float64)
    ans = mapreduce(norm_sqr, +, x)
    ans in (0, Inf) || return convert(T, sqrt(ans))
    maxabs = sT(normInf(x))

The old code has one more short-circuit here, returning 0/Inf if `maxabs` is either. Might it be worthwhile to have that here? All-zeros might be the most common case after finite norm.
Why would all zero be common? I'd think that would be pretty rare.
I meant all zero might be more common than truly tiny values. Hopefully both much less common than values about 1.
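Putting the pieces of this thread together, a hedged sketch of the `mapreduce`-first approach with the extra `maxabs` short-circuit suggested above. `sketch_norm2` is a hypothetical name, not the PR's final code: it uses `maximum(norm, x)` in place of the internal `normInf`, and `iszero`/`isinf` in place of `in (0, Inf)`. It assumes `x` is non-empty, since `first(x)` throws otherwise.

```julia
using LinearAlgebra: norm

# Hypothetical sketch, assuming a non-empty collection:
# 1. try the cheap unscaled sum of squares, accumulated in at least Float64;
# 2. only if that underflowed to 0 or overflowed to Inf, find the largest magnitude,
#    short-circuit when it is itself 0 or Inf, and otherwise rescale.
function sketch_norm2(x)
    T = typeof(float(norm(first(x))))
    sT = promote_type(T, Float64)
    s = mapreduce(v -> sT(norm(v))^2, +, x)
    (iszero(s) || isinf(s)) || return convert(T, sqrt(s))   # NaN also returns here
    maxabs = sT(maximum(norm, x))
    (iszero(maxabs) || isinf(maxabs)) && return convert(T, maxabs)
    scaled = mapreduce(v -> (sT(norm(v)) / maxabs)^2, +, x)
    return convert(T, maxabs * sqrt(scaled))
end

@assert isinf(sqrt(sum(abs2, [1e200, 1e200])))            # the naive version overflows
@assert isapprox(sketch_norm2([1e200, 1e200]), sqrt(2) * 1e200)
@assert isapprox(sketch_norm2([1e-200, 1e-200]), sqrt(2) * 1e-200)
@assert sketch_norm2([3.0, 4.0]) == 5.0
@assert sketch_norm2(zeros(3)) == 0.0
```

The extra `maxabs` check costs nothing on the fast path, since it only runs after the unscaled sum has already come out as 0 or Inf.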
Co-authored-by: Michael Abbott <[email protected]>
What's the status of this? After #40790 it will at least need to be rebased.
The status is that I haven't thought about this for a year because I was running into really annoying issues with closures capturing variables that were ruining the performance.
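For context on the closure problem, a generic example (not taken from the PR) of the pattern that typically hurts performance: a variable that a closure reassigns is stored in a `Core.Box`, so its type is not inferred in the enclosing function. Both function names are hypothetical; they return the same value and differ only in how the state is carried.

```julia
# `s` is reassigned inside the closure, so Julia stores it in a `Core.Box`
# and its type is no longer inferred inside `sumsq_closure`.
function sumsq_closure(x)
    s = 0.0
    foreach(v -> (s += abs2(v)), x)
    return s
end

# Same computation with no mutable captured state.
sumsq_mapreduce(x) = mapreduce(abs2, +, x)

@assert sumsq_closure([3.0, 4.0]) == 25.0
@assert sumsq_mapreduce([3.0, 4.0]) == 25.0
```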
Ok! Was reminded by this thread, and what's here seems pretty quick (but has 1 mystery allocation).