faster and simpler generic_norm2 #43256

Open · wants to merge 11 commits into base: master

73 changes: 23 additions & 50 deletions in stdlib/LinearAlgebra/src/generic.jl
@@ -457,66 +457,39 @@ generic_norm1(x) = mapreduce(float ∘ norm, +, x)

```diff
 # faster computation of norm(x)^2, avoiding overflow for integers
 norm_sqr(x) = norm(x)^2
-norm_sqr(x::Number) = abs2(x)
-norm_sqr(x::Union{T,Complex{T},Rational{T}}) where {T<:Integer} = abs2(float(x))
+norm_sqr(x::T) where {T<:Number} = abs2(promote_type(T,Float64)(x))

 function generic_norm2(x)
-    maxabs = normInf(x)
-    (maxabs == 0 || isinf(maxabs)) && return maxabs
-    (v, s) = iterate(x)::Tuple
-    T = typeof(maxabs)
-    if isfinite(length(x)*maxabs*maxabs) && maxabs*maxabs != 0 # Scaling not necessary
-        sum::promote_type(Float64, T) = norm_sqr(v)
-        while true
-            y = iterate(x, s)
-            y === nothing && break
-            (v, s) = y
-            sum += norm_sqr(v)
-        end
-        return convert(T, sqrt(sum))
-    else
-        sum = abs2(norm(v)/maxabs)
-        while true
-            y = iterate(x, s)
-            y === nothing && break
-            (v, s) = y
-            sum += (norm(v)/maxabs)^2
-        end
-        return convert(T, maxabs*sqrt(sum))
-    end
+    isempty(x) && return norm(zero(eltype(x)))
+    T = typeof(norm(first(x)))
+    sT = promote_type(T, Float64)
+    ans = mapreduce(norm_sqr, +, x)
+    ans in (0, Inf) || return convert(T, sqrt(ans))
```
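A quick illustration (not part of the PR) of why the new `norm_sqr` promotes through `promote_type(T, Float64)` before squaring: squaring a large `Int` wraps around, while squaring after conversion to `Float64` does not.

```julia
x = 4_000_000_000                          # Int64; x^2 exceeds typemax(Int64)
@assert abs2(x) < 0                        # integer squaring overflows and wraps
y = abs2(promote_type(Int64, Float64)(x))  # promote first, as the new norm_sqr does
@assert y == 1.6e19                        # exact: 4e9 and its square fit Float64
```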
@mcabbott (Contributor), Dec 28, 2021:

I'd have to check whether `in` does `==`. But it does, so perhaps that's OK.

I'd also rather not use `ans` as a variable name, and would prefer a different name for the second path's output, especially as it has a different type.
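The `in`-does-`==` question is easy to confirm (a standalone check, not part of the PR): `in` on a tuple compares with `==` against each element, so integer and floating-point zeros match each other, and `NaN` never matches anything.

```julia
# `ans in (0, Inf)` effectively checks `ans == 0 || ans == Inf`.
@assert (0.0 in (0, Inf)) == true     # 0.0 == 0
@assert (-0.0 in (0, Inf)) == true    # -0.0 == 0 as well
@assert (Inf in (0, Inf)) == true
@assert (NaN in (0, Inf)) == false    # NaN == x is false for every x
```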

```diff
+    maxabs = sT(normInf(x))
```
@mcabbott (Contributor):

The old code has one more short-circuit here, returning 0/Inf if `maxabs` is either. Might it be worthwhile to keep that here? All zeros might be the most common case after a finite norm.

Author (Member):

Why would all zeros be common? I'd think that would be pretty rare.

@mcabbott (Contributor):

I meant that all zeros might be more common than truly tiny values. Hopefully both are much less common than values around 1.
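For context on why the rescaled second pass exists at all, a standalone sketch (plain Julia, not the PR's code) of what goes wrong without it: the sum of squares can overflow even when the true 2-norm is representable.

```julia
# The true 2-norm here is about 1.414e300, well within Float64 range,
# but the intermediate 1e300^2 overflows to Inf on the naive path.
x = [1e300, 1e300]
@assert isinf(sqrt(sum(abs2, x)))           # naive sum of squares overflows
m = maximum(abs, x)                         # stand-in for normInf(x)
scaled = m * sqrt(sum(v -> (v / m)^2, x))   # rescaled fallback path
@assert isapprox(scaled, sqrt(2) * 1e300)
```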

```diff
+    ans = sT(0)
+    for v in x
+        ans += (norm(v)/maxabs)^2
```
Comment on lines +470 to +471:

@mcabbott (Contributor), Dec 28, 2021:

I can't measure any change from pulling the division out of the loop and multiplying by `invmaxabs` instead.

But adding `@simd for v in x` seems to help quite a bit. Is it safe to do so? (Or maybe this method won't get called on the sort of arrays for which it is beneficial anyway.)

Author (Member):

The second. I don't want to deal with `invmaxabs` here, since we're already on a slow path.

@mcabbott (Contributor):

FWIW, timings for me with `rand(1000)` are:

  • `for v in x; out += (norm(v)/maxabs)^2` takes 2.352 μs
  • with `@simd`, 1.733 μs
  • both after `normInf(x)`, which takes 1.404 μs, the same as `maximum`

vs:

  • `BLAS.nrm2` takes 1.196 μs
  • `mapreduce(norm_sqr, +, x)` takes 229.763 ns

So I guess `LinearAlgebra.NRM2_CUTOFF` should be higher; it is currently 32.

But also, why is `maximum` so slow? Can it be done less carefully here, since we don't care about -0.0 and NaN?

```diff
+    end
+    return convert(T, maxabs*sqrt(ans))
 end
```
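The `@simd` suggestion above could look like the following standalone sketch. The name `scaled_norm2` and the restriction to `AbstractArray` are mine, not the PR's; whether reassociating the sum is safe for the general iterables `generic_norm2` accepts is exactly the open question in the thread.

```julia
# Rescaled 2-norm loop with @simd, valid for array-like inputs where
# reordering the floating-point additions is acceptable.
function scaled_norm2(x::AbstractArray, maxabs)
    s = 0.0
    @simd for i in eachindex(x)
        s += (abs(x[i]) / maxabs)^2
    end
    return maxabs * sqrt(s)
end

x = rand(1000)
m = maximum(abs, x)
scaled_norm2(x, m)   # agrees with sqrt(sum(abs2, x)) up to rounding
```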

```diff
 # Compute L_p norm ‖x‖ₚ = sum(abs(x).^p)^(1/p)
 # (Not technically a "norm" for p < 1.)
 function generic_normp(x, p)
-    (v, s) = iterate(x)::Tuple
-    if p > 1 || p < -1 # might need to rescale to avoid overflow
-        maxabs = p > 1 ? normInf(x) : normMinusInf(x)
-        (maxabs == 0 || isinf(maxabs)) && return maxabs
-        T = typeof(maxabs)
-    else
-        T = typeof(float(norm(v)))
+    isempty(x) && return float(norm(zero(eltype(x))))
+    T = typeof(float(norm(first(x))))
+    sT = promote_type(T, Float64)
+    ans = sT(0)
+    for v in x
+        ans += sT(norm(v))^p
```
Comment on lines +483 to +484:

@mcabbott (Contributor):

Is it worth special-casing p == 3 and p == 0.5, for which we can replace `^p` with faster functions?

```diff
-    end
-    spp::promote_type(Float64, T) = p
-    if -1 <= p <= 1 || (isfinite(length(x)*maxabs^spp) && maxabs^spp != 0) # scaling not necessary
-        sum::promote_type(Float64, T) = norm(v)^spp
-        while true
-            y = iterate(x, s)
-            y === nothing && break
-            (v, s) = y
-            sum += norm(v)^spp
-        end
-        return convert(T, sum^inv(spp))
-    else # rescaling
-        sum = (norm(v)/maxabs)^spp
-        while true
-            y = iterate(x, s)
-            y === nothing && break
-            (v, s) = y
-            sum += (norm(v)/maxabs)^spp
-        end
-        return convert(T, maxabs*sum^inv(spp))
-    end
+    end
+    ans in (0, Inf) || return convert(T, ans^inv(p))
+    maxabs = p > 1 ? normInf(x) : normMinusInf(x)
+    ans = sT(0)
+    for v in x
+        ans += (sT(norm(v))/maxabs)^p
+    end
+    return convert(T, maxabs*ans^inv(p))
 end
```
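On the special-casing question above, the identities involved are easy to check (illustrative only, not part of the PR): for `p == 0.5` the general power can become `sqrt`, and for `p == 3` repeated multiplication, both typically much cheaper than the generic `x^p` with a runtime exponent. The helper `fast_pow` below is hypothetical, sketching the dispatch the comment suggests.

```julia
# Hypothetical fast path for the exponents the review comment mentions.
fast_pow(x, p) = p == 0.5 ? sqrt(x)    :
                 p == 3   ? x * x * x  :
                 x^p

x = 2.7
@assert fast_pow(x, 0.5) ≈ x^0.5   # sqrt agrees with the general power
@assert fast_pow(x, 3)   ≈ x^3     # x*x*x agrees with cubing
@assert fast_pow(x, 2.2) == x^2.2  # everything else falls through to ^
```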

normMinusInf(x) = generic_normMinusInf(x)