Avoid underflow and overflow in norm() #975

Merged Feb 6, 2023 · 21 commits (changes shown from 8 commits)
85 changes: 63 additions & 22 deletions src/linalg.jl
@@ -217,26 +217,60 @@ end
# Norms
_inner_eltype(v::AbstractArray) = isempty(v) ? eltype(v) : _inner_eltype(first(v))
_inner_eltype(x::Number) = typeof(x)
-@inline _init_zero(v::StaticArray) = float(norm(zero(_inner_eltype(v))))
+@inline _init_zero(v::AbstractArray) = float(norm(zero(_inner_eltype(v))))

+@inline maxabs_nested(a::Number) = abs(a)
+
+function maxabs_nested(a::AbstractArray)
+    prod(size(a)) == 0 && (return _init_zero(a))
+
+    m = maxabs_nested(a[1])
+    for j = 2:prod(size(a))
+        m = max(m, maxabs_nested(a[j]))
+    end
+
+    return m
+end

+@generated function maxabs_nested(a::StaticArray)
Collaborator:

Do you see speedups from this generated function? Or perhaps it serves another purpose?

julia> for N in [3, 10, 30]
       @show N
       x = @SVector randn(N)
       @btime maximum(abs, $x)
       @btime maxabs_nested($x)  # PR, without generated path
       @btime maxabs_nested_g($x)  # with
       end
N = 3
  2.458 ns (0 allocations: 0 bytes)
  2.417 ns (0 allocations: 0 bytes)
  2.416 ns (0 allocations: 0 bytes)
N = 10
  13.652 ns (0 allocations: 0 bytes)
  13.096 ns (0 allocations: 0 bytes)
  13.652 ns (0 allocations: 0 bytes)
N = 30
  77.830 ns (0 allocations: 0 bytes)
  73.826 ns (0 allocations: 0 bytes)
  77.786 ns (0 allocations: 0 bytes)

Contributor Author (@wsshin), Jan 3, 2022:

Sorry, I didn't do a performance comparison for this; I merely assumed that the @generated version would always be better for StaticArrays. Now that I think about it, the @generated version of maxabs_nested() isn't necessary, since the function is non-allocating even without @generated.

If the current push passes all CI tests, I will remove the @generated version of maxabs_nested(). Thanks!

Collaborator:

Might even be simpler just to write mapreduce(abs, max, x). Or normInf I think. Unless you have a compelling reason to write it out for this case.

Contributor Author:

mapreduce(abs, max, x) was my initial approach, but it didn't pass the unit tests for nested array x.

normInf() sounds great. I will use it and remove maxabs_nested().
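For context, here is a sketch (not from this PR, and using plain nested Vectors rather than StaticArrays) of why mapreduce(abs, max, x) fails on a nested array: abs is not defined for array elements, so the reduction has to descend recursively via norm, which is what normInf/maxabs_nested do.

```julia
using LinearAlgebra

# A nested array: the elements are themselves arrays.
x = [[3.0, 4.0], [0.0, 0.0]]

# mapreduce(abs, max, x)  # would throw MethodError: no method matching abs(::Vector{Float64})

# Reducing with norm instead descends one level into the elements:
maximum(norm, x)           # 5.0, the largest inner norm
```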

Contributor Author:

Just pushed a change that uses normInf() instead of maxabs_nested(). Please let me know if you have any other suggestions!

+    if prod(Size(a)) == 0
+        return :(_init_zero(a))
+    end
+
+    expr = :(maxabs_nested(a[1]))
+    for j = 2:prod(Size(a))
+        expr = :(max($expr, maxabs_nested(a[$j])))
+    end
+
+    return quote
+        $(Expr(:meta, :inline))
+        @inbounds return $expr
+    end
+end

@inline function LinearAlgebra.norm_sqr(v::StaticArray)
return mapreduce(LinearAlgebra.norm_sqr, +, v; init=_init_zero(v))
end

-@inline norm(a::StaticArray) = _norm(Size(a), a)
-@generated function _norm(::Size{S}, a::StaticArray) where {S}
-    if prod(S) == 0
+@generated function norm(a::StaticArray)
+    if prod(Size(a)) == 0
        return :(_init_zero(a))
    end

-    expr = :(LinearAlgebra.norm_sqr(a[1]))
-    for j = 2:prod(S)
-        expr = :($expr + LinearAlgebra.norm_sqr(a[$j]))
+    expr = :(LinearAlgebra.norm_sqr(a[1]/aₘ))
+    for j = 2:prod(Size(a))
+        expr = :($expr + LinearAlgebra.norm_sqr(a[$j]/aₘ))
    end

return quote
$(Expr(:meta, :inline))
-        @inbounds return sqrt($expr)
+        zero_a = _init_zero(a)
+        aₘ = maxabs_nested(a)
+        if iszero(aₘ)
+            return zero_a
+        else
+            @inbounds return aₘ * sqrt($expr)
Collaborator:

This always takes the max, and then for nonzero norm always computes with the division. How does this compare to the speed of the existing code?

The alternative strategy would be just to compute the norm, and only if that is zero or infinite, worry about scaling. More work in the worst case but hopefully most cases in real use will be neither huge nor tiny.

Xref JuliaLang/julia#43256

Contributor Author:

Always taking the max and dividing by it is what LinearAlgebra does to avoid overflow and underflow; see JuliaLang/julia#1861.
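A minimal snippet (not part of the PR) illustrating the underflow that this scaling prevents: squaring a tiny value flushes it to zero before sqrt can recover it, while a max-scaled computation survives.

```julia
using LinearAlgebra

x = [0.0, 1e-180]

# Naive two-norm: abs2(1e-180) == 1e-360 underflows to 0.0,
# so the whole norm collapses to zero.
naive = sqrt(sum(abs2, x))   # 0.0

# LinearAlgebra's generic norm rescales by the largest magnitude
# first, so the true norm is recovered.
scaled = norm(x)             # 1e-180
```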

Collaborator (@mcabbott), Jan 3, 2022:

Yes, the PR I linked wants to change that.

I timed things:

julia> @btime norm($(@SVector rand(10)));  # tagged version
  1.833 ns (0 allocations: 0 bytes)

julia> @btime sqrt(sum(abs2, $(@SVector rand(10))));
  1.833 ns (0 allocations: 0 bytes)

julia> @btime norm_gen($(@SVector rand(10)));  # this PR
  23.971 ns (0 allocations: 0 bytes)  # (fixed, maybe)

julia> @btime mapreduce(abs, max, $(@SVector rand(10)));
  13.652 ns (0 allocations: 0 bytes)

julia> @btime @fastmath mapreduce(abs, max, $(@SVector rand(10)));
  2.708 ns (0 allocations: 0 bytes)

Here @fastmath will get Inf, NaN and -0.0 wrong, but if it can be safely used, helps a lot.

Contributor Author:

In the current Julia master, the relevant code is here.

Collaborator:

Yes. And it is quite slow, which #43256 might fix.

julia> @btime LinearAlgebra.generic_norm2($(randn(10)));
  20.269 ns (0 allocations: 0 bytes)

julia> @btime sqrt(sum(abs2, $(randn(10))));
  5.208 ns (0 allocations: 0 bytes)

# longer
  
julia> @btime LinearAlgebra.generic_norm2($(randn(10^4)));
  15.750 μs (0 allocations: 0 bytes)

julia> @btime sqrt(sum(abs2, $(randn(10^4))));
  2.083 μs (0 allocations: 0 bytes)

julia> @btime norm($(randn(10^4)));  # BLAS
  12.416 μs (0 allocations: 0 bytes)

Contributor Author:

But I understand your concern. It turns out that norm(::StaticArray) is slower than norm(::AbstractArray) for the same array size:

julia> v = rand(10); @btime norm($v);
  20.800 ns (0 allocations: 0 bytes)

julia> sv = @SVector rand(10); @btime norm($sv);
  28.727 ns (0 allocations: 0 bytes)

I will experiment with your suggestion.

Contributor Author:

Sorry, I didn't realize the PR you Xref'ed was Julia's; I thought it was a StaticArrays PR. Now what you wrote makes much more sense to me.

Contributor Author:

Just pushed a change that performs scaling only when the norm is 0 or Inf. Now the benchmark is much better:

julia> v = rand(10); @btime norm($v);
  21.335 ns (0 allocations: 0 bytes)

julia> sv = @SVector rand(10); @btime norm($sv);
  5.730 ns (0 allocations: 0 bytes)

+        end
    end
end
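The "rescale only when needed" strategy settled on above can be sketched as a hypothetical standalone function (not the PR's final generated code): take the fast unscaled path first, and redo the computation with scaling only if the result under- or overflowed.

```julia
using LinearAlgebra

# Hypothetical illustration of the fallback strategy: the fast path
# has no divisions; scaling happens only when the result is 0 or Inf.
function norm2_twopass(a::AbstractVector{<:Real})
    n = sqrt(sum(abs2, a))              # fast, but may under/overflow
    if iszero(n) || isinf(n)
        m = maximum(abs, a)             # largest magnitude
        (iszero(m) || isinf(m)) && return float(m)
        n = m * sqrt(sum(x -> abs2(x / m), a))  # scaled recomputation
    end
    return n
end

norm2_twopass([3.0, 4.0])       # 5.0 (fast path)
norm2_twopass([0.0, 1e-180])    # 1e-180 (scaled fallback)
norm2_twopass([1e200, 1e200])   # ≈ 1.414e200 (scaled fallback)
```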

@@ -245,34 +279,42 @@ function _norm_p0(x)
return float(norm(iszero(x) ? zero(T) : one(T)))
end

-@inline norm(a::StaticArray, p::Real) = _norm(Size(a), a, p)
-@generated function _norm(::Size{S}, a::StaticArray, p::Real) where {S}
-    if prod(S) == 0
+@generated function norm(a::StaticArray, p::Real)
+    if prod(Size(a)) == 0
        return :(_init_zero(a))
    end

-    expr = :(norm(a[1])^p)
-    for j = 2:prod(S)
-        expr = :($expr + norm(a[$j])^p)
+    expr = :(norm(a[1]/aₘ)^p)
+    for j = 2:prod(Size(a))
+        expr = :($expr + norm(a[$j]/aₘ)^p)
    end

-    expr_p1 = :(norm(a[1]))
-    for j = 2:prod(S)
-        expr_p1 = :($expr_p1 + norm(a[$j]))
+    expr_p1 = :(norm(a[1]/aₘ))
+    for j = 2:prod(Size(a))
+        expr_p1 = :($expr_p1 + norm(a[$j]/aₘ))
    end

+    expr_pInf = :(norm(a[1]/aₘ))
+    for j = 2:prod(Size(a))
+        expr_pInf = :(max($expr_pInf, norm(a[$j]/aₘ)))
+    end

    return quote
        $(Expr(:meta, :inline))
-        if p == Inf
-            return mapreduce(norm, max, a)
+        zero_a = _init_zero(a)
+        aₘ = maxabs_nested(a)
+        if iszero(aₘ)
+            return zero_a
+        elseif p == Inf
+            return aₘ * $expr_pInf
        elseif p == 1
-            @inbounds return $expr_p1
+            @inbounds return aₘ * $expr_p1
        elseif p == 2
            return norm(a)
        elseif p == 0
            return mapreduce(_norm_p0, +, a)
        else
-            @inbounds return ($expr)^(inv(p))
+            @inbounds return aₘ * ($expr)^(inv(p))
        end
    end
end
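The generated p-norm above evaluates the scaled identity ‖a‖ₚ = aₘ · (Σᵢ ‖aᵢ/aₘ‖ᵖ)^(1/p). A hypothetical non-generated equivalent, for flat real vectors and finite nonzero p only, makes the idea concrete:

```julia
using LinearAlgebra

# Hypothetical reference version of the scaled p-norm identity used
# by the generated code (flat real vectors, p finite and nonzero).
function pnorm_scaled(a::AbstractVector{<:Real}, p::Real)
    m = maximum(abs, a)            # aₘ, the scale factor
    iszero(m) && return float(m)   # all-zero input
    return m * sum(x -> abs(x / m)^p, a)^inv(p)
end

pnorm_scaled([3.0, 4.0], 2)        # ≈ 5.0
pnorm_scaled([1e-170, 1e-170], 4)  # ≈ 1.19e-170; the naive Σ|aᵢ|⁴ underflows to 0
```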
@@ -466,4 +508,3 @@ end
# Some shimming for special linear algebra matrix types
@inline LinearAlgebra.Symmetric(A::StaticMatrix, uplo::Char='U') = (checksquare(A); Symmetric{eltype(A),typeof(A)}(A, uplo))
@inline LinearAlgebra.Hermitian(A::StaticMatrix, uplo::Char='U') = (checksquare(A); Hermitian{eltype(A),typeof(A)}(A, uplo))

1 change: 1 addition & 0 deletions test/linalg.jl
@@ -208,6 +208,7 @@ StaticArrays.similar_type(::Union{RotMat2,Type{RotMat2}}) = SMatrix{2,2,Float64,
end

@testset "normalization" begin
+    @test norm(SVector(0.0,1e-180)) == 1e-180
@test norm(SVector(1.0,2.0,2.0)) ≈ 3.0
@test norm(SVector(1.0,2.0,2.0),2) ≈ 3.0
@test norm(SVector(1.0,2.0,2.0),Inf) ≈ 2.0