Refinements to `hvncat` #41143

BioTurboNick · 2021-06-09T02:08:41Z

This PR is collecting changes based on feedback on the functions, including #41111, #41107, and #41047.

- Adds robustness through stronger input checking, so that the functions are safe to call on their own
- More consistent behavior for 0- and 1-dimension forms, as suggested by @matthias314.
- Added back some judicious @inbounds and @inline hints. Hopefully someone can ensure they're proper this time 😅
- More comprehensive tests
- Added Int form to documentation
- Performance of unbalanced concatenations (e.g. `const a = fill(1); const b = [a a]; [a a ;;; b]`) improved to ~2x faster with 10 allocations (784 bytes), compared with `cat([a a], b, dims = 3)`'s 32 allocations and 1.12 KiB.
- Realized the dimensionality of the result was not taking all the input arguments into account; it is now the maximum of the dimensionality of the input arguments and the concatenation dimension
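For anyone who wants to reproduce this kind of comparison, here is a minimal sketch. It assumes Julia 1.7 or later (where the `;;;` syntax parses) and uses ordinary 1×2 matrices rather than the 0-dimensional arrays from the description. The helper names `literal_form` and `cat_form` are made up for illustration. Allocation figures vary between Julia versions, so none are asserted here.

```julia
x = [1 2]   # 1×2 matrix
y = [3 4]   # 1×2 matrix

# Hypothetical wrappers (not part of Base) so both spellings are measured the same way.
literal_form(a, b) = [a ;;; b]          # array literal, lowered to an hvncat call
cat_form(a, b) = cat(a, b, dims = 3)    # the older spelling

# First calls compile the wrappers, so the measurements below exclude compilation.
r1 = literal_form(x, y)
r2 = cat_form(x, y)

bytes_literal = @allocated literal_form(x, y)
bytes_cat = @allocated cat_form(x, y)

println(size(r1))                        # both inputs stacked along dimension 3
println(r1 == r2)                        # the two spellings agree on the result
println((bytes_literal, bytes_cat))      # version-dependent numbers
```

For timing rather than allocation counts, a benchmarking package would be the usual choice; this sketch sticks to Base.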

base/abstractarray.jl

test/abstractarray.jl

matthias314 · 2021-06-09T12:21:53Z

1-dimensional case

In this discussion the idea came up that in the 1-dimensional case hvncat returns a vector or a 1 x n matrix, depending on the value of row_first. This PR implements this for the shape form, but not for the dims form:

julia> hvncat( (2,), true, 1, 2) |> size
(2,)
julia> hvncat( ((2,),), true, 1, 2) |> size
(1, 2)

Is this intended? Let me also remark that the shape form of hvncat is not type-stable anymore with this change. It may still be a good idea to do it, but I think the reviewers should be aware of this.
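For readers unfamiliar with the two calling conventions, the sketch below shows the dims form and the shape form side by side, using the 3-dimensional calls from the `hvncat` docstring (Julia 1.7 or later assumed). It deliberately does not cover the 1-dimensional case, since that behavior is exactly what is under discussion here.

```julia
a, b, c, d, e, f = 1, 2, 3, 4, 5, 6

# Dims form: the tuple is the size of the result.
# row_first = false fills the result column-first.
by_dims = hvncat((2, 1, 3), false, a, b, c, d, e, f)

# Shape form: a tuple of tuples describing how many elements
# each row, each slice, and the whole array contain.
by_shape = hvncat(((3, 3), (3, 3), (6,)), true, a, b, c, d, e, f)

println(size(by_dims))
println(size(by_shape))
println(by_shape == [a b c;;; d e f])   # the literal the shape form corresponds to
```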

Methods

It seems to me that the code contains quite a few redundant methods. Is this intentional? In my opinion, this makes the code hard to read. Some examples:

Here the second line covers all the following ones (without runtime overhead, I believe):

_hvncat(dimsshape::Union{Tuple, Int}, row_first::Bool) = _typed_hvncat(Any, dimsshape, row_first)
_hvncat(dimsshape::Union{Tuple, Int}, row_first::Bool, xs...) = _typed_hvncat(promote_eltypeof(xs...), dimsshape, row_first, xs...)
_hvncat(dimsshape::Union{Tuple, Int}, row_first::Bool, xs::T...) where T<:Number = _typed_hvncat(T, dimsshape, row_first, xs...)
_hvncat(dimsshape::Union{Tuple, Int}, row_first::Bool, xs::Number...) = _typed_hvncat(promote_typeof(xs...), dimsshape, row_first, xs...)
_hvncat(dimsshape::Union{Tuple, Int}, row_first::Bool, xs::AbstractArray...) = _typed_hvncat(promote_eltype(xs...), dimsshape, row_first, xs...)
_hvncat(dimsshape::Union{Tuple, Int}, row_first::Bool, xs::AbstractArray{T}...) where T = _typed_hvncat(T, dimsshape, row_first, xs...)
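One way to probe the claim that the generic `xs...` method covers the specialized ones is to check that the promotion helpers agree on the element type. The sketch below does that for a few sample inputs. Note that `Base.promote_eltypeof`, `Base.promote_typeof` and `Base.promote_eltype` are unexported internals, so their availability in any given Julia version is an assumption, and agreement on these samples is a sanity check rather than a proof.

```julia
# Sample inputs chosen for illustration only.
numbers = (1, 2.5)           # mixed Number arguments
arrays = ([1, 2], [2.5])     # mixed AbstractArray arguments
same_type = (1, 2)           # all arguments share one Number type

# Generic helper versus the helper used by the xs::Number... method.
numbers_agree = Base.promote_eltypeof(numbers...) === Base.promote_typeof(numbers...)

# Generic helper versus the helper used by the xs::AbstractArray... method.
arrays_agree = Base.promote_eltypeof(arrays...) === Base.promote_eltype(arrays...)

# Generic helper versus the T picked by the xs::T... where T<:Number method.
same_type_agree = Base.promote_eltypeof(same_type...) === Int

println((numbers_agree, arrays_agree, same_type_agree))
```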

Here the second line covers the first:

typed_hvncat(T::Type, dimsshape::NTuple{1}, row_first::Bool, xs...) = _typed_hvncat(T, dimsshape, row_first, xs...)
typed_hvncat(T::Type, dimsshape::Tuple, row_first::Bool, xs...) = _typed_hvncat(T, dimsshape, row_first, xs...)

BioTurboNick · 2021-06-09T12:49:49Z

> 1-dimensional case
>
> In this discussion the idea came up that in the 1-dimensional case hvncat returns a vector or a 1 x n matrix, depending on the value of row_first. This PR implements this for the shape form, but not for the dims form:
>
> Is this intended? Let me also remark that the shape form of hvncat is not type-stable anymore with this change. It may still be a good idea to do it, but I think the reviewers should be aware of this.

Yes, it is intended. That could potentially be changed... I'd have to dive back into the parser code though to make that work.

With respect to type stability, is it any less type-stable than e.g. cat(1, 2, dims = 3) vs. cat(1, 2, dims = 4)? My understanding from looking at the cat example was that using Val internally and introducing a method barrier limits the impacts of type instability. Though I'm sure I'm missing something here?
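The pattern referred to here can be sketched as follows. This is a hypothetical illustration of a function barrier using `Val`, not the actual `cat` or `hvncat` implementation; the names `ones_nd` and `_ones_nd` are invented for the example.

```julia
# Entry point: the number of dimensions of the result depends on the *value* n,
# so the return type cannot be inferred from the argument types alone.
ones_nd(n::Int) = _ones_nd(Val(n))

# Kernel behind the barrier: N is now a type parameter, so this method
# can be compiled separately for each N.
_ones_nd(::Val{N}) where {N} = ones(Int, ntuple(_ -> 1, Val(N)))

x3 = ones_nd(3)
x4 = ones_nd(4)
println((ndims(x3), ndims(x4)))

# The same value dependence shows up with cat and its dims keyword.
println((ndims(cat(1, 2, dims = 3)), ndims(cat(1, 2, dims = 4))))

# Inference results can be inspected with Base.return_types (an unexported
# helper); the output is printed rather than assumed.
println(Base.return_types(_ones_nd, (Val{3},)))
println(Base.return_types(ones_nd, (Int,)))
```

The instability is confined to the single dynamic dispatch at the entry point. Whether that cost matters depends on how the result is used by the caller.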

> Methods
>
> It seems to me that the code contains quite a few redundant methods. Is this intentional? In my opinion, this makes the code hard to read. Some examples:
>
> Here the second line covers all the following ones (without runtime overhead, I believe):

I'm not sure why all the different promote_eltype/eltypeof/typeof methods were introduced, but it's what hvcat does, so I copied its example.

> Here the second line covers the first:
>
> typed_hvncat(T::Type, dimsshape::NTuple{1}, row_first::Bool, xs...) = _typed_hvncat(T, dimsshape, row_first, xs...)
> typed_hvncat(T::Type, dimsshape::Tuple, row_first::Bool, xs...) = _typed_hvncat(T, dimsshape, row_first, xs...)

Ah, yes, good catch. The first one used to dispatch to typed_vcat, and I missed that it wasn't necessary anymore.

matthias314 · 2021-06-09T13:00:05Z

I was only giving examples of what seemed redundant methods to me. There are more, for instance:

_typed_hvncat(::Type{T}, ::Tuple{}, ::Bool, x) where T = fill(T(x))                                                                                         
_typed_hvncat(::Type{T}, ::Tuple{}, ::Bool, x::Number) where T = fill(T(x))                                                                                 

_typed_hvncat(T::Type, dims::Tuple{Int}, ::Bool, xs::Number...) = _typed_hvncat_1d(T, dims[1], Val(false), xs...)                                           
_typed_hvncat(T::Type, dims::Tuple{Int}, ::Bool, as...) = _typed_hvncat_1d(T, dims[1], Val(false), as...)

I think it would be good to take the time and check all methods introduced for hvncat.

BioTurboNick · 2021-06-09T13:23:04Z

> I was only giving examples of what seemed redundant methods to me. There are more, for instance:
>
> _typed_hvncat(::Type{T}, ::Tuple{}, ::Bool, x) where T = fill(T(x))
> _typed_hvncat(::Type{T}, ::Tuple{}, ::Bool, x::Number) where T = fill(T(x))
>
> _typed_hvncat(T::Type, dims::Tuple{Int}, ::Bool, xs::Number...) = _typed_hvncat_1d(T, dims[1], Val(false), xs...)
> _typed_hvncat(T::Type, dims::Tuple{Int}, ::Bool, as...) = _typed_hvncat_1d(T, dims[1], Val(false), as...)
>
> I think it would be good to take the time and check all methods introduced for hvncat.

Julia made me do those to resolve ambiguity. That's where most of this apparent redundancy is coming from. ~~Is there an easier way to find out if methods could be combined without removing one-by-one and rebuilding Julia?~~ Duh, figured out a shortcut
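To illustrate why a method that looks redundant can still be required, here is a small sketch with hypothetical stand-in functions (`route_ambiguous`, `route_fixed`, and `call_is_ambiguous` are invented names, not the real `_typed_hvncat` methods). One method is more specific in the `dims` argument and the other in the trailing arguments, so neither wins for a call that matches both.

```julia
# Two methods, each more specific in a different argument position.
route_ambiguous(dims::Tuple{Int}, xs...) = :one_dim
route_ambiguous(dims::Tuple{Vararg{Int}}, xs::Number...) = :numbers

# Same pair plus a tie-breaker whose body duplicates the first method.
route_fixed(dims::Tuple{Int}, xs...) = :one_dim
route_fixed(dims::Tuple{Vararg{Int}}, xs::Number...) = :numbers
route_fixed(dims::Tuple{Int}, xs::Number...) = :one_dim   # looks redundant, resolves the tie

# Helper: does calling f with these arguments raise a MethodError?
function call_is_ambiguous(f, args...)
    try
        f(args...)
        return false
    catch err
        return err isa MethodError
    end
end

println(call_is_ambiguous(route_ambiguous, (2,), 1, 2))   # both methods match, neither is more specific
println(route_fixed((2,), 1, 2))                          # the tie-breaker is selected
println(route_fixed((2,), [1], [2]))                      # arrays only match the first method
```

Removing the tie-breaker from `route_fixed` would reintroduce the error for numeric arguments only, which is easy to miss if the tests mostly exercise array inputs.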

BioTurboNick · 2021-06-09T15:32:51Z

From @mbauman's comment here: #41101 (review)

> julia/base/abstractarray.jl, line 2295 in de1444c:
>
> nd = max(N, cat_ndims(as[1]))
>
> Is it possible for as to be empty?

Normal dispatch should ensure it isn't empty. But the new checks in this PR ensure that 1) the elements of shape are all > 0 and 2) the last one is equal to the length of as. ~~So if it somehow got called and as was empty, it would error.~~ Oh, except that check comes after... I'll check on that.

matthias314 · 2021-06-09T18:01:00Z

I see, some method definitions in the source code are more subtle than I thought.

Regarding the multiple methods for _hvncat: I've commented out the last four methods in the source code and played around with it. Everything still seems to work, and all tests for hvncat pass.

Something else: One still gets

julia> hvncat((), false)
Any[]

As I explained here, I think hvncat should require exactly one element if the first argument is an empty tuple and return a 0-dimensional array (unless the element is an array itself). Returning an empty vector if no elements are given is not consistent with the other cases in my opinion.

BioTurboNick · 2021-06-09T18:26:17Z

> Something else: One still gets
>
> julia> hvncat((), false)
> Any[]
>
> As I explained here, I think hvncat should require exactly one element if the first argument is an empty tuple and return a 0-dimensional array (unless the element is an array itself). Returning an empty vector if no elements are given is not consistent with the other cases in my opinion.

I understand, and there's a sense to it. But the base output of all the cat methods is an empty vector. I raised a related issue with respect to the other methods here: #40111

julia> hvcat(())
Any[]

julia> vcat()
Any[]

julia> hcat()
Any[]

I don't have a strong opinion on what's right here, but I don't want to break that pattern unless there's consensus about it.

matthias314 · 2021-06-09T20:33:06Z

> I don't want to break that pattern

I see. However, wouldn't that mean that hvncat accepts 0's in the dims tuple? For any tuple a, the call vcat(a...) should be equivalent to hvncat((length(a),), false, a...). I was planning to write that this gives a "dims argument must contain positive integers" error for a = ().

However, after the latest commit 88694a0 it leads to a different error, even for non-empty a:

julia> hvncat((2,), false, 1, 2)
ERROR: MethodError: _typed_hvncat(::Type{Int64}, ::Tuple{Int64}, ::Bool, ::Int64, ::Int64) is ambiguous.

EDIT: The first example also shows that one cannot really compare vcat() (and hcat()) to hvncat with an empty tuple as first argument.

BioTurboNick · 2021-06-09T20:34:54Z

Haha. Now you see my struggle. I removed the ::Number... method and thought it seemed to be working. I'll have to add it back again.

BioTurboNick · 2021-06-09T23:29:38Z

I'll give some more thought to your view on the bare-minimum call, @matthias314.

Co-authored-by: Jameson Nash <[email protected]>

BioTurboNick · 2021-06-11T11:27:06Z

I was able to tighten up the 0-length dimension issue. A couple of wrong inputs still produce an output, but it's not too bad. I'm having trouble figuring out how to detect one such case, e.g. [zeros(0, 2, 1);;; [1 3]]. Currently the empty array is ignored and this produces [1 3], but it should probably error.

simeonschaub · 2021-06-11T11:59:28Z

Just as a general tip: Reviewing PRs is a lot easier if you open smaller separate PRs, each addressing only one issue at a time, instead of large PRs addressing multiple separate issues. I know that putting everything into one PR is often easier as an author, because you don't need to think about dependencies and possible conflicts, but as a reviewer, I am often not as familiar with all details of the implementation as you are, so it's quite difficult to know where to start reviewing. It's not immediately obvious here which changes are actually bug fixes, which are performance improvements, and which are new features. If those were separate parts, I am sure you would get much quicker reviews because it is clear what the change does and what issue it fixes.

Please don't let this discourage you in any way though, your work here is very much appreciated! I just thought this might be generally helpful advice for contributing to OSS projects. I will still try to get around to reviewing this soon.

BioTurboNick · 2021-06-11T15:00:30Z

@simeonschaub, sure, I understand. I can break some of this up into smaller bits.

JeffBezanson · 2021-06-14T22:05:28Z

Can this be closed in favor of the new series of PRs?

vtjnash reviewed Jun 9, 2021

base/abstractarray.jl (outdated)

vtjnash reviewed Jun 9, 2021

test/abstractarray.jl (outdated)

BioTurboNick and others added 18 commits June 10, 2021 11:24

Implement similar to allow non-Array outputs based on inputs

32bdd82

remove commented line

00246df

Add test for similar via BitArrays

f31b7b8

lower-dimension improvement, tests

1409d5f

Implement rigorous checking for 0 and negative dims

a8ae12f

More robust argument checking

cb4c348

consistency in throw syntax

8ff9e2a

test fixes

3383431

Improved performance and bounds checking of shape form

ceb0f45

Simplified types

fc93dfd

More error checks

bf1cf91

Judicious inbounds added back

ae03b0a

fix test

a0f4e4b

whitespace fix

28de37d

remove stray using

9e96e7f

Co-authored-by: Jameson Nash <[email protected]>

Removed unnecessary method

7e8ab40

Organization and documentation

6ffbcd0

Minor fixups, added argument check to shape method

ef2fe36

BioTurboNick added 14 commits June 10, 2021 11:24

More rigorous shape tests

13d08f6

Adjusted inbounds to be more conservative

8a2dc40

Additional checks to ensure at least one element in vararg

fbf959a

Add back ::Number...

3549fe2

Remove some unnecessary tests

767185a

resolve ambiguity

e0884a9

Remove unneeded comments

ead8454

Stronger shape check

34cf142

throw reformat to shorten lines, removed printlns

2b39480

Replace looped test with a more specific one

ec7e5f8

Added to docstring

06e3276

Removed commented-out code

1e49c1c

Resolve ambiguity

a499f8d

whitespace fix

c171978

BioTurboNick force-pushed the hvncat-fixes branch from 71948df to c171978 Compare June 10, 2021 15:26

Fixed issue with output dimensionality lower than should be

2f6b88e

BioTurboNick mentioned this pull request Jun 11, 2021

Add check for 0-dimensional array arguments in hvncat and produce an error #41101

Closed

BioTurboNick added 2 commits June 11, 2021 06:44

Restored performance

8acaac0

stronger zero-length dims checks

7a74647

BioTurboNick closed this Jun 14, 2021
