Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refinements to hvncat #41143

Closed
wants to merge 36 commits into from
Closed

Conversation

BioTurboNick
Copy link
Contributor

@BioTurboNick BioTurboNick commented Jun 9, 2021

This PR is collecting changes based on feedback on the functions, including #41111, #41107, and #41047.

  • Adds robustness through stronger input checking, so that the functions are safe to call on their own
  • More consistent behavior for 0- and 1-dimension forms, as suggested by @matthias314 .
  • Added back some judicious @inbounds and @inline hints. Hopefully someone can ensure they're proper this time 😅
  • More comprehensive tests
  • Added Int form to documentation
  • Performance of unbalanced concatenations (e.g. const a = fill(1); const b = [a a]; [a a ;;; b]) improved to ~2x faster and 10 allocations 784, compared with cat([a a], b, dims = 3)'s 32 allocations and 1.12 KiB.
  • Realized dimensionality of result was not looking at all the input arguments; now it uses the maximum of the input arguments and the concatenation dimension

base/abstractarray.jl Outdated Show resolved Hide resolved
test/abstractarray.jl Outdated Show resolved Hide resolved
@matthias314
Copy link
Contributor

1-dimensional case

In this discussion the idea came up that in the 1-dimensional case hvncat returns a vector or a 1 x n matrix, depending on the value of row_first. This PR implements this for the shape form, but not for the dims form:

julia> hvncat( (2,), true, 1, 2) |> size
(2,)
julia> hvncat( ((2,),), true, 1, 2) |> size
(1, 2)

Is this intended? Let me also remark that the shape form of hvncat is not type-stable anymore with this change. It may still be a good idea to do it, but I think the reviewers should be aware of this.

Methods

It seems to me that the code contains quite a few redundant methods. Is this intentional? In my opinion, this makes the code hard to read. Some examples:

Here the second line covers all the following ones (without runtime overhead, I believe):

_hvncat(dimsshape::Union{Tuple, Int}, row_first::Bool) = _typed_hvncat(Any, dimsshape, row_first)                                                           
_hvncat(dimsshape::Union{Tuple, Int}, row_first::Bool, xs...) = _typed_hvncat(promote_eltypeof(xs...), dimsshape, row_first, xs...)                         
_hvncat(dimsshape::Union{Tuple, Int}, row_first::Bool, xs::T...) where T<:Number = _typed_hvncat(T, dimsshape, row_first, xs...)                            
_hvncat(dimsshape::Union{Tuple, Int}, row_first::Bool, xs::Number...) = _typed_hvncat(promote_typeof(xs...), dimsshape, row_first, xs...)                   
_hvncat(dimsshape::Union{Tuple, Int}, row_first::Bool, xs::AbstractArray...) = _typed_hvncat(promote_eltype(xs...), dimsshape, row_first, xs...)            
_hvncat(dimsshape::Union{Tuple, Int}, row_first::Bool, xs::AbstractArray{T}...) where T = _typed_hvncat(T, dimsshape, row_first, xs...)                     

Here the second line covers the first:

typed_hvncat(T::Type, dimsshape::NTuple{1}, row_first::Bool, xs...) = _typed_hvncat(T, dimsshape, row_first, xs...)                                         
typed_hvncat(T::Type, dimsshape::Tuple, row_first::Bool, xs...) = _typed_hvncat(T, dimsshape, row_first, xs...)                                             

@BioTurboNick
Copy link
Contributor Author

BioTurboNick commented Jun 9, 2021

1-dimensional case

In this discussion the idea came up that in the 1-dimensional case hvncat returns a vector or a 1 x n matrix, depending on the value of row_first. This PR implements this for the shape form, but not for the dims form:

Is this intended? Let me also remark that the shape form of hvncat is not type-stable anymore with this change. It may still be a good idea to do it, but I think the reviewers should be aware of this.

Yes, it is intended. That could potentially be changed... I'd have to dive back into the parser code though to make that work.

With respect to type stability, is it any less type-stable than e.g. cat(1, 2, dims = 3) vs. cat(1, 2, dims = 4)? My understanding from looking at the cat example was that using Val internally and introducing a method barrier limits the impacts of type instability. Though I'm sure I'm missing something here?

Methods

It seems to me that the code contains quite a few redundant methods. Is this intentional? In my opinion, this makes the code hard to read. Some examples:

Here the second line covers all the following ones (without runtime overhead, I believe):

I'm not sure why all the different promote_eltype/eltypeof/typeof methods were introduced, but it's what hvcat does, so I copied its example.

Here the second line covers the first:

typed_hvncat(T::Type, dimsshape::NTuple{1}, row_first::Bool, xs...) = _typed_hvncat(T, dimsshape, row_first, xs...)                                         
typed_hvncat(T::Type, dimsshape::Tuple, row_first::Bool, xs...) = _typed_hvncat(T, dimsshape, row_first, xs...)                                             

Ah, yes, good catch. Before the first one dispatched to typed_vcat and missed that it wasn't necessary anymore.

@matthias314
Copy link
Contributor

I was only giving examples of what seemed redundant methods to me. There are more, for instance:

_typed_hvncat(::Type{T}, ::Tuple{}, ::Bool, x) where T = fill(T(x))                                                                                         
_typed_hvncat(::Type{T}, ::Tuple{}, ::Bool, x::Number) where T = fill(T(x))                                                                                 

_typed_hvncat(T::Type, dims::Tuple{Int}, ::Bool, xs::Number...) = _typed_hvncat_1d(T, dims[1], Val(false), xs...)                                           
_typed_hvncat(T::Type, dims::Tuple{Int}, ::Bool, as...) = _typed_hvncat_1d(T, dims[1], Val(false), as...)                                                   

I think it would be good to take the time and check all methods introduced for hvncat.

@BioTurboNick
Copy link
Contributor Author

BioTurboNick commented Jun 9, 2021

I was only giving examples of what seemed redundant methods to me. There are more, for instance:

_typed_hvncat(::Type{T}, ::Tuple{}, ::Bool, x) where T = fill(T(x))                                                                                         
_typed_hvncat(::Type{T}, ::Tuple{}, ::Bool, x::Number) where T = fill(T(x))                                                                                 

_typed_hvncat(T::Type, dims::Tuple{Int}, ::Bool, xs::Number...) = _typed_hvncat_1d(T, dims[1], Val(false), xs...)                                           
_typed_hvncat(T::Type, dims::Tuple{Int}, ::Bool, as...) = _typed_hvncat_1d(T, dims[1], Val(false), as...)                                                   

I think it would be good to take the time and check all methods introduced for hvncat.

Julia made me do those to resolve ambiguity. That's where most of this apparent redundancy is coming from. Is there an easier way to find out if methods could be combined without removing one-by-one and rebuilding Julia? Duh, figured out a shortcut

@BioTurboNick
Copy link
Contributor Author

BioTurboNick commented Jun 9, 2021

From @mbauman 's comment here: #41101 (review)

nd = max(N, cat_ndims(as[1]))

Is it possible for as to be empty?

Normal dispatch should ensure it isn't empty. But, the new checks in this PR ensures: 1) elements of shape are all > 0 and 2) that the last one is equal to the length of as. So if it somehow got called and as was empty, it would error. Oh, except that check comes after... I'll check on that.

@matthias314
Copy link
Contributor

I see, some method definitions in the source code are more subtle than I thought.

Regarding the multiple methods for _hvncat: I've commented out the last four methods in the source code and played around with it. Everything still seems to work, and all tests for hvncat pass.

Something else: One still gets

julia> hvncat((), false)
Any[]

As I explained here, I think hvncat should require exactly one element if the first argument is an empty tuple and return a 0-dimensional array (unless the element is an array itself). Returning an empty vector if no elements are given is not consistent with the other cases in my opinion.

@BioTurboNick
Copy link
Contributor Author

Something else: One still gets

julia> hvncat((), false)
Any[]

As I explained here, I think hvncat should require exactly one element if the first argument is an empty tuple and return a 0-dimensional array (unless the element is an array itself). Returning an empty vector if no elements are given is not consistent with the other cases in my opinion.

I understand, and there's a sense to it. But the base output for all the cat methods are empty vectors. I raised a related issue with respect to the other methods here: #40111

julia> hvcat(())
Any[]

julia> vcat()
Any[]

julia> hcat()
Any[]

I don't have a strong opinion on what's right here, but I don't want to break that pattern unless there's consensus about it.

@matthias314
Copy link
Contributor

matthias314 commented Jun 9, 2021

I don't want to break that pattern

I see. However, wouldn't that mean that hvncat accepts 0's in the dims tuple? For any tuple a, the call vcat(a...) should be equivalent to hvncat((length(a),), false, a...). I was planning to write that this gives a "dims argument must contain positive integers" error for a = ().

However, after the latest commit 88694a0 it leads to a different error, even for non-empty a:

julia> hvncat((2,), false, 1, 2)
ERROR: MethodError: _typed_hvncat(::Type{Int64}, ::Tuple{Int64}, ::Bool, ::Int64, ::Int64) is ambiguous.

EDIT: The first example also shows that one cannot really compare vcat() (and hcat()) to hvncat with an empty tuple as first argument.

@BioTurboNick
Copy link
Contributor Author

Haha. Now you see my struggle. I removed the ::Number... method and thought it seemed to be working. I'll have to add it back again.

@BioTurboNick
Copy link
Contributor Author

I'll give some more thought to your view on the bare-minimum call, @matthias314.

@BioTurboNick
Copy link
Contributor Author

I was able to tighten up the 0-length dimension issue. A couple wrong inputs do produce an output, but it's not too bad. I'm having trouble figuring out how to detect that case. e.g. [zeros(0, 2, 1);;; [1 3]]. It should probably error instead of being ignored. Currently, this produces [1 3], but should error.

@simeonschaub
Copy link
Member

Just as a general tip: Reviewing PRs is a lot easier if you open smaller separate PRs, which each only address one issue at a time instead of large PRs addressing multiple separate issues. I know that putting everything into one PR is often easier as an author, because you don't need to think about dependencies and possible conflicts, but as a reviewer, I am often not as familiar with all details of the implementation as you are, so it's quite difficult to know where to start reviewing. It's not immediately obvious here, which changes are actually bug fixes, which are performance improvements and which are new features. If those were separate parts, I am sure you would get much quicker reviews because it is clear what the change does and what issue it fixes.

Please don't let this discourage you in any way though, your work here is very much appreciated! I just thought this might be generally helpful advice for contributing to OSS projects. I will still try to get around reviewing this soon.

@BioTurboNick
Copy link
Contributor Author

@simeonschaub, sure, I understand. I can break some of this up into smaller bits.

@JeffBezanson
Copy link
Member

Can this be closed in favor of the new series of PRs?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants