Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

findall has strange behavior for impure predicates on Bool Arrays #46425

Closed
LilithHafner opened this issue Aug 20, 2022 · 5 comments · Fixed by #51039
Closed

findall has strange behavior for impure predicates on Bool Arrays #46425

LilithHafner opened this issue Aug 20, 2022 · 5 comments · Fixed by #51039
Labels
backport 1.9 Change should be backported to release-1.9 bug Indicates an unexpected problem or unintended behavior correctness bug ⚠ Bugs that are likely to lead to incorrect results in user code without throwing regression Regression in behavior compared to a previous version search & find The find* family of functions
Milestone

Comments

@LilithHafner
Copy link
Member

julia> f(x) = x || rand(Bool);

julia> a = [false, false, true];

julia> Random.seed!(10);

julia> findall(f, a)
2-element Vector{Int64}:
 1
 2
   # Where is three?
julia> findall(f, a)
3-element Vector{Int64}:
 2
 3 # There it is!
 4 # Oh, hello four, I didn't expect to see you here.

julia> versioninfo() # 25 days ago
Julia Version 1.9.0-DEV.1053
Commit 9e22e567f29 (2022-07-26 14:21 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin21.5.0)
  CPU: 4 × Intel(R) Core(TM) i5-8210Y CPU @ 1.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.5 (ORCJIT, skylake)
  Threads: 1 on 2 virtual cores
Environment:
  LD_LIBRARY_PATH = /usr/local/lib
  JULIA_PKG_PRECOMPILE_AUTO = 0

Not returning 3 is bad. Returing 4 could lead to segfaults and is definitely a bug.

We have two implementations of findall(f::Function, a::AbstractArray)

  1. Precompute f.(a) and then get indices from the BitArray
    findall(testf::Function, A::AbstractArray) = findall(testf.(A))
  2. Count count(f, a), preallocate an index array, and iterate through a second time recomputing f(a)

    julia/base/array.jl

    Lines 2375 to 2393 in aac466f

    function _findall(f::Function, A::AbstractArray{Bool})
    n = count(f, A)
    I = Vector{eltype(keys(A))}(undef, n)
    isempty(I) && return I
    _findall(f, I, A)
    end
    function _findall(f::Function, I::Vector, A::AbstractArray{Bool})
    cnt = 1
    len = length(I)
    for (k, v) in pairs(A)
    @inbounds I[cnt] = k
    cnt += f(v)
    cnt > len && return I
    end
    # In case of impure f, this line could potentially be hit. In that case,
    # we can't assume I is the correct length.
    resize!(I, cnt - 1)
    end

For simple f, 2 is about 1.5x faster according to my rough benchmarks. If the runtime of f dominates, then 1 should be 2x faster. If f is impure then 1 behaves how one would expect and 2 can have bizarre consequences.

Ideally, we dispatch to 2 for simple pure f and 1 otherwise. Our current heuristic is to dispatch to 2 for a::AbstractArray{Bool} and 1 otherwise. This is a bad heuristic. It would be cool to dispatch based on effect analysis, but if that is not an option, my preference is to use f === identity as the heuristic (even though this is a performance hit in some cases).

I suspect this was introduced by #42202 (cc @jakobnissen) which fixed #42187.

@LilithHafner LilithHafner added bug Indicates an unexpected problem or unintended behavior search & find The find* family of functions labels Aug 20, 2022
@adienes
Copy link
Contributor

adienes commented May 14, 2023

potential candidate to tag for 1.10?

@LilithHafner
Copy link
Member Author

Oops! This is broken on 1.9 and not on 1.8, so we probably should have put it on the 1.9 milestone and held back the release until it was fixed. Thanks for the bump.

@LilithHafner LilithHafner added the regression Regression in behavior compared to a previous version label May 14, 2023
@LilithHafner LilithHafner added this to the 1.10 milestone May 14, 2023
@LilithHafner LilithHafner added the backport 1.9 Change should be backported to release-1.9 label May 14, 2023
@gbaraldi gbaraldi modified the milestones: 1.10, 1.9 May 15, 2023
jakobnissen added a commit to jakobnissen/julia that referenced this issue May 16, 2023
In commit 4c4c94f, findall(f, A::AbstractArray{Bool}) was optimised by using a
technique where A was traversed twice: Once to count the number of true elements
and once to fill in the resulting vector.
However, this could cause problems for arbitrary functions f: For slow f, the
approach is ~2x slower. For impure f, f being called twice could cause side
effects and strange issues (see issue JuliaLang#46425)

With this commit, the optimised version is only dispatched to when f is ! or
xor.
jakobnissen added a commit to jakobnissen/julia that referenced this issue May 16, 2023
In commit 4c4c94f, findall(f, A::AbstractArray{Bool}) was optimised by using a
technique where A was traversed twice: Once to count the number of true elements
and once to fill in the resulting vector.
However, this could cause problems for arbitrary functions f: For slow f, the
approach is ~2x slower. For impure f, f being called twice could cause side
effects and strange issues (see issue JuliaLang#46425)

With this commit, the optimised version is only dispatched to when f is ! or
identity.
@JeffBezanson JeffBezanson modified the milestones: 1.9, 1.10 Aug 2, 2023
@LilithHafner LilithHafner added the correctness bug ⚠ Bugs that are likely to lead to incorrect results in user code without throwing label Aug 7, 2023
@JeffBezanson
Copy link
Member

#42202 should be reverted since it has an invalid use of @inbounds. I think we are allowed to assume the passed function is pure, but not in a way that does invalid accesses. For example, if the passed function prints something, we say "the number of times it prints is implementation-defined" (but not undefined behavior).

@gbaraldi
Copy link
Member

To be clear, pure here means consistent. If the function throws then oh well, we may or may not throw.

KristofferC pushed a commit that referenced this issue Aug 25, 2023
This reverts commit 4c4c94f. (Except
keeps the tests and adds a new one)

fixes #46425
@Keno
Copy link
Member

Keno commented Aug 29, 2023

Note that querying the compiler for effects is permissible to deal with these sorts of cases if the performance is important.

KristofferC pushed a commit that referenced this issue Sep 15, 2023
This reverts commit 4c4c94f. (Except
keeps the tests and adds a new one)

fixes #46425

(cherry picked from commit defe187)
nalimilan pushed a commit that referenced this issue Nov 5, 2023
This reverts commit 4c4c94f. (Except
keeps the tests and adds a new one)

fixes #46425

(cherry picked from commit defe187)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 1.9 Change should be backported to release-1.9 bug Indicates an unexpected problem or unintended behavior correctness bug ⚠ Bugs that are likely to lead to incorrect results in user code without throwing regression Regression in behavior compared to a previous version search & find The find* family of functions
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants