
ScopedValue is allocating when accessed #53584

Open
sgaure opened this issue Mar 4, 2024 · 4 comments
Labels
performance Must go faster

Comments


sgaure commented Mar 4, 2024

I have looked into using scoped values for some temporary arrays to avoid allocations in parallel tasks. However, it seems scoped values allocate on every access, whereas with task-local storage (TLS) this can be avoided. This is unfortunate, since GC pressure in parallel tasks can be a performance problem.

using .Threads
using BenchmarkTools

@noinline function tlsfun()
    tlsvec = get!(() -> [0], task_local_storage(), :myvec)::Vector{Int}
    tlsvec[1] += 1
    return nothing
end


const dynvec = ScopedValue([0])

@noinline function dynfun()
    dvec = dynvec[]
    dvec[1] += 1
    return nothing
end


function tlsrun()
    @sync for _ in 1:nthreads()
        @spawn for _ in 1:100000; tlsfun(); end
    end
end


function dynrun()
    @sync for _ in 1:nthreads()
        @with dynvec=>[0] @spawn for _ in 1:100000; dynfun(); end
    end
end


@btime tlsrun()
@btime dynrun()
versioninfo()

output:

  2.326 ms (202 allocations: 21.03 KiB)
  8.238 ms (2400274 allocations: 36.64 MiB)

Julia Version 1.12.0-DEV.121
Commit bc2212cc0e* (2024-03-04 01:20 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 24 × AMD Ryzen Threadripper PRO 5945WX 12-Cores
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 24 default, 0 interactive, 12 GC (on 24 virtual cores)
Environment:
  JULIA_EDITOR = emacs -nw

@oscardssmith oscardssmith added performance Must go faster multithreading Base.Threads and related functionality labels Mar 4, 2024
@topolarity
Member

This is probably caused by 5b2fcb6 and is not multi-threading related:

julia> const dynvec = ScopedValue([0])
julia> @noinline function dynfun()
           dvec = dynvec[]
           dvec[1] += 1
           return nothing
       end
julia> foo() = @with dynvec=>[0] for _ in 1:1000_000; dynfun(); end

julia> @allocated foo()
16000208

The problem is this @noinline, which forces a wrapping Tuple{Vector{Int}} to be allocated as a temporary, even though it is immediately unwrapped at every call site:

@noinline function KeyValue.get(dict::PersistentDict{K, V}, key) where {K, V}
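For illustration, a minimal standalone sketch of the same shape (`get_maybe` and `caller` are hypothetical names, not the actual `KeyValue.get` implementation): a `@noinline` function returning `Union{Nothing, Tuple{Vector{Int}}}` may have to heap-allocate the wrapping tuple at the call boundary, even though the caller unwraps it immediately.

```julia
# Hypothetical reproduction of the pattern: the wrapping tuple returned
# across a @noinline boundary may not be elided by the optimizer.
@noinline function get_maybe(v::Union{Vector{Int}, Nothing})
    v === nothing && return nothing
    return (v,)            # wrapping Tuple{Vector{Int}} built here
end

function caller(v::Vector{Int})
    t = get_maybe(v)       # call is opaque to the optimizer due to @noinline
    t === nothing && return 0
    return t[1][1]         # tuple is unwrapped immediately
end
```

Whether the tuple actually allocates depends on the compiler version; the `@allocated foo()` measurement above shows it does on the build in question.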

@topolarity topolarity removed the multithreading Base.Threads and related functionality label Mar 5, 2024
@vtjnash
Member

vtjnash commented Mar 5, 2024

So is the problem there that the API wrongly returns an object of type (leaf.val,) instead of Some{V}(leaf.val)?
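For context, a sketch of the two wrapper shapes being compared (`wrap_tuple`/`wrap_some` are hypothetical helper names): `(leaf.val,)` is a one-element `Tuple`, while `Some{V}(leaf.val)` is Base's standard "present value" wrapper. Both let a caller distinguish a found value from `nothing`, including when the stored value is itself `nothing`.

```julia
# Two ways to signal "value found" vs. "not found" (hypothetical helpers):
wrap_tuple(x) = (x,)               # ad-hoc one-element tuple
wrap_some(x)  = Some(x)            # Base's Some{T} wrapper

unwrap_tuple(t) = t === nothing ? nothing : t[1]
unwrap_some(s)  = s === nothing ? nothing : something(s)
```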

@topolarity
Member

Would Some{V}(leaf.val) bypass the need to allocate the temporary here?

@vtjnash
Member

vtjnash commented Mar 5, 2024

I guess not. Apparently we do not have calling-convention support for Union{Struct, Ghost}, even though we very easily could (we already have many variations on it) and probably should (it is exactly the shape of the iteration protocol).

4 participants