
macro API #53

Merged
merged 27 commits into from
Mar 1, 2024

Conversation

carstenbauer
Member

@carstenbauer carstenbauer commented Feb 21, 2024

Basic attempt at providing a @threaded [kwargs] for ... API. The idea is to simply transform the code into tforeach or tmapreduce calls. (Note: later renamed to @tasks for ...; see below.)

Basic examples:

julia> @threaded for i in 1:3
           println(i)
       end
2
1
3

julia> @threaded scheduler=StaticScheduler() for i in 1:3
           println(i)
       end
1
3
2

julia> @threaded reducer=(+) for i in 1:3
           sin(i)
       end
1.8918884196934453
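For reference, here is a sketch of the function-based calls these examples are intended to lower to (assuming the tforeach/tmapreduce signatures from the existing OhMyThreads API):

```julia
using OhMyThreads

# rough hand-written equivalent of the first @threaded example
tforeach(1:3) do i
    println(i)
end

# the reducer=(+) variant should correspond to a tmapreduce call
tmapreduce(sin, +, 1:3)
```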

I also took a stab at adding a @tasklocal macro that can be used within the loop-body and essentially expands to the common TaskLocalValue pattern.
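For context, a sketch of the manual pattern that @tasklocal is meant to abbreviate (assuming the TaskLocalValues.jl API):

```julia
using OhMyThreads: tforeach
using TaskLocalValues: TaskLocalValue

# one TaskLocalValue per variable, dereferenced inside the loop body
tls = TaskLocalValue{Matrix{Int64}}(() -> fill(Threads.threadid(), 2, 2))
tforeach(1:16) do j
    C = tls[]  # initialized lazily, once per task, then reused
    # ... use C ...
end
```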

Example @tasklocal:

(with 8 threads)

julia> @threaded scheduler=StaticScheduler() for j in 1:16
           @tasklocal C::Matrix{Int64} = fill(Threads.threadid(), 2,2)
           println(taskid(), " -> ", C, " (", pointer_from_objref(C),")")
       end
17449308883177734487 -> [11 11; 11 11] (Ptr{Nothing} @0x00001468adc04730)
7858258358232135215 -> [10 10; 10 10] (Ptr{Nothing} @0x00001468adfba410)
17449308883177734487 -> [11 11; 11 11] (Ptr{Nothing} @0x00001468adc04730)
7858258358232135215 -> [10 10; 10 10] (Ptr{Nothing} @0x00001468adfba410)
14141364989831668232 -> [9 9; 9 9] (Ptr{Nothing} @0x00001468ae528070)
731939243900798756 -> [4 4; 4 4] (Ptr{Nothing} @0x00001468b41d9630)
12158275713140852179 -> [5 5; 5 5] (Ptr{Nothing} @0x00001468ae634070)
731939243900798756 -> [4 4; 4 4] (Ptr{Nothing} @0x00001468b41d9630)
12158275713140852179 -> [5 5; 5 5] (Ptr{Nothing} @0x00001468ae634070)
14141364989831668232 -> [9 9; 9 9] (Ptr{Nothing} @0x00001468ae528070)
9445684018239188219 -> [6 6; 6 6] (Ptr{Nothing} @0x0000146b1052c190)
9445684018239188219 -> [6 6; 6 6] (Ptr{Nothing} @0x0000146b1052c190)
12917150328223514965 -> [7 7; 7 7] (Ptr{Nothing} @0x00001468ae4d5510)
12917150328223514965 -> [7 7; 7 7] (Ptr{Nothing} @0x00001468ae4d5510)
1380909043146770272 -> [8 8; 8 8] (Ptr{Nothing} @0x00001468ae500190)
1380909043146770272 -> [8 8; 8 8] (Ptr{Nothing} @0x00001468ae500190)

This expands to

julia> @macroexpand @threaded scheduler=StaticScheduler() for j in 1:16
           @tasklocal C::Matrix{Int64} = fill(Threads.threadid(), 2,2)
           println(taskid(), " -> ", C, " (", pointer_from_objref(C),")")
       end
quote
    #= /scratch/pc2-mitarbeiter/bauerc/devel/OhMyThreads.jl/src/macro.jl:74 =#
    begin
        #= /scratch/pc2-mitarbeiter/bauerc/devel/OhMyThreads.jl/src/macro.jl:61 =#
        var"##242" = OhMyThreads.TaskLocalValue{Matrix{Int64}}((()->begin
                        #= /scratch/pc2-mitarbeiter/bauerc/devel/OhMyThreads.jl/src/macro.jl:61 =#
                        fill(Threads.threadid(), 2, 2)
                    end))
    end
    #= /scratch/pc2-mitarbeiter/bauerc/devel/OhMyThreads.jl/src/macro.jl:75 =#
    OhMyThreads.tforeach(1:16; scheduler = StaticScheduler()) do j
        #= /scratch/pc2-mitarbeiter/bauerc/devel/OhMyThreads.jl/src/macro.jl:76 =#
        begin
            #= /scratch/pc2-mitarbeiter/bauerc/devel/OhMyThreads.jl/src/macro.jl:64 =#
            C = var"##242"[]
        end
        #= /scratch/pc2-mitarbeiter/bauerc/devel/OhMyThreads.jl/src/macro.jl:77 =#
        begin
            #= REPL[44]:2 =#
            #= REPL[44]:3 =#
            println(taskid(), " -> ", C, " (", pointer_from_objref(C), ")")
            #= REPL[44]:4 =#
        end
    end
end

TODOs

  • Do we want to have something like @tasklocal?
  • Currently, the task local storage (gensym()'ed variable name) is available after the loop. Could put everything into a let block? But the user could be interested in this after the loop. Maybe let them choose a name?
  • Support multiple @tasklocal variables. (We now accept a single @tasklocal which can either be a typed assignment or a begin ... end block of typed assignments)
  • Bikeshedding, see below.
  • Overhaul escaping
  • Optional: Channel support

Closes #49

@carstenbauer
Member Author

@MasonProtter, would be great if you could take a look. I'm not that well versed in writing macros :)

@carstenbauer
Member Author

carstenbauer commented Feb 21, 2024

Bikeshedding:

  • @threaded indicates multithreading but doesn't indicate Julia's task-based parallelism.
  • Brainstorming: @taskparallel, @paralleltasks, @threadedtasks, @parallel, @tasks

@MasonProtter
Member

One thought I had was trying to copy Chapel's forall syntax. So something like

@forall i in 1:10 begin
    println(i)
end

but it kinda sucks that we have to write the begin. Another thought I had was using a do block, but that's not allowed (ref JuliaLang/JuliaSyntax.jl#414).

It kinda makes me think, though, that we could just do this without a macro at all:

forall(1:10) do i
    println(i)
end

and

forall(1:10; reducer=(+)) do i
    println(i)
end

but there's various things like the task local storage that might benefit from having a macro available.

@carstenbauer
Member Author

carstenbauer commented Feb 21, 2024

It kinda makes me think, though, that we could just do this without a macro at all:

forall(1:10) do i
    println(i)
end

That's already available, just ~~forall~~ foreach?

foreach(1:10) do i
	println(i)
end

but there's various things like the task local storage that might benefit from having a macro available.

This. Also, whether we like it or not, people are (much) more used to writing

for i in 1:10
	# do something
end

than

foreach(1:10) do i
	# do something
end

I always have to explain the do syntax thoroughly in my course whereas the for loop is trivial for everyone.

(Of course, we don't always have to give people what they want 😀)

@carstenbauer
Member Author

Added support for multiple task local variables

@threaded scheduler=StaticScheduler() for j in 1:16
    @tasklocal begin
        C::Matrix{Int64} = fill(Threads.threadid(), 2,2)
        D::Matrix{Float64} = rand(3,3)
    end
    sleep(0.01*j)
    println(Threads.threadid(), " (", pointer_from_objref(C),") (", pointer_from_objref(D),")")
end

To avoid "leaking" of the task local storages, we now wrap everything in a let block. For example, the above expands to this

let var"##290" = OhMyThreads.TaskLocalValue{Matrix{Int64}}((()->begin
                      fill(Threads.threadid(), 2, 2)
                  end)), var"##291" = OhMyThreads.TaskLocalValue{Matrix{Float64}}((()->begin
                      rand(3, 3)
                  end))
      begin
          OhMyThreads.tforeach(1:16; scheduler = StaticScheduler()) do j
              begin
                  C = var"##290"[]
                  D = var"##291"[]
              end
              begin
                  sleep(0.01j)
                  println(Threads.threadid(), " (", pointer_from_objref(C), ") (", pointer_from_objref(D), ")")
              end
          end
      end
  end

@carstenbauer
Member Author

Note to future self: It needs to be clearly documented that assignments within a @tasklocal block need to be self-contained, e.g. they can't reference each other.
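A minimal sketch of what this restriction means in practice (hypothetical usage):

```julia
# fine: each initializer stands on its own
@tasklocal begin
    C::Matrix{Int64} = zeros(Int, 2, 2)
    D::Matrix{Float64} = rand(3, 3)
end

# not supported: D's initializer refers to C, but each assignment
# becomes its own independent TaskLocalValue closure
@tasklocal begin
    C::Matrix{Int64} = zeros(Int, 2, 2)
    D::Matrix{Int64} = C .+ 1
end
```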

@carstenbauer
Member Author

carstenbauer commented Feb 23, 2024

Another (wild) thought I just had: I kind of dislike that all the keyword arguments (e.g. scheduler or reducer) sit on a single line and push back the actual iteration. Compare

for i in 1:16

and, e.g.,

@threaded scheduler=DynamicScheduler(; nchunks=8, split=:scatter) reducer=(+) for j in 1:16

I was wondering whether we should instead use a separate block for all settings, e.g.,

@threaded for j in 1:16
    @settings begin
        scheduler=DynamicScheduler(; nchunks=8, split=:scatter)
        reducer=(+)
    end
    println(j)
end

or, equivalently,

@threaded for j in 1:16
    println(j)

    @settings begin
        scheduler=DynamicScheduler(; nchunks=8, split=:scatter)
        reducer=(+)
    end
end

@MasonProtter
Member

Another (wild) thought I just had: I kind of dislike that all the keyword arguments (e.g. scheduler or reducer) sit on a single line and push back the actual iteration. Compare

for i in 1:16

and, e.g.,

@threaded scheduler=DynamicScheduler(; nchunks=8, split=:scatter) reducer=(+) for j in 1:16

I was wondering whether we should instead use a separate block for all settings, e.g.,

@threaded for j in 1:16
    @settings begin
        scheduler=DynamicScheduler(; nchunks=8, split=:scatter)
        reducer=(+)
    end
    println(j)
end

or, equivalently,

@threaded for j in 1:16
    println(j)

    @settings begin
        scheduler=DynamicScheduler(; nchunks=8, split=:scatter)
        reducer=(+)
    end
end

I like this idea a lot!

@carstenbauer
Member Author

carstenbauer commented Feb 28, 2024

Some bikeshedding updates. I think I dislike @threaded because 1) it is very similar to @threads and 2) it is focused on threads, not tasks. My current favourites are

  • @tasks (I like that it's simple and puts a clear emphasis on tasks.)
  • @parallel
  • @omt (for OhMyThreads)

I also don't think that @tasklocal is the best choice because regular variables defined in the parallel for loop are also local to the task. The crux really is the "once per task" nature. My current favourites are

  • @init (It's simple and gets the point across, I think, but maybe too general?)
  • @tlv (for TaskLocalValue)

For now, I'll go with / change to @tasks and @init.

@carstenbauer
Member Author

carstenbauer commented Feb 28, 2024

These all work now:

@tasks for i in 1:10
    @set scheduler=StaticScheduler()
    println(i)
end

@tasks for i in 1:10
    @set begin
        scheduler=DynamicScheduler(; nchunks=4)
        reducer= (a,b) -> a+b
    end
    sin(i)
end

@tasks for i in 1:10
    @set scheduler=DynamicScheduler(; nchunks=4)
    @set reducer=(+)
    sin(i)
end

@tasks for i in 1:10
    @init begin
        C::Matrix{Int64} = fill(Threads.threadid(), 2,2)
        D::Matrix{Float64} = rand(3,3)
    end
    println("asd")
end

@tasks scheduler=DynamicScheduler(; nchunks=4) for i in 1:10
    @set scheduler=StaticScheduler() # takes precedence
    println(i)
end

@carstenbauer carstenbauer changed the title @threaded macro macro API Feb 28, 2024
@MasonProtter
Member

MasonProtter commented Feb 28, 2024

One thought I had: when we construct the inner function to be passed to tmapreduce or tforeach, it'd be cool if that function didn't need to access the task_local_storage during the sequential part of the loop, saving some precious nanoseconds in a tight loop like just summing up Float64s.

I think the way this would have to work though would be to have something like

struct WithTaskLocalValue{F, TLVs <: Tuple{Vararg{TaskLocalValue}}}
    func::F
    tlvs::TLVs
end
initialize(f::WithTaskLocalValue) = f.func(map(x -> x[], f.tlvs)...)
initialize(f::Any) = f

and then

# DynamicScheduler: AbstractArray/Generic
function _tmapreduce(f,
        op,
        Arrs,
        ::Type{OutputType},
        scheduler::DynamicScheduler,
        mapreduce_kwargs)::OutputType where {OutputType}
    (; nchunks, split, threadpool) = scheduler
    check_all_have_same_indices(Arrs)
    if chunking_enabled(scheduler)
        tasks = map(chunks(first(Arrs); n = nchunks, split)) do inds
            args = map(A -> view(A, inds), Arrs)
            @spawn threadpool mapreduce(initialize(f), op, args...; $mapreduce_kwargs...) # <------- Change here
        end
        mapreduce(fetch, op, tasks)
    else
        tasks = map(eachindex(first(Arrs))) do i
            args = map(A -> @inbounds(A[i]), Arrs)
            @spawn threadpool initialize(f)(args...) # <------- Change here
        end
        mapreduce(fetch, op, tasks; mapreduce_kwargs...)
    end
end

This way, someone can do e.g.

function matmulsums_tls(As, Bs)
    N = size(first(As), 1)
    tls = TaskLocalValue{Matrix{Float64}}(() -> Matrix{Float64}(undef, N, N))
    f = WithTaskLocalValue(tls) do C
        function (A, B)
            mul!(C, A, B)
            sum(C)
        end
    end
    tmap(f, As, Bs)
end

This is kinda ugly, but the idea is that the macro could then automatically handle the creation of the WithTaskLocalValue object for the user.

I can try to put this together in a followup PR to this one, or we can try to integrate it here from the get-go.

@MasonProtter
Member

@tasks is a good name, let's just do that. Maybe this package should have been named "OhMyTasks" 😆 oh well

@carstenbauer
Member Author

This now also works:

@tasks for i in 1:10
    @set collect=true
    i
end

It translates to tmap and thus result == collect(1:10).
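In other words, a sketch of the intended equivalence (assuming tmap's released signature):

```julia
using OhMyThreads

# @tasks for i in 1:10; @set collect=true; i; end
# should be equivalent to
result = tmap(i -> i, 1:10)
result == collect(1:10)  # true
```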

(BTW, I think I love the @set idea.)

@carstenbauer
Member Author

I can try to put this together in a followup PR to this one, or we can try to integrate it here from the get-go.

Let's do it in a separate PR. This one is anyway growing by the minute :D

@carstenbauer carstenbauer marked this pull request as ready for review February 29, 2024 13:29
@carstenbauer
Member Author

carstenbauer commented Feb 29, 2024

I think this is pretty much ready. The docs preview is here.

@carstenbauer
Member Author

carstenbauer commented Feb 29, 2024

~~TODO: update README~~ done

carstenbauer and others added 7 commits February 29, 2024 19:28
* move macro body to the internals module

* only accept one argument, error if the user gives a complex loop assignment

* remove double quoting, reformat comment

* remove outdated code

* re-org
@MasonProtter
Member

So we'll save the channel support for a future PR?

@MasonProtter MasonProtter merged commit 6949ffe into master Mar 1, 2024
10 checks passed
@carstenbauer carstenbauer deleted the cb/macro branch March 4, 2024 18:52