Dynamic scheduling for parallel_for and parallel_reduce #106

mrtupek · 2015-10-08T18:01:08Z

We've found that some of our loops are faster using dynamic scheduling in OpenMP.

mhoemmen · 2015-10-09T21:26:02Z

See issue 100:

#100

I'm not sure how to mark issues as duplicates of another issue, or I would do that now :-)

hcedwar · 2015-11-23T18:12:26Z

Dynamic scheduling options for both RangePolicy and TeamPolicy

hcedwar · 2015-11-23T18:15:19Z

Adding optional template parameter to policies to specify dynamic scheduling. Policy templates should be switched to variadic arguments.

crtrott · 2015-11-23T18:16:28Z

How about Kokkos::ScheduleDynamic or Kokkos::Schedule < Kokkos::Dynamic >?

nmhamster · 2015-11-23T18:22:41Z

Prefer Kokkos::Schedule < Kokkos::Dynamic >

crtrott · 2015-12-16T03:38:05Z

Ok this is the syntax I am implementing right now (starting with the RangePolicy):

ExecutionPolicy< class ... Traits >

where Traits can be any or none of the following

ExecutionSpace (some Kokkos execution space)
WorkTag (some user defined class)
Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::Schedule<Kokkos::Static>
Kokkos::ChunkSize
Kokkos::IterationType<INT_TYPE>

deprecated: INT_TYPE (i.e. some raw integer type as a direct template argument)

So someone could really customize an execution policy like this:

RangePolicy<InitialIntegrate, OpenMP, Schedule, ChunkSize<32>, IterationType >

Right now order of arguments doesn't matter, but only one of each type is accepted (it static asserts with a helpful message if you for example try RangePolicy<OpenMP,Serial>).
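A minimal sketch of how order-independent properties with a "one of each kind" check could be written in plain C++11. This is not the Kokkos implementation: `RangePolicySketch`, `count_if`, `find_or`, the stand-in tag types, and the chosen defaults (Serial, static schedule, 64-bit index) are all hypothetical names and assumptions made up for illustration.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins for illustration only; these are NOT the Kokkos types.
struct Static {};
struct Dynamic {};
template <class S> struct Schedule { using type = S; };
template <class I> struct IndexType { using type = I; };
struct Serial {};
struct OpenMP {};

// Predicates that classify a property by kind.
template <class T> struct is_schedule : std::false_type {};
template <class S> struct is_schedule<Schedule<S>> : std::true_type {};
template <class T> struct is_index_type : std::false_type {};
template <class I> struct is_index_type<IndexType<I>> : std::true_type {};
template <class T> struct is_exec_space : std::false_type {};
template <> struct is_exec_space<Serial> : std::true_type {};
template <> struct is_exec_space<OpenMP> : std::true_type {};

// Count how many properties in the pack satisfy a predicate.
template <template <class> class Pred, class... Ps> struct count_if;
template <template <class> class Pred>
struct count_if<Pred> : std::integral_constant<int, 0> {};
template <template <class> class Pred, class P, class... Ps>
struct count_if<Pred, P, Ps...>
    : std::integral_constant<int, (Pred<P>::value ? 1 : 0) +
                                      count_if<Pred, Ps...>::value> {};

// Select the first property that satisfies a predicate, else a default.
template <template <class> class Pred, class Default, class... Ps>
struct find_or { using type = Default; };
template <template <class> class Pred, class Default, class P, class... Ps>
struct find_or<Pred, Default, P, Ps...> {
  using type = typename std::conditional<
      Pred<P>::value, P,
      typename find_or<Pred, Default, Ps...>::type>::type;
};

template <class... Properties>
struct RangePolicySketch {
  // Reject duplicates of the same kind, e.g. RangePolicySketch<OpenMP, Serial>.
  static_assert(count_if<is_exec_space, Properties...>::value <= 1,
                "RangePolicySketch: more than one execution space given");
  static_assert(count_if<is_schedule, Properties...>::value <= 1,
                "RangePolicySketch: more than one Schedule given");
  static_assert(count_if<is_index_type, Properties...>::value <= 1,
                "RangePolicySketch: more than one IndexType given");

  // Defaults below are assumptions of this sketch.
  using execution_space =
      typename find_or<is_exec_space, Serial, Properties...>::type;
  using schedule_type =
      typename find_or<is_schedule, Schedule<Static>, Properties...>::type;
  using index_type = typename find_or<is_index_type, IndexType<std::int64_t>,
                                      Properties...>::type::type;
};

// The same properties in a different order yield the same resolved types.
using PolicyA = RangePolicySketch<OpenMP, Schedule<Dynamic>, IndexType<int>>;
using PolicyB = RangePolicySketch<IndexType<int>, Schedule<Dynamic>, OpenMP>;
static_assert(std::is_same<PolicyA::schedule_type, PolicyB::schedule_type>::value,
              "argument order does not matter");
```

In this sketch, properties that no predicate recognizes (such as a user-defined work tag) are simply ignored; a fuller version would need a rule for treating them as the tag.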

I am pretty much done with RangePolicy accepting these arguments, but Schedule doesn't do anything yet. The other ones do the same thing as their equivalents did before.
This lives on the experimental branch.

crtrott · 2015-12-16T04:00:23Z

One more comment: I will strongly advocate for Schedule and ChunkSize to have only hint character. Otherwise people can do all kinds of crazy things. For example, if you are guaranteed a ChunkSize, I think it is possible to write a legal Kokkos::parallel_for which is not vectorizable (effectively you know that successive chunk items will not collide). While that has its advantages, I think the loss in freedom to map iterations to hardware is far worse. So I would rather say those things are hints, and force people to write portable code. Static, for example, has an issue because on GPUs there is no true static execution, since the hardware has a dynamic scheduler built in.

hcedwar · 2015-12-16T15:10:33Z

Use the term "properties" for this kind of template parameter:
RangePolicy< class ... Properties >
The term "traits" is already in the C++11 meta-programming vernacular.

crtrott · 2015-12-20T01:06:43Z

Just noticed I had done that already in the actual implementation on experimental.
I also changed ChunkSize to become a runtime argument, so that one could adapt the chunk size to the input deck. I.e.:

Kokkos::RangePolicy<Kokkos::Schedule<Kokkos::Dynamic> >(0, N, Kokkos::ChunkSize(32))
// This uses all options including TagType in order to call operator()(MyCustomTag, int)
Kokkos::RangePolicy<Kokkos::Schedule<Kokkos::Dynamic>,
                    Kokkos::IterationType<int>,
                    Kokkos::OpenMP,
                    MyCustomTag>(0, N, Kokkos::ChunkSize(32))
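For readers unfamiliar with what dynamic scheduling with a runtime chunk size means, here is a rough standalone sketch using only the C++ standard library. It is not how Kokkos implements it: `dynamic_for` and `count_visits` are hypothetical helpers, and the atomic work counter is just one common way to hand out chunks on demand.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not a Kokkos API): run body(i) for i in [begin, end),
// letting each thread grab the next chunk of iterations whenever it is idle.
template <class Body>
void dynamic_for(std::size_t begin, std::size_t end, std::size_t chunk_size,
                 unsigned num_threads, Body body) {
  assert(begin <= end);
  if (chunk_size == 0) chunk_size = 1;    // guard against a zero chunk
  if (num_threads == 0) num_threads = 1;  // always run at least the caller

  // Shared counter holding the first index of the next unclaimed chunk.
  std::atomic<std::size_t> next(begin);

  auto worker = [&]() {
    for (;;) {
      // Atomically claim a chunk; fetch_add returns the previous value.
      const std::size_t first = next.fetch_add(chunk_size);
      if (first >= end) break;
      const std::size_t last = std::min(first + chunk_size, end);
      for (std::size_t i = first; i < last; ++i) body(i);
    }
  };

  std::vector<std::thread> pool;
  for (unsigned t = 1; t < num_threads; ++t) pool.emplace_back(worker);
  worker();  // the calling thread participates as well
  for (auto& th : pool) th.join();
}

// Usage helper: records how many times each index was visited.
std::vector<int> count_visits(std::size_t n, std::size_t chunk,
                              unsigned threads) {
  std::vector<int> visits(n, 0);
  // Each index is claimed by exactly one thread, so writes do not overlap.
  dynamic_for(0, n, chunk, threads, [&](std::size_t i) { visits[i] += 1; });
  return visits;
}
```

Because the chunk size is an ordinary function argument here, it can be chosen from the input at run time. Note that the body above does not rely on which iterations share a chunk, which matches the "hint only" position argued earlier in this thread.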

crtrott · 2015-12-20T01:14:07Z

Would a better word for IterationType maybe be IndexType? It is the integer type the policy will internally use to provide indices to the functor. This includes stuff like team_rank, league_rank, league_size, team_size etc. when we extend the PolicyTraits to be used for the TeamPolicy.

nmhamster · 2015-12-20T01:19:07Z

I prefer IndexType, seems to me to be more logical.

crtrott · 2015-12-20T01:22:19Z

I actually think the same.

crtrott · 2016-01-14T18:54:38Z

This is now in master using setter functions for runtime parameters.

crtrott added the duplicate and Feature Request labels Oct 29, 2015

hcedwar removed the duplicate label Nov 23, 2015

hcedwar mentioned this issue Nov 23, 2015

Dynamic scheduling option for RangePolicy #100

Closed

hcedwar added this to the Pre Christmas Push milestone Nov 23, 2015

crtrott self-assigned this Nov 23, 2015

crtrott mentioned this issue Nov 23, 2015

Dynamic scheduling team execution policy #53

Closed

crtrott closed this as completed Jan 14, 2016
