Improve Noncontiguous APIs #365

Closed · jdinan opened this issue Feb 7, 2020 · 8 comments
jdinan (Collaborator) commented Feb 7, 2020

Issue

The current interleaved communication routines in OpenSHMEM (shmem_iput/iget) transfer single-element chunks that are a fixed stride apart (the source and destination can use different strides). This API does not capture many noncontiguous data transfer patterns; for example, it is inefficient for applications that transfer array sections of two- and higher-dimensional arrays.
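For illustration (not part of the original report), a minimal sketch of what this looks like today, assuming a row-major matrix with leading dimension LDA and placeholder sizes: moving an NROWS x NCOLS sub-block requires one shmem_double_iput call per column (or a loop of contiguous puts per row), and every transferred chunk is a single element.

#include <shmem.h>

#define LDA   1024   /* leading dimension of the full matrix (placeholder) */
#define NROWS 64     /* rows in the sub-block (placeholder) */
#define NCOLS 64     /* columns in the sub-block (placeholder) */

/* One strided call per column: NROWS single-element chunks, each LDA elements
 * apart on both sides.  dest is assumed to be a symmetric object. */
static void put_block_iput(double *dest, const double *source, int pe)
{
    for (size_t j = 0; j < NCOLS; j++)
        shmem_double_iput(dest + j, source + j, LDA, LDA, NROWS, pe);
}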

Possible Solutions

Block Interleaved API

Extend the existing SHMEM interleaved APIs (e.g., shmem_iput) to include a block size. This would allow them to support 2D array slice transfers.

void shmem_ibput(TYPE *dest, const TYPE *source, ptrdiff_t dst, ptrdiff_t sst,
                 size_t blocksize, size_t nelems, int pe);
void shmem_iputmem(void *dest, const void *source, ptrdiff_t dst, ptrdiff_t sst,
                   size_t element_size, size_t nelems, int pe);
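A usage sketch of the proposed call, assuming the strides count elements between the starts of consecutive blocks, nelems counts blocks, and a typed or type-generic variant is available: the NROWS x NCOLS sub-block from the sketch above moves in a single call, with each unit of the interleave now a contiguous row of NCOLS values.

/* Proposed block-interleaved form: NROWS blocks of NCOLS contiguous doubles,
 * consecutive blocks LDA elements apart on both sides (interpretation assumed). */
static void put_block_ibput(double *dest, const double *source, int pe)
{
    shmem_ibput(dest, source, LDA, LDA, NCOLS, NROWS, pe);
}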

Strided APIs

Something similar to the ARMCI strided APIs could be used. These support generic matrix slice transfers.

int ARMCI_PutS(void *src_ptr, int src_stride_ar[/*stride_levels*/],
               void *dst_ptr, int dst_stride_ar[/*stride_levels*/], 
               int count[/*stride_levels+1*/], int stride_levels, int proc);
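For comparison, the same sub-block through the ARMCI strided interface, assuming the usual ARMCI convention in which count[0] is the contiguous chunk size in bytes, the remaining counts are chunk counts, and strides are in bytes.

/* One stride level: NROWS chunks of NCOLS doubles, LDA*sizeof(double) bytes
 * apart on both sides. */
static void put_block_armci(double *dst_ptr, double *src_ptr, int proc)
{
    int src_stride[1] = { LDA * (int)sizeof(double) };
    int dst_stride[1] = { LDA * (int)sizeof(double) };
    int count[2]      = { NCOLS * (int)sizeof(double), NROWS };
    ARMCI_PutS(src_ptr, src_stride, dst_ptr, dst_stride, count, 1, proc);
}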

Subarray APIs

Similar to the MPI subarray datatype, this approach supports generic matrix slice transfers.

void shmem_subarray_put(
        shmem_ctx_t ctx,
        TYPE *dest, size_t dest_ndim, size_t dest_dims[],
        size_t dest_start[], size_t dest_count[],
        TYPE *src, size_t src_ndim, size_t src_dims[],
        size_t src_start[], size_t src_count[],
        int pe);

The user specifies the full dimensions of the source and destination matrices and passes a pointer to each matrix's zeroth element. The start indices and extents of the source and destination slices identify the transferred sections.

This API has the advantage of being easy to use (versus strided APIs, which require reasoning about the linearization of the matrix). However, because data can be reshaped during the transfer, it also requires more work on the part of implementations.
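A usage sketch of the proposal (the typed name shmem_double_subarray_put and a square LDA x LDA matrix are assumptions for illustration): both sides describe their full matrix shape plus the start index and extent of the slice, and the implementation handles the linearization.

/* Proposal sketch: put an NROWS x NCOLS slice starting at (0,0) of an
 * LDA x LDA matrix into the same-shaped slice on the target. */
static void put_block_subarray(shmem_ctx_t ctx, double *dest, double *src, int pe)
{
    size_t dims[2]  = { LDA, LDA };      /* full matrix dimensions */
    size_t start[2] = { 0, 0 };          /* upper-left corner of the slice */
    size_t count[2] = { NROWS, NCOLS };  /* slice extent */

    shmem_double_subarray_put(ctx,
                              dest, 2, dims, start, count,
                              src,  2, dims, start, count,
                              pe);
}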

Datatype API

Similar to the MPI datatypes API: introduce APIs for datatype creation and put/get routines that take source and destination datatypes. An additional API could be used to inform the target about the datatype ahead of time:

shmem_dtype_commit(shmem_dtype_t type, shmem_team_t team, shmem_dtype_hints_t hints);
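A purely hypothetical sketch of how the pieces might fit together; apart from shmem_dtype_commit, none of these names appear in the proposal. A vector-like datatype describes NROWS blocks of NCOLS doubles spaced LDA elements apart, is committed across a team, and is then passed to a datatype-aware put.

/* Hypothetical names: shmem_dtype_create_vector, SHMEM_DOUBLE,
 * SHMEM_DTYPE_HINTS_NONE, and shmem_dtype_put are illustrative only. */
shmem_dtype_t block;
shmem_dtype_create_vector(NROWS, NCOLS, LDA, SHMEM_DOUBLE, &block);
shmem_dtype_commit(block, SHMEM_TEAM_WORLD, SHMEM_DTYPE_HINTS_NONE);
shmem_dtype_put(dest, block, source, block, pe);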
jdinan added this to the OpenSHMEM 1.6 milestone on Feb 7, 2020
naveen-rn (Contributor) commented

Related:
Extending Strided Communication Interfaces in OpenSHMEM
Towards Matrix Oriented Strides in OpenSHMEM

jdinan (Collaborator, Author) commented Jul 16, 2020

From the 7/2/2020 meeting: the WG prefers the block-interleaved API and would like to see strong drivers for the strided APIs.

jdinan changed the title from "Improve Strided API" to "Improve Noncontiguous APIs" on Aug 27, 2020
jeffhammond commented Sep 14, 2020

Regarding the more general strided APIs...

From https://github.com/jeffhammond/oshmpi/blob/master/docs/oug2014_resubmission-acm_4.pdf:

It is worth asking whether it is worthwhile to generalize the APUT operation for dimensions higher than two to support tensor operations (for some applications, see [7] and [15]). There are two arguments against this. First, operations on subarrays of dimension greater than two can be expressed in terms of a single APUT operation by combining the strides; for example, a three-dimensional subarray operation can be cast in terms of a two-dimension subarray computation if the stride over x and y are multiplied together (here we assume z is the contiguous dimension that is captured by blockelems). Regardless of the number of dimensions associated with the strides, the key efficiency gain with APUT is accomplished by operating on blocks of contiguous data rather than single elements, as is the case for IPUT. Second, the myriad of applications involving tensor operations include many cases where cartesian subarrays are not useful. For example, in the domain of quantum chemistry, most tensors have permutation (anti-)symmetry and thus cannot make use of operations designed for non-symmetric subarrays. Such is the complexity of tensor data in the NWChem [3] Tensor Contraction Engine [6] that block-sparse and permutation- (anti)symmetric tensors are mapped to one-dimensional global arrays with an application-defined hashing scheme.

jeffhammond commented

If you are going to add 2D array support, you might want to think about collectives as well.

jdinan (Collaborator, Author) commented Oct 5, 2020

Discussion at RMA WG today:

Interest in pursuing the datatypes API; however, we would need a driver.

Possible drivers for noncontig APIs:

  • Stencil codes, e.g. Jacobi
  • NWChem
    • May not use multi-sided datatype setup; up to 8-D arrays
  • Fortran Coarrays
    • Must not require collective datatype setup

jdinan (Collaborator, Author) commented Oct 5, 2020

@jeffhammond I don't understand the argument for dimensions higher than two using APUT. Are you calling it in a loop over the outer dimensions?

jeffhammond commented

I'm saying 2D is sufficient for cartesian arrays. 3D can be collapsed to 2D by multiplying the first two strides. And so forth. Or one can loop over 2D ops if somehow that doesn't work. The loop overhead isn't going to matter because a 2D operation is going to be relatively expensive.
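A sketch of that collapse using the proposed block-interleaved call and the layout assumptions from the earlier sketches: for a CX x CY x CZ subarray of a row-major NX x NY x NZ array with z contiguous, each call moves one x-plane as CY blocks of CZ elements spaced NZ apart, and a loop covers the CX planes. When CY == NY, consecutive planes keep the same NZ spacing between blocks, so the loop collapses into a single call with CX*CY blocks.

#define NY 32    /* full array is NX x NY x NZ, row-major, z contiguous (placeholders) */
#define NZ 128
#define CX 8     /* subarray extents (placeholders) */
#define CY 16
#define CZ 64

static void put_3d_block(double *dest, const double *source, int pe)
{
    for (size_t i = 0; i < CX; i++)    /* one 2D strided operation per x-plane */
        shmem_ibput(dest + i * NY * NZ, source + i * NY * NZ,
                    NZ, NZ, CZ, CY, pe);
}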

jdinan removed this from the OpenSHMEM 1.6 milestone on Oct 12, 2023
jdinan (Collaborator, Author) commented Nov 7, 2024

Closed by #448

jdinan closed this as completed on Nov 7, 2024