Improve Noncontiguous APIs #365

Closed · jdinan opened this issue Feb 7, 2020 · 8 comments
jdinan (Collaborator) commented Feb 7, 2020

Issue

The current interleaved communication routines in OpenSHMEM (shmem_iput/iget) transfer single-element chunks that are a fixed stride apart (the source and destination can use different strides). This API does not capture many noncontiguous data transfer patterns; for example, it is inefficient for applications that transfer array sections of two- and higher-dimensional arrays.
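For illustration (not part of the original report), a minimal sketch of what this looks like today, assuming a row-major matrix with leading dimension LDA and placeholder sizes: moving an NROWS x NCOLS sub-block requires one shmem_double_iput call per column (or a loop of contiguous puts per row), and every transferred chunk is a single element.

#include <shmem.h>

#define LDA   1024   /* leading dimension of the full matrix (placeholder) */
#define NROWS 64     /* rows in the sub-block (placeholder) */
#define NCOLS 64     /* columns in the sub-block (placeholder) */

/* One strided call per column: NROWS single-element chunks, each LDA elements
 * apart on both sides.  dest is assumed to be a symmetric object. */
static void put_block_iput(double *dest, const double *source, int pe)
{
    for (size_t j = 0; j < NCOLS; j++)
        shmem_double_iput(dest + j, source + j, LDA, LDA, NROWS, pe);
}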

Possible Solutions

Block Interleaved API

Extend the existing SHMEM interleaved APIs (e.g., shmem_iput) to include a block size. This would allow them to support 2D array slice transfers.

void shmem_ibput(TYPE *dest, const TYPE *source, ptrdiff_t dst, ptrdiff_t sst,
                 size_t blocksize, size_t nelems, int pe);
void shmem_iputmem(void *dest, const void *source, ptrdiff_t dst, ptrdiff_t sst,
                   size_t element_size, size_t nelems, int pe);
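A usage sketch of the proposed call, assuming the strides count elements between the starts of consecutive blocks, nelems counts blocks, and a typed or type-generic variant is available: the NROWS x NCOLS sub-block from the sketch above moves in a single call, with each unit of the interleave now a contiguous row of NCOLS values.

/* Proposed block-interleaved form: NROWS blocks of NCOLS contiguous doubles,
 * consecutive blocks LDA elements apart on both sides (interpretation assumed). */
static void put_block_ibput(double *dest, const double *source, int pe)
{
    shmem_ibput(dest, source, LDA, LDA, NCOLS, NROWS, pe);
}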

Strided APIs

Something similar to the ARMCI strided APIs could be used. These support generic matrix slice transfers.

int ARMCI_PutS(void *src_ptr, int src_stride_ar[/*stride_levels*/],
               void *dst_ptr, int dst_stride_ar[/*stride_levels*/], 
               int count[/*stride_levels+1*/], int stride_levels, int proc);
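For comparison, the same sub-block through the ARMCI strided interface, assuming the usual ARMCI convention in which count[0] is the contiguous chunk size in bytes, the remaining counts are chunk counts, and strides are in bytes.

/* One stride level: NROWS chunks of NCOLS doubles, LDA*sizeof(double) bytes
 * apart on both sides. */
static void put_block_armci(double *dst_ptr, double *src_ptr, int proc)
{
    int src_stride[1] = { LDA * (int)sizeof(double) };
    int dst_stride[1] = { LDA * (int)sizeof(double) };
    int count[2]      = { NCOLS * (int)sizeof(double), NROWS };
    ARMCI_PutS(src_ptr, src_stride, dst_ptr, dst_stride, count, 1, proc);
}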

Subarray APIs

Similar to the MPI subarray datatype, this approach supports generic matrix slice transfers.

void shmem_subarray_put(
        shmem_ctx_t ctx,
        TYPE *dest, size_t dest_ndim, size_t dest_dims[],
        size_t dest_start[], size_t dest_count[],
        TYPE *src, size_t src_ndim, size_t src_dims[],
        size_t src_start[], size_t src_count[],
        int pe);

The user specifies the full dimensions of the source and destination matrices and passes a pointer to each matrix's zeroth element. The start indices and extents of the source and destination slices identify the transferred sections.

This API has the advantage of being easy to use (versus strided APIs, which require reasoning about the linearization of the matrix). However, because data can be reshaped during the transfer, it also requires more work on the part of implementations.
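A usage sketch of the proposal (the typed name shmem_double_subarray_put and a square LDA x LDA matrix are assumptions for illustration): both sides describe their full matrix shape plus the start index and extent of the slice, and the implementation handles the linearization.

/* Proposal sketch: put an NROWS x NCOLS slice starting at (0,0) of an
 * LDA x LDA matrix into the same-shaped slice on the target. */
static void put_block_subarray(shmem_ctx_t ctx, double *dest, double *src, int pe)
{
    size_t dims[2]  = { LDA, LDA };      /* full matrix dimensions */
    size_t start[2] = { 0, 0 };          /* upper-left corner of the slice */
    size_t count[2] = { NROWS, NCOLS };  /* slice extent */

    shmem_double_subarray_put(ctx,
                              dest, 2, dims, start, count,
                              src,  2, dims, start, count,
                              pe);
}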

Datatype API

Similar to the MPI datatypes API: introduce APIs for datatype creation and put/get routines that take source and destination datatypes. An additional API could be used to inform the target about the datatype ahead of time:

shmem_dtype_commit(shmem_dtype_t type, shmem_team_t team, shmem_dtype_hints_t hints);
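A purely hypothetical sketch of how the pieces might fit together; apart from shmem_dtype_commit, none of these names appear in the proposal. A vector-like datatype describes NROWS blocks of NCOLS doubles spaced LDA elements apart, is committed across a team, and is then passed to a datatype-aware put.

/* Hypothetical names: shmem_dtype_create_vector, SHMEM_DOUBLE,
 * SHMEM_DTYPE_HINTS_NONE, and shmem_dtype_put are illustrative only. */
shmem_dtype_t block;
shmem_dtype_create_vector(NROWS, NCOLS, LDA, SHMEM_DOUBLE, &block);
shmem_dtype_commit(block, SHMEM_TEAM_WORLD, SHMEM_DTYPE_HINTS_NONE);
shmem_dtype_put(dest, block, source, block, pe);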
jdinan added this to the OpenSHMEM 1.6 milestone on Feb 7, 2020
naveen-rn (Contributor) commented

Related:
Extending Strided Communication Interfaces in OpenSHMEM
Towards Matrix Oriented Strides in OpenSHMEM

jdinan (Collaborator, Author) commented Jul 16, 2020

From the 7/2/2020 meeting: the WG prefers the block-interleaved API and would like to see strong drivers for the strided APIs.

jdinan changed the title from "Improve Strided API" to "Improve Noncontiguous APIs" on Aug 27, 2020
jeffhammond commented Sep 14, 2020

Regarding the more general strided APIs...

From https://github.com/jeffhammond/oshmpi/blob/master/docs/oug2014_resubmission-acm_4.pdf:

It is worth asking whether it is worthwhile to generalize the APUT operation for dimensions higher than two to support tensor operations (for some applications, see [7] and [15]). There are two arguments against this. First, operations on subarrays of dimension greater than two can be expressed in terms of a single APUT operation by combining the strides; for example, a three-dimensional subarray operation can be cast in terms of a two-dimension subarray computation if the stride over x and y are multiplied together (here we assume z is the contiguous dimension that is captured by blockelems). Regardless of the number of dimensions associated with the strides, the key efficiency gain with APUT is accomplished by operating on blocks of contiguous data rather than single elements, as is the case for IPUT. Second, the myriad of applications involving tensor operations include many cases where cartesian subarrays are not useful. For example, in the domain of quantum chemistry, most tensors have permutation (anti-)symmetry and thus cannot make use of operations designed for non-symmetric subarrays. Such is the complexity of tensor data in the NWChem [3] Tensor Contraction Engine [6] that block-sparse and permutation- (anti)symmetric tensors are mapped to one-dimensional global arrays with an application-defined hashing scheme.

jeffhammond commented

If you are going to add 2D array support, you might want to think about collectives as well.

jdinan (Collaborator, Author) commented Oct 5, 2020

Discussion at RMA WG today:

Interest in pursuing the datatypes API; however, we would need a driver.

Possible drivers for noncontig APIs:

  • Stencil codes, e.g. Jacobi
  • NWChem
    • May not use multi-sided datatype setup; up to 8-D arrays
  • Fortran Coarrays
    • Must not require collective datatype setup

jdinan (Collaborator, Author) commented Oct 5, 2020

@jeffhammond I don't understand the argument for dimensions higher than two using APUT. Are you calling it in a loop over the outer dimensions?

jeffhammond commented

I'm saying 2D is sufficient for cartesian arrays. 3D can be collapsed to 2D by multiplying the first two strides. And so forth. Or one can loop over 2D ops if somehow that doesn't work. The loop overhead isn't going to matter because a 2D operation is going to be relatively expensive.
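A sketch of that collapse using the proposed block-interleaved call and the layout assumptions from the earlier sketches: for a CX x CY x CZ subarray of a row-major NX x NY x NZ array with z contiguous, each call moves one x-plane as CY blocks of CZ elements spaced NZ apart, and a loop covers the CX planes. When CY == NY, consecutive planes keep the same NZ spacing between blocks, so the loop collapses into a single call with CX*CY blocks.

#define NY 32    /* full array is NX x NY x NZ, row-major, z contiguous (placeholders) */
#define NZ 128
#define CX 8     /* subarray extents (placeholders) */
#define CY 16
#define CZ 64

static void put_3d_block(double *dest, const double *source, int pe)
{
    for (size_t i = 0; i < CX; i++)    /* one 2D strided operation per x-plane */
        shmem_ibput(dest + i * NY * NZ, source + i * NY * NZ,
                    NZ, NZ, CZ, CY, pe);
}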

jdinan removed this from the OpenSHMEM 1.6 milestone on Oct 12, 2023
jdinan (Collaborator, Author) commented Nov 7, 2024

Closed by #448

jdinan closed this as completed on Nov 7, 2024