Restructure to allow direct specification of array order #2763

eisenhauer · 2021-06-12T15:21:06Z

@germasch Please take a look and see whether this addresses the concerns you raised about array ordering in #2759.
@pnorbert, @JasonRuonanWang, et al.: I think we have a notorious shortage of cross-language tests, so this may need extra attention before merging. The basic approach is minimalistic. DeclareIO gets a new optional parameter that specifies the array element ordering. The default is Auto, meaning the ordering is determined by the language specified when the ADIOS object is created, but it can also be set to RowMajor or ColumnMajor. This does not change the on-disk format of BP3/BP4 files, where the recorded language serves as a stand-in for the array ordering. Instead, we store C++ as the language for RowMajor and Fortran for ColumnMajor.
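A minimal C++17 sketch of the resolution logic described above: Auto defers to the host language, an explicit value wins, and the resolved ordering is written to BP3/BP4 as a language tag. The names `ArrayOrdering`, `HostLanguage`, `ResolveOrdering`, and `OnDiskLanguageTag` are hypothetical stand-ins for illustration; the actual enum and DeclareIO signature in the PR may differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of the three values described in the PR;
// the real ADIOS2 spelling may differ.
enum class ArrayOrdering { Auto, RowMajor, ColumnMajor };

// Hypothetical host-language setting, fixed when the ADIOS object is created.
enum class HostLanguage { CXX, C, Fortran };

// Auto defers to the host language: Fortran is column-major,
// C and C++ are row-major. Any explicit request is returned unchanged.
ArrayOrdering ResolveOrdering(ArrayOrdering requested, HostLanguage host)
{
    if (requested != ArrayOrdering::Auto)
    {
        return requested;
    }
    return host == HostLanguage::Fortran ? ArrayOrdering::ColumnMajor
                                         : ArrayOrdering::RowMajor;
}

// BP3/BP4 only record a language, so the resolved ordering is stored
// as "C++" for row-major and "Fortran" for column-major.
std::string OnDiskLanguageTag(ArrayOrdering resolved)
{
    assert(resolved != ArrayOrdering::Auto && "resolve Auto before writing");
    return resolved == ArrayOrdering::ColumnMajor ? "Fortran" : "C++";
}
```

Because the on-disk tag is derived from the resolved ordering rather than from the host language, a C++ program can write ColumnMajor data and readers will see it exactly as if a Fortran program had written it.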

JasonRuonanWang

Looks good to me as long as it passes all the tests. For cross-language tests, I think we are good for now. I have often been blocked by the Fortran tests when I wanted to merge something, which suggests the Fortran tests are pretty effective. I don't mind if you want to add new tests, though :)

eisenhauer · 2021-06-13T01:47:30Z

This PR lacks Python bindings and tests of its actual functionality.

germasch · 2021-06-13T02:36:06Z

This looks like a great step forward, thanks @eisenhauer. Let me mention @rmchurch, since he also independently ran into related issues a little while ago.

For reference, there's this issue with more of my thoughts back in the day: #1661

For the use cases I have, this PR would do all that's needed: my case is Fortran-order arrays that I'm using in C++. I also don't think finer-grained control is easily possible in the existing BP3/BP4 formats, so I think a one-ordering-per-file limitation is fine there. (In theory it might be possible to get finer control by (ab)using process groups, but that doesn't seem worth it.)

In general, though (and that's why I mentioned it for BP5), I think it'd be useful to be able to specify the ordering for each dataset (i.e., definitely per variable, maybe even per variable per step). I don't think it's urgent to implement a corresponding API right now, but if the file format doesn't support it, it won't be possible to add it later without compatibility issues.

Here are some examples of why I find it useful:

Kokkos, for example, supports both layouts on a per-dataset basis. It's certainly conceivable that both get used within the same code, e.g., when part of the data is kept in Fortran layout during a port to C++, while other parts have been fully migrated and use the standard C++ layout. Even if that weren't the case, someone implementing a shim layer to write Kokkos arrays to ADIOS2 would presumably know which layout to use at the time the dataset is written, but not yet when the file is opened.
Better Python support. NumPy arrays can have either layout, and this is transparent to the user unless one looks (using <array>.flags). It's quite easy to end up with both layouts under the hood, in particular in data analysis, where the in-memory data probably (hopefully) matches the layout on disk. E.g., if one uses h5py to read a dataset written to HDF5 from Fortran, one ends up with the dimensions in zyx order (because HDF5 sadly doesn't support multiple data layouts either, last I checked). So what one usually does is take the transpose after reading, which is cheap because it essentially only flips the ordering flag in NumPy. If one then writes this data to adios2, it's in Fortran order, and it'd be nice if adios2 could write it in that order directly.
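To make the two layouts concrete, here is a hedged C++ sketch (not taken from ADIOS2, Kokkos, or NumPy) of how a multi-index maps to a linear memory offset in each. Row-major corresponds to C/C++, Kokkos `LayoutRight`, and NumPy order `'C'`; column-major corresponds to Fortran, Kokkos `LayoutLeft`, and NumPy order `'F'`. The helper names `RowMajorOffset`, `ColumnMajorOffset`, and `Reversed` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major (C/C++, Kokkos LayoutRight, NumPy order 'C'):
// the last index varies fastest in memory.
std::size_t RowMajorOffset(const std::vector<std::size_t>& dims,
                           const std::vector<std::size_t>& idx)
{
    assert(dims.size() == idx.size());
    std::size_t offset = 0;
    for (std::size_t d = 0; d < dims.size(); ++d)
    {
        offset = offset * dims[d] + idx[d];
    }
    return offset;
}

// Column-major (Fortran, Kokkos LayoutLeft, NumPy order 'F'):
// the first index varies fastest in memory.
std::size_t ColumnMajorOffset(const std::vector<std::size_t>& dims,
                              const std::vector<std::size_t>& idx)
{
    assert(dims.size() == idx.size());
    std::size_t offset = 0;
    for (std::size_t d = dims.size(); d-- > 0;)
    {
        offset = offset * dims[d] + idx[d];
    }
    return offset;
}

// Returns v in reverse order; used to "transpose" both extents and indices.
std::vector<std::size_t> Reversed(const std::vector<std::size_t>& v)
{
    return std::vector<std::size_t>(v.rbegin(), v.rend());
}
```

For extents {2, 3, 4}, index {1, 0, 0} lands at offset 12 in row-major order but at offset 1 in column-major order. Reversing both the extents and the index maps one layout onto the other (`ColumnMajorOffset(dims, idx)` equals `RowMajorOffset(Reversed(dims), Reversed(idx))`), which is why the transpose after reading with h5py is cheap: only the dimension order changes, not the data.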

In terms of API, I think what you did can be extended naturally while remaining backward compatible:

For an IO, if the ordering is not specified, fall back to the host language (i.e., the current behavior).
For a variable, if the ordering is not specified, fall back to the IO's ordering.

So the only additional piece is a new API function like var.setLayout(..) and a corresponding getLayout(). As that's an addition, it doesn't break any existing usage, and there may not even be much urgency in implementing it. The only drawback concerns BP3/BP4 files: if the IO is opened as, e.g., row-major, a subsequent var.setLayout(columnMajor) can't be honored. I see two options for handling that: (1) throw an exception, or (2) transpose the dims. My preference is the former, as (2) leads to surprises when reading the data back.
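The fallback chain and the BP3/BP4 conflict described above can be sketched as follows. This is a hypothetical C++17 illustration, not ADIOS2 code: `Layout`, `EffectiveLayout`, `VariableSketch`, and `DimsAsRecorded` are invented names, `SetLayout`/`GetLayout` only mirror the proposed `var.setLayout(..)`/`getLayout()`, and the `perVariableLayoutSupported` flag is an assumed stand-in for whatever a future file format might offer.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <vector>

enum class Layout { RowMajor, ColumnMajor };

// Proposed fallback chain: variable -> IO -> host language.
// std::nullopt means "not specified at this level".
Layout EffectiveLayout(Layout hostLanguage, std::optional<Layout> io,
                       std::optional<Layout> variable)
{
    return variable.value_or(io.value_or(hostLanguage));
}

// Hypothetical variable handle; not an existing ADIOS2 class.
class VariableSketch
{
public:
    VariableSketch(Layout hostLanguage, std::optional<Layout> ioLayout,
                   bool perVariableLayoutSupported)
    : m_HostLanguage(hostLanguage), m_IOLayout(ioLayout),
      m_PerVariableSupported(perVariableLayoutSupported)
    {
    }

    // Option (1): on a format with a single ordering per file (BP3/BP4),
    // reject any request that differs from the file-wide ordering.
    void SetLayout(Layout requested)
    {
        const Layout fileWide =
            EffectiveLayout(m_HostLanguage, m_IOLayout, std::nullopt);
        if (!m_PerVariableSupported && requested != fileWide)
        {
            throw std::invalid_argument(
                "file format stores a single array ordering");
        }
        m_VariableLayout = requested;
    }

    Layout GetLayout() const
    {
        return EffectiveLayout(m_HostLanguage, m_IOLayout, m_VariableLayout);
    }

private:
    Layout m_HostLanguage;
    std::optional<Layout> m_IOLayout;
    std::optional<Layout> m_VariableLayout; // unset until SetLayout succeeds
    bool m_PerVariableSupported;
};

// Option (2), for comparison: accept the mismatch by recording reversed dims.
// The bytes are written unchanged, but a reader using the file-wide ordering
// sees the transposed shape, which is the surprise mentioned above.
std::vector<std::size_t> DimsAsRecorded(const std::vector<std::size_t>& dims,
                                        Layout requested, Layout fileWide)
{
    if (requested == fileWide)
    {
        return dims;
    }
    return std::vector<std::size_t>(dims.rbegin(), dims.rend());
}
```

Option (1) surfaces the mismatch at write time. Option (2) silently changes the shape a reader sees, so a variable declared as {2, 3, 4} would come back as {4, 3, 2}.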

guj

Thanks, Greg, for the big effort!

source/adios2/engine/bp3/BP3Writer.cpp

eisenhauer · 2021-08-06T14:30:46Z

I'm going to go ahead and merge this, as it passes all existing tests.

eisenhauer requested review from pnorbert and JasonRuonanWang June 12, 2021 15:21

JasonRuonanWang reviewed Jun 12, 2021


eisenhauer requested a review from guj June 12, 2021 17:08

guj reviewed Jun 14, 2021


source/adios2/engine/bp3/BP3Writer.cpp (resolved)

eisenhauer force-pushed the ArrayOrder branch from 255e33b to 3de1eec June 14, 2021 23:58

Restructure to allow direct specification of array order

d0f6cd5

eisenhauer force-pushed the ArrayOrder branch from f36a32d to d0f6cd5 August 6, 2021 12:13

eisenhauer merged commit 47b7df5 into ornladios:master Aug 6, 2021

eisenhauer deleted the ArrayOrder branch August 6, 2021 14:30
