Restructure to allow direct specification of array order #2763

Merged
merged 1 commit into from
Aug 6, 2021

Conversation

eisenhauer
Member

@germasch Please look to see if this would address the concerns you raise about array ordering in #2759.
@pnorbert, @JasonRuonanWang, et al. I think we have a notorious shortage of cross-language tests, so perhaps this needs extra attention before merge. The basic approach is minimalistic. There's a new parameter to DeclareIO that has an optional value specifying the array element ordering. The default is Auto, meaning that it's determined by the language specified when the ADIOS object is created, but it can also be RowMajor or ColumnMajor. This does not change the format of BP3/4 files on disk, where the language specified serves as a stand-in for the array ordering. Instead we store C++ for RowMajor and Fortran for ColumnMajor.
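The resolution rule described above can be sketched in a few lines of standalone C++. This is only an illustration of the idea, not the code in this PR: `HostLanguage`, `ArrayOrder`, `ResolveOrder`, and `StoredLanguageTag` are hypothetical names, and the host languages are reduced to two for brevity.

```cpp
#include <cassert>
#include <string>

// Hypothetical names for illustration; not the ADIOS2 declarations.
enum class HostLanguage { Cpp, Fortran };
enum class ArrayOrder { Auto, RowMajor, ColumnMajor };

// Auto defers to the host language chosen when the ADIOS object was created;
// an explicit RowMajor or ColumnMajor request always wins.
inline ArrayOrder ResolveOrder(ArrayOrder requested, HostLanguage host)
{
    if (requested != ArrayOrder::Auto)
    {
        return requested;
    }
    return host == HostLanguage::Fortran ? ArrayOrder::ColumnMajor
                                         : ArrayOrder::RowMajor;
}

// BP3/4 files keep recording a language name as a stand-in for the ordering,
// so the on-disk format is unchanged.
inline std::string StoredLanguageTag(ArrayOrder resolved)
{
    assert(resolved != ArrayOrder::Auto); // caller must resolve Auto first
    return resolved == ArrayOrder::ColumnMajor ? "Fortran" : "C++";
}
```

Under these assumptions, a C++ program that explicitly asks for `ColumnMajor` would produce a file tagged "Fortran", which is exactly the case of Fortran-order arrays handled from C++.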

Member

@JasonRuonanWang JasonRuonanWang left a comment

Looks good to me as long as it passes all the tests. For cross-language tests, I think we are good for now. At least I have often been blocked by the Fortran tests when I wanted to merge something, which says the Fortran tests are pretty effective. I don't mind if you want to add new tests though :)

@eisenhauer eisenhauer requested a review from guj June 12, 2021 17:08
@eisenhauer
Member Author

This PR lacks Python bindings and tests of its actual functionality.

@germasch
Contributor

This looks like a great step forward, thanks @eisenhauer. Let me mention @rmchurch since he also independently ran into related issues a little while ago.

For reference, there's this issue with more of my thoughts back in the day: #1661

For the use cases I have, this PR would do all that's needed -- the cases I have are Fortran-order arrays that I'm using in C++. I also think it's not easily possible to have more fine-grained control in the existing BP3/BP4 formats, and so I think it's fine to have a one-ordering-per-file limitation there. (It might in theory be possible to get finer control by (ab)using process groups, but that doesn't seem worth it.)

In general, though, and that's why I mentioned it for BP5, I think it'd be useful to have the capability to specify the ordering for each dataset (i.e., definitely per-variable, maybe even per-variable per-step). I don't think it's urgent to implement a corresponding API for it right now -- but if it's something that's not supported by the file format, it won't be possible to add it later without compatibility issues.

Here are some examples why I find it's useful:

  • Kokkos, e.g., supports both layouts, on a per-dataset basis. It's certainly conceivable that both get used within the same code, e.g., when part of the data is kept in Fortran layout during a port to C++ effort, while other parts may be completely migrated and use standard C++ layout. Even if that weren't the case, presumably if someone implements a shim layer to write Kokkos arrays to ADIOS2, they'd know what layout to use at the time the dataset is written, not yet when the file is opened.
  • Better Python support. NumPy arrays can be either layout, and it's transparent to the user unless one looks (using <array>.flags). It's quite easily possible to have both layouts used under the hood, in particular in data analysis, where the data probably/hopefully corresponds to the layout on disk. E.g., if one uses h5py to read a dataset written in HDF5 from Fortran, one ends up with the data with the dimensions in zyx order (because HDF5 sadly doesn't support multiple data layouts, either -- last I checked). So usually what one does is take the transpose after reading, which is cheap because it pretty much only flips the ordering flag in NumPy. If one now writes this data to adios2, it'll be in Fortran order, and it'd be nice if it could be written with adios2.
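To make the "zyx order" and "cheap transpose" remarks concrete, here is a small standalone sketch of how the two layouts map a 3D index to a linear offset. The helper names (`Index3`, `RowMajorOffset`, `ColumnMajorOffset`, `Reversed`) are made up for this illustration and come from neither ADIOS2, Kokkos, nor NumPy.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Index3 = std::array<std::size_t, 3>;

// Row-major (C/C++ convention): the last index varies fastest in memory.
inline std::size_t RowMajorOffset(const Index3 &dims, const Index3 &idx)
{
    assert(idx[0] < dims[0] && idx[1] < dims[1] && idx[2] < dims[2]);
    return (idx[0] * dims[1] + idx[1]) * dims[2] + idx[2];
}

// Column-major (Fortran convention): the first index varies fastest.
inline std::size_t ColumnMajorOffset(const Index3 &dims, const Index3 &idx)
{
    assert(idx[0] < dims[0] && idx[1] < dims[1] && idx[2] < dims[2]);
    return idx[0] + dims[0] * (idx[1] + dims[1] * idx[2]);
}

// Reversing dimensions and indices converts one view into the other without
// touching the data, which is why a transpose can be a metadata-only change.
inline Index3 Reversed(const Index3 &v)
{
    return Index3{{v[2], v[1], v[0]}};
}
```

For a Fortran array with dimensions (x, y, z) = (4, 3, 2), element (1, 2, 1) lives at the same offset as element (1, 2, 1) of a row-major array with the reversed dimensions (2, 3, 4). The bytes are identical; only the description of the layout differs.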

In terms of API, I think what you did can naturally be extended, and remains backward compatible:

  • For an IO, if the ordering is not specified, fall back to the host language (i.e., current behavior)
  • For a variable, if the order is not specified, fall back to the IO's ordering.

So the only additional piece is a new API function like var.setLayout(..), and a corresponding getLayout(). As that's an addition, it doesn't break any existing usage, and there may not even be too much urgency implementing it. The only drawback is that if the file is actually BP3/BP4, and the IO is opened as, e.g., row-major, if one were to try to var.setLayout(columnMajor), it can't be done. I think there are two options on how to handle it: (1) throw an exception or (2) transpose the dims (my preference being the former, as (2) leads to surprises when reading back data).
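A rough sketch of that fallback chain, with option (1) for the BP3/BP4 case, might look like the following. Everything here is hypothetical: `Layout`, `IOSketch`, `VariableSketch`, and the `perVariableLayout` flag are stand-ins invented for illustration and do not exist in ADIOS2.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical types for illustration; not part of the ADIOS2 API.
enum class Layout { Unspecified, RowMajor, ColumnMajor };

struct IOSketch
{
    Layout hostLayout;      // implied by the host language
    Layout layout;          // explicit IO-level ordering, if any
    bool perVariableLayout; // assumed false for BP3/BP4-style files

    IOSketch(Layout host, Layout explicitLayout = Layout::Unspecified,
             bool perVariable = false)
    : hostLayout(host), layout(explicitLayout), perVariableLayout(perVariable)
    {
        assert(host != Layout::Unspecified);
    }

    // An IO without an explicit ordering falls back to the host language.
    Layout GetLayout() const
    {
        return layout == Layout::Unspecified ? hostLayout : layout;
    }
};

class VariableSketch
{
public:
    explicit VariableSketch(const IOSketch &io) : m_IO(io) {}

    // Option (1): throw when the file can record only one ordering and the
    // request disagrees with it.
    void SetLayout(Layout requested)
    {
        if (requested != Layout::Unspecified && !m_IO.perVariableLayout &&
            requested != m_IO.GetLayout())
        {
            throw std::invalid_argument(
                "file format stores one array ordering per file");
        }
        m_Layout = requested;
    }

    // A variable without an explicit ordering falls back to its IO.
    Layout GetLayout() const
    {
        return m_Layout == Layout::Unspecified ? m_IO.GetLayout() : m_Layout;
    }

private:
    const IOSketch &m_IO;
    Layout m_Layout = Layout::Unspecified;
};
```

Throwing keeps the failure visible at the call site, whereas silently transposing the dimensions (option 2) would only surface later, when the data is read back with unexpected shapes.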

Contributor

@guj guj left a comment

Thanks Greg for the big effort!

source/adios2/engine/bp3/BP3Writer.cpp (resolved)
@eisenhauer
Member Author

I'm going to go ahead and merge this, as it passes all existing tests.

@eisenhauer eisenhauer merged commit 47b7df5 into ornladios:master Aug 6, 2021
@eisenhauer eisenhauer deleted the ArrayOrder branch August 6, 2021 14:30