Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify Random-Access API and Streaming API into Series::snapshots() #1592

Merged
merged 97 commits into from
Dec 10, 2024

Conversation

franzpoeschel
Copy link
Contributor

@franzpoeschel franzpoeschel commented Jan 29, 2024

So far, there were two distinct APIs for accessing Iterations:

  1. random-access via Series::iterations
  2. step-based access via Series::writeIterations() and Series::readIterations()

This PR introduces a new API Series::snapshots() that returns a unified container API. The unified container API can be implemented internally via random-access and via synchronous access, i.e. by using steps.

This is part of an effort to bring both workflows and their functionalities closer together. While this redesigned container API does not yet go all the way along this road, it lays the fundament for such extensibility by (1) cleanly separating the API from its implementation and by (2) redesigning and cleaning up the implementation of the step-based workflow.

Preliminary/experimental support for reopening closed Iterations is now available, especially for file-based workflows.
Note that there are still caveats about reopening Iterations. Especially in ADIOS2, a step cannot be modified once closed, so writing new data is only possible into a new step. However, the Iteration handle currently does not delete old data definitions upon closing, meaning that there will be invalid data definitions in the handle.
In future, a closed Iteration will be removed from internal data structures (but users may keep it around for longer than that in their own data structures).

Future features to be built on top of this include:

  1. Random-access for variable-based encoding based on SetStepSelection() in ADIOS2
  2. Going backwards through the steps of an ADIOS2 file (by closing and reopening)
  3. Dealing better with duplicate Iterations, arising from checkpoint-restart workflows (e.g. pre-scanning and eliminating duplicates beforehand)
  4. Resource management, such as deleting Iteration handles upon closing them, and recreating them upon reopening

writeIterations() and readIterations() are now implemented as aliases/wrappers to Series::snapshots(). Due to the past somewhat unfortunate decision to introduce IndexedIteration as iteration type for readIterations(), the method name cannot be reused without breaking compatibility. writeIterations() however is just a wrapper for snapshots() with default arguments.

TODO:

  • Merge Filename extensions: Allow specifying wildcards #1584 first
  • Better logic for READ_WRITE mode
  • backwards compatibility: using definitions, empty headers
  • documentation, change examples
  • maybe snapshots(Snapshots::RandomAccess) vs snapshots(Snapshots::CollectiveAccess) or so
  • Follow-up: Container API: How to deal with erased Iterations? Answer: Iterations currently only being closed, not erased. Problem for later.
  • follow-up: snapshots().begin() -> reset to start, add sth like snapshots().current(), alternatively keep the current semantics and add snapshots().reset()?
  • Follow-up: Flush open iterations instead of dirty iterations.
  • default: defer_iteration_parsing in READ_LINEAR mode
  • follow-up: Check that the last iteration is closed implicitly upon destruction
  • follow-up: delete closed iterations in Append mode
  • follow-up: re-work access modes, create-linear, append-linear, ..?
  • follow-up: Python bindings

Diff: franzpoeschel/openPMD-api@shared-attributable-data...topic-iterator

@franzpoeschel franzpoeschel marked this pull request as draft January 29, 2024 17:05
src/Series.cpp Fixed Show fixed Hide fixed
@franzpoeschel franzpoeschel force-pushed the topic-iterator branch 2 times, most recently from 1e862b7 to ab0991d Compare February 14, 2024 16:29
Comment on lines 100 to 235
switch (series.iterationEncoding())
{
using IE = IterationEncoding;
case IE::fileBased:
series.readFileBased();
break;
case IE::groupBased:
case IE::variableBased: {
Parameter<Operation::OPEN_FILE> fOpen;
fOpen.name = series.get().m_name;
series.IOHandler()->enqueue(IOTask(&series, fOpen));
series.IOHandler()->flush(internal::defaultFlushParams);
using PP = Parameter<Operation::OPEN_FILE>::ParsePreference;
switch (*fOpen.out_parsePreference)
{
case PP::PerStep:
series.advance(AdvanceMode::BEGINSTEP);
series.readGorVBased(
/* do_always_throw_errors = */ false, /* init = */ true);
break;
case PP::UpFront:
series.readGorVBased(
/* do_always_throw_errors = */ false, /* init = */ true);
/*
* In linear read mode (where no parsing at all is done upon
* constructing the Series), it might turn out after parsing
* that what we expected to be a group-based Series was in fact
* a single file of a file-based Series.
* (E.g. when opening "data00000100.h5" directly instead of
* "data%T.h5")
* So we need to check the current value of
* `iterationEncoding()` once more.
*/
if (series.iterationEncoding() != IterationEncoding::fileBased)
{
series.advance(AdvanceMode::BEGINSTEP);
}
break;
}
data.parsePreference = *fOpen.out_parsePreference;
break;
}
}

Check notice

Code scanning / CodeQL

Long switch case Note

Switch has at least one case that is too long:
variableBased (38 lines)
.
@franzpoeschel franzpoeschel force-pushed the topic-iterator branch 3 times, most recently from 3845ed7 to b5e4217 Compare February 16, 2024 13:35
@franzpoeschel franzpoeschel marked this pull request as ready for review February 23, 2024 15:37
src/Series.cpp Fixed Show fixed Hide fixed
src/Series.cpp Outdated
Access at,
std::string const &options,
// Either an MPI_Comm or none, the template works for both options
MPI_Communicator &&...comm)

Check notice

Code scanning / CodeQL

Unused local variable Note

Variable comm is not used.
test/CoreTest.cpp Fixed Show fixed Hide fixed
src/snapshots/ContainerImpls.cpp Fixed Show fixed Hide fixed
test/ParallelIOTest.cpp Fixed Show fixed Hide fixed
@franzpoeschel franzpoeschel force-pushed the topic-iterator branch 5 times, most recently from 667ee24 to f5d8433 Compare February 29, 2024 13:32
Comment on lines +88 to +117
// increment/decrement
// ChildClass &operator++();
// ChildClass &operator--();
// ChildClass &operator++(int);
// ChildClass &operator--(int);

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
Comment on lines +82 to +109
// dereference
// value_type const &operator*() const = 0;
// value_type &operator*() = 0;

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
@@ -185,6 +196,7 @@
}
// ~Series intentionally not yet called

// std::cout << "READ " << filename << std::endl;

Check notice

Code scanning / CodeQL

Commented-out code Note test

This comment appears to contain commented-out code.
@@ -163,6 +173,7 @@
auxiliary::getEnvNum("OPENPMD_TEST_NFILES_MAX", 1030);
std::string filename =
"../samples/many_iterations/many_iterations_%T." + ext;
// std::cout << "WRITE " << filename << std::endl;

Check notice

Code scanning / CodeQL

Commented-out code Note test

This comment appears to contain commented-out code.
@franzpoeschel franzpoeschel force-pushed the topic-iterator branch 2 times, most recently from d278794 to 16c0f73 Compare December 9, 2024 16:17
namespace
{
template <typename... Args>
auto write_to_stdout([[maybe_unused]] Args &&...args) -> void

Check notice

Code scanning / CodeQL

Unused static function Note test

Static function write_to_stdout is unreachable
@franzpoeschel franzpoeschel changed the title [WIP] Unify Random-Access API and Streaming API into Series::snapshots() Unify Random-Access API and Streaming API into Series::snapshots() Dec 9, 2024
@franzpoeschel
Copy link
Contributor Author

The changes introduced with this PR are mostly (1) internal and (2) a new API. I will go ahead and merge this now, as many of the internal changes are the groundwork for new features laid out in the PR description. The new API should be treated as experimental until release.

@franzpoeschel franzpoeschel enabled auto-merge (squash) December 10, 2024 14:25
@franzpoeschel franzpoeschel merged commit b8ce8a0 into openPMD:dev Dec 10, 2024
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants