Coordinates not set correctly (or at least not output correctly by OpenPMD) in CarpetX #152

Closed
rhaas80 opened this issue Jun 27, 2023 · 19 comments

Comments

@rhaas80
Member

rhaas80 commented Jun 27, 2023

The attached parfile, when run with CarpetX git hash f05b7a4 using 1 MPI rank, produces an output file amrex3Lev.it00000000.h5 in which some coordinates for lev0 and lev1 are incorrectly stored as 0. Since HDF5 is used as the backend, this can easily be verified using h5dump. (GitHub forces attachments to be gzipped, sorry.)

amrex3Lev.par.gz
amrex3Lev.it00000000.h5.gz

@eschnett
Collaborator

I see that the mesh coordinates_cell_coords_lev00 variable coordinates_ccoordx contains many nonzero values.

I also see that there are too many zero values. Maybe the ghost zones or the outer boundaries are all set to zero for some reason?

@rhaas80
Member Author

rhaas80 commented Jun 27, 2023

Let me take a closer look. I had plotted it via Steve's openPMD plotting script, got zeros, and then did an h5dump (and saw lots of zeros, but those may just have been inside ghosts or so).

[attached plot: vcoordx3Lev00001]

Indeed, being more careful and doing a

h5dump -d /data/0/meshes/coordinates_vertex_coords_lev00/coordinates_vcoordx amrex.it00000000.h5

I do see non-zero values as well.

I'll update the description from "all" to "some". There should not be any large sections of 0.0 values, though, even in what may be ghosts, should there?

@eschnett
Collaborator

The zeros I see are in the outer boundary, and they should not be there.

@rhaas80
Member Author

rhaas80 commented Jun 27, 2023

OK. So I guess I will try to check what is present in the simulation grid, to make sure that this is not just an output issue.

@eschnett
Collaborator

The zeros I see in the outer boundary are not present in TSV output. I assume the problem lies with the openPMD output.

@eschnett
Collaborator

I see that levels 1 and 2 contain only zeros. Only level 0 is "fine", apart from its ghosts. Again this is specific to openPMD output.

@rhaas80 rhaas80 changed the title Coordinates not set correctly (or at least not output corrected by OpenPMD) in CarpetX Coordinates not set correctly (or at least not output correctly by OpenPMD) in CarpetX Jun 27, 2023
@rhaas80
Member Author

rhaas80 commented Jun 27, 2023

How strange. Ok, then I need to check what the Python code actually plots since it seems to plot non-zero values near the center only (where lev 1 and lev 2 would exist but not lev 0).

@eschnett
Collaborator

I understand what is going on.

openPMD specifies that, if the underlying storage format (here: HDF5) supports datasets with missing data (HDF5 does), then chunks should be implemented as if they covered the whole domain, and points not covered by chunks should be represented as missing data.

Thus, in HDF5, every level appears to consist of a single large dataset covering the whole domain. Points that are not defined are then filled in by HDF5 with its "missing value" placeholder, which is 0 by default. If you look carefully, the centres of the refinement levels actually do contain data.

I think there is an HDF5 function that lets you extract where data actually exists, but I'm not 100% sure. The easier way would be to parse the openPMD data structure that specifies where chunks lie, and to read only the respective hyperslabs.
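For illustration, here is a minimal C sketch of that hyperslab approach against the attached file; the block offset and count below are made-up placeholders that in practice would come from the openPMD chunk metadata:

/* Read only one written block (hyperslab) of the padded level dataset
 * instead of the full, zero-filled domain.  Compile with -lhdf5.
 * The offset/count values are hypothetical placeholders. */
#include <hdf5.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  hid_t file = H5Fopen("amrex3Lev.it00000000.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
  hid_t dset = H5Dopen2(file,
      "/data/0/meshes/coordinates_cell_coords_lev01/coordinates_ccoordx",
      H5P_DEFAULT);

  hsize_t offset[3] = {32, 32, 32};  /* hypothetical block location */
  hsize_t count[3]  = {16, 16, 16};  /* hypothetical block extent */

  /* Select the block in the file and create a matching memory space. */
  hid_t fspace = H5Dget_space(dset);
  H5Sselect_hyperslab(fspace, H5S_SELECT_SET, offset, NULL, count, NULL);
  hid_t mspace = H5Screate_simple(3, count, NULL);

  double *buf = malloc(count[0] * count[1] * count[2] * sizeof *buf);
  H5Dread(dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT, buf);
  printf("first value of the block: %g\n", buf[0]);

  free(buf);
  H5Sclose(mspace);
  H5Sclose(fspace);
  H5Dclose(dset);
  H5Fclose(file);
  return 0;
}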

It's not a bug, it's a feature.

The outer boundary is indeed not output by CarpetX. I should fix that.

@eschnett
Collaborator

Here is some Julia code that shows this:

julia> using HDF5
julia> f = h5open("/Users/eschnett/simulations/amrex3Lev/output-0000/amrex3Lev/amrex3Lev.it00000000.h5")
julia> extrema(f["data"]["0"]["meshes"]["coordinates_cell_coords_lev01"]["coordinates_ccoordx"][])
(-0.6171875, 0.6171875)

@eschnett
Collaborator

The HDF5 mechanism that accesses parts of a dataset is called a "chunk"; it's part of the dataset API.

@rhaas80
Member Author

rhaas80 commented Jun 27, 2023

Hmm, OK. And I guess h5dump also cannot distinguish between "no data, use the default" and "0 was actually written". Sigh. Sure, this is a way to get non-rectangular data into HDF5's arrays.

Well, at least one mystery is solved.

@eschnett
Collaborator

H5Pset_fill_value()

@rhaas80
Member Author

rhaas80 commented Jun 27, 2023

Well yes, but how do I distinguish between the "fill value" being just a filler and it being an actual value? (In openPMD I would have to parse the grid structure to know what to ignore, I guess.)

@franzpoeschel

These are two slightly separate issues:

  1. Filling the dataset up with zeros is something that HDF5 does on its own. I'm not entirely sure how far the support for sparse datasets extends in HDF5, in particular whether chunking can help save disk space when not writing the entire domain. (A chunk in HDF5 does not correspond to a written block; it is rather the regular tile size by which the domain is covered in the internal representation.)
    I don't think that setting a fill value with H5Pset_fill_value() is desired here, as this would (I believe?) write the entire domain, which is not what we want for mesh refinement. In our implementation, we specify H5Pset_fill_time(datasetCreationProperty, H5D_FILL_TIME_NEVER), which matches what is needed for mesh refinement but apparently makes it look like everything is zero (see the sketch after this list).
    These two open questions are something that we should check.
  2. Basically, the current status of the openPMD-standard PR for mesh refinement does not really consider HDF5; see item 2 of my last comment here. HDF5 has no way of recovering the written regions, and the PR relies on that backend capability.
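For reference, a minimal, self-contained C sketch of the dataset-creation setup item 1 refers to: a chunked dataset with H5D_FILL_TIME_NEVER, so that chunks that receive no write are never filled. Dataset name, extents, and chunk size are placeholders, not the actual openPMD-api code:

/* Sketch of the dataset creation properties discussed in item 1:
 * a chunked dataset whose unwritten chunks are never filled.
 * Names, extents, and chunk size are placeholders. */
#include <hdf5.h>

int main(void) {
  hid_t file = H5Fcreate("sparse_level.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

  hsize_t dims[3]  = {128, 128, 128};  /* full (padded) level extent */
  hsize_t chunk[3] = {16, 16, 16};     /* HDF5 storage chunk size */
  hid_t space = H5Screate_simple(3, dims, NULL);

  hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
  H5Pset_chunk(dcpl, 3, chunk);
  /* Never write a fill value; chunks that receive no H5Dwrite are not
   * allocated on disk.  As observed above, readers then see those
   * regions as 0. */
  H5Pset_fill_time(dcpl, H5D_FILL_TIME_NEVER);

  hid_t dset = H5Dcreate2(file, "level_data", H5T_NATIVE_DOUBLE, space,
                          H5P_DEFAULT, dcpl, H5P_DEFAULT);

  H5Dclose(dset);
  H5Pclose(dcpl);
  H5Sclose(space);
  H5Fclose(file);
  return 0;
}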

@eschnett
Collaborator

Does this not mean that one should write chunks as individual datasets in HDF5? There is an explicit provision for this in the standard with a naming convention for these datasets.

For example, if the chunks are not the blocks that are written but have a fixed size, does this not mean that the HDF5 dataset would store (on disk) too much data, leading to a very inefficient representation?

@franzpoeschel

Does this not mean that one should write chunks as individual datasets in HDF5? There is an explicit provision for this in the standard with a naming convention for these datasets.

According to the standard, HDF5 would fall under that case, yes.

For example, if the chunks are not the blocks that are written but have a fixed size, does this not mean that the HDF5 dataset would store (on disk) too much data, leading to a very inefficient representation?

I am not certain about this (I am not really an HDF5 expert). I can imagine that HDF5 would write only those chunks that received an actual write, which would still imply some overhead over the actually written regions, but would be better than writing the whole domain. This should be easy to test out.
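One way to test this, for what it's worth, is to compare the storage HDF5 actually allocated for a level dataset with the nominal size of the full domain. A small C sketch, assuming the dataset path from the attached example file:

/* Sketch: compare the storage HDF5 actually allocated for a dataset
 * with the size of the full (padded) domain.  File and dataset path
 * are taken from the attached example output; adjust as needed. */
#include <hdf5.h>
#include <stdio.h>

int main(void) {
  hid_t file = H5Fopen("amrex3Lev.it00000000.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
  hid_t dset = H5Dopen2(file,
      "/data/0/meshes/coordinates_cell_coords_lev01/coordinates_ccoordx",
      H5P_DEFAULT);

  hsize_t allocated = H5Dget_storage_size(dset);   /* bytes actually on disk */

  hid_t space = H5Dget_space(dset);
  hid_t dtype = H5Dget_type(dset);
  hssize_t npoints = H5Sget_simple_extent_npoints(space);
  hsize_t nominal = (hsize_t)npoints * H5Tget_size(dtype);

  printf("allocated: %llu bytes, full domain: %llu bytes\n",
         (unsigned long long)allocated, (unsigned long long)nominal);

  H5Tclose(dtype);
  H5Sclose(space);
  H5Dclose(dset);
  H5Fclose(file);
  return 0;
}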

However, for the current situation this is purely hypothetical anyway: Even if the memory representation is efficient in HDF5, it does not give us back any information on regions that we actually wrote. This is why I suggested in my comment on the standard PR that an explicit encoding of this information might be added in openPMD.

@rhaas80
Member Author

rhaas80 commented Jun 28, 2023

I can maybe provide a bit of input on HDF5 chunks. As far as I understand them, an HDF5 chunk is a (say) 3D subset of the grid. When one uses chunked output and passes HDF5, say, a 100x100x100 block in H5Dwrite while the chunk size is 13x13x13, HDF5 will write things to disk such that each 13x13x13 chunk is contiguous on disk. That way one can more efficiently read back 3D hyperslabs afterwards (basically it "tiles" the domain).

Chunking is mandatory for compressed files, and chunks that are never written to are just not present. In regions of a chunk that are not covered by the 100x100x100 set, HDF5 puts in some value (the fill value). I would have to double-check the implementation details, I guess, but I do not think that HDF5 actually keeps a record of which data points in a chunk are "real" data and which ones are "fill data" (i.e. it will actually write 0 to disk to fill up the chunk).
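A possible way to double-check the chunk-level part of this (an untested sketch; it assumes the dataset is 3D and uses chunked layout, and it needs HDF5 >= 1.10.5 for H5Dget_num_chunks / H5Dget_chunk_info). Note that this only reports which whole chunks were allocated, not which points inside an allocated chunk are real data versus fill:

/* Sketch: list which chunks of a dataset were actually allocated on disk.
 * Requires HDF5 >= 1.10.5; assumes a 3D, chunked dataset. */
#include <hdf5.h>
#include <stdio.h>

int main(void) {
  hid_t file = H5Fopen("amrex3Lev.it00000000.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
  hid_t dset = H5Dopen2(file,
      "/data/0/meshes/coordinates_cell_coords_lev01/coordinates_ccoordx",
      H5P_DEFAULT);
  hid_t space = H5Dget_space(dset);

  hsize_t nchunks = 0;
  H5Dget_num_chunks(dset, space, &nchunks);
  printf("%llu chunks allocated on disk\n", (unsigned long long)nchunks);

  for (hsize_t i = 0; i < nchunks; i++) {
    hsize_t offset[3];     /* assumes rank 3 */
    unsigned filter_mask;
    haddr_t addr;
    hsize_t size;
    H5Dget_chunk_info(dset, space, i, offset, &filter_mask, &addr, &size);
    printf("chunk %llu at (%llu,%llu,%llu), %llu bytes\n",
           (unsigned long long)i,
           (unsigned long long)offset[0], (unsigned long long)offset[1],
           (unsigned long long)offset[2], (unsigned long long)size);
  }

  H5Sclose(space);
  H5Dclose(dset);
  H5Fclose(file);
  return 0;
}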

Thanks for the link to the PR.

I have not looked at the PR myself yet. From what you say, @franzpoeschel, does this mean that there will be "favored" backends for openPMD (and "disfavored" ones that cannot handle all data that openPMD's standard describes)? Clearly some are more useful in some situations than others (e.g., I would not want to try the JSON backend on TB-sized datasets).

@franzpoeschel

Thank you for the details!

I have not looked at the PR myself yet. From what you say, @franzpoeschel, does this mean that there will be "favored" backends for openPMD (and "disfavored" ones that cannot handle all data that openPMD's standard describes)? Clearly some are more useful in some situations than others (e.g., I would not want to try the JSON backend on TB-sized datasets).

The general answer to this is that we try to be independent of backends and create workflows where one backend can quickly be substituted for another.
This rule is sometimes broken (1) for backend-specific features (though we try to implement fallback solutions) and (2) for prototyping experimental/new APIs that are easier to implement in one backend than in another. The mesh refinement draft is an instance of case 2: the ability of ADIOS2 to natively report the written regions just makes this easier to prototype in that backend. I don't think that this implies a hierarchy of favored vs. disfavored backends. (In fact, there are existing openPMD-based workflows that are easier to model in HDF5.)

Given your explanation that HDF5 fundamentally has the capability of storing sparse datasets, this should mean that the feature can be extended to HDF5 with some work, as I suggested in openPMD/openPMD-standard#252 (comment) earlier today. However, it means that someone will have to put in the work at some point ;)

Mesh refinement has been mainly driven by @ax3l, so I'll ping him for his perspective on this.

@eschnett
Collaborator

eschnett commented Jul 5, 2023

I'm closing this because this needs to be resolved in openPMD's HDF5 backend.

@eschnett eschnett closed this as completed Jul 5, 2023