Coordinates not set correctly (or at least not output correctly by OpenPMD) in CarpetX #152
Comments
I also see that there are too many zero values. Maybe the ghost zones or the outer boundaries are all set to zero for some reason?
Let me take a closer look. I had plotted via Steve's openPMD plotting script, gotten zeros, and done an h5dump (and saw lots of zeros, but this may have been just inside ghosts or so). Indeed, being more careful, I do see non-zero values as well. I'll update the description from "all" to "some". There should not be any large sections of 0.0 values, even in what may be ghosts, should there?
The zeros I see are in the outer boundary, and they should not be there.
OK. So I guess I will try to check what is present in the simulation grid, to make sure that this is not just an output issue.
The zeros I see in the outer boundary are not present in TSV output. I assume the problem lies with the openPMD output.
I see that levels 1 and 2 contain only zeros. Only level 0 is "fine", apart from its ghosts. Again, this is specific to openPMD output.
How strange. OK, then I need to check what the Python code actually plots, since it seems to plot non-zero values near the center only (where lev 1 and lev 2 would exist but not lev 0).
I understand what is going on. openPMD specifies that, if the underlying storage format (here: HDF5) supports datasets with missing data (HDF5 does), then chunks should be implemented as if they covered the whole domain, and points not covered by chunks should be represented as missing data. Thus, in HDF5, every level seems to consist of a single large dataset covering the whole domain. Points that are not defined are then filled in by HDF5 with its "missing value" placeholder, which is 0 by default. If you look carefully, the centres of the refinement levels actually do contain data.

I think there is an HDF5 function that lets you extract where data actually exists, but I'm not 100% sure. The easier way would be to parse the openPMD data structure that specifies where chunks lie, and to read only the respective hyperslabs. It's not a bug, it's a feature.

The outer boundary is indeed not output by CarpetX. I should fix that.
Here is some Julia code that shows this:
julia> using HDF5
julia> f = h5open("/Users/eschnett/simulations/amrex3Lev/output-0000/amrex3Lev/amrex3Lev.it00000000.h5")
julia> extrema(f["data"]["0"]["meshes"]["coordinates_cell_coords_lev01"]["coordinates_ccoordx"][])
(-0.6171875, 0.6171875)
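For illustration, here is a hedged Python sketch of the chunk-table approach suggested above: ask openPMD-api which regions were actually written and load only those hyperslabs. The series pattern, mesh name, and component name are guesses based on the HDF5 path in the Julia snippet; note that, as the discussion below concludes, the HDF5 backend currently cannot report written sub-regions (the ADIOS2 backend can), so for this particular file the reported chunk may simply span the whole level.

import openpmd_api as io

# Hypothetical file pattern and record names, inferred from the HDF5 layout above
series = io.Series("amrex3Lev.it%08T.h5", io.Access.read_only)
itr = series.iterations[0]
rc = itr.meshes["coordinates_cell_coords_lev01"]["coordinates_ccoordx"]

requests = []
for chunk in rc.available_chunks():      # regions the backend knows were written
    data = rc.load_chunk(chunk.offset, chunk.extent)
    requests.append((chunk.offset, chunk.extent, data))
series.flush()                           # actually fills the arrays requested above

for offset, extent, data in requests:
    print(offset, extent, data.min(), data.max())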
The HDF5 mechanism that accesses parts of a dataset is called a "chunk"; it's part of the dataset API.
Hmm, OK. And I guess h5dump also cannot distinguish between "no data, use default" and "0 written". Sigh. Sure, this is a way to get non-rectangular data into HDF5's arrays. Well, at least one mystery solved.
Well yes, but how do I distinguish between the "fill value" being just a filler and it being an actual value? (In openPMD I would have to parse the grid structure to know what to ignore, I guess.)
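One partial answer, as a hedged sketch (not from this thread): HDF5 itself records the dataset's fill value and, for chunked datasets, which chunks have storage allocated, but only at per-chunk granularity; within an allocated chunk there is no per-point record of what was written, so distinguishing a written 0 from a fill 0 ultimately needs the openPMD chunk information, as in the sketch above. Assuming a recent h5py/HDF5 and the dataset path from the Julia snippet:

import h5py

with h5py.File("amrex3Lev.it00000000.h5", "r") as f:
    dset = f["data/0/meshes/coordinates_cell_coords_lev01/coordinates_ccoordx"]
    print("fill value:", dset.fillvalue)   # 0.0 unless set otherwise
    if dset.chunks is not None:
        # byte_offset is None for a chunk that was never written;
        # such a chunk reads back as pure fill value
        info = dset.id.get_chunk_info_by_coord((0, 0, 0))
        print("chunk at origin has storage:", info.byte_offset is not None)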
These are two slightly separate issues:
Does this not mean that one should write chunks as individual datasets in HDF5? There is an explicit provision for this in the standard, with a naming convention for these datasets. For example, if the chunks are not the blocks that are written but have a fixed size, does this not mean that the HDF5 dataset would store (on disk) too much data, leading to a very inefficient representation?
According to the standard, HDF5 would fall under that case, yes.
I am not certain about this (I am not really an HDF5 expert). I can imagine that HDF5 would write only those chunks that had an actual write access, which would still imply an overhead over the actually written regions, but be better than writing the whole domain. This should be easy to test out. However, for the current situation this is purely hypothetical anyway: even if the memory representation is efficient in HDF5, it does not give us back any information on regions that we actually wrote. This is why I suggested in my comment on the standard PR that an explicit encoding of this information might be added in openPMD.
I can provide a bit of input, maybe, on HDF5 chunks. As far as I understand them, an HDF5 chunk is a (say) 3D subset of the grid. When one uses chunked output, and one passes HDF5 a 100x100x100 block in H5Dwrite while the chunk size is 13x13x13, then HDF5 will write things to disk such that each 13x13x13 chunk is contiguous on disk. That way one can more efficiently read back 3D hyperslabs afterwards (basically it "tiles" the dataset). Chunking is mandatory for compressed files. Chunks that are never written to are simply not present. In regions of a chunk that are not covered by the 100x100x100 write, HDF5 puts in some value (the fill value). I would have to double-check the implementation details, but I do not think that HDF5 actually keeps a record of which data points in a chunk are "real" data and which ones are "fill data" (i.e. it will actually write 0 to disk to fill up the chunk).

Thanks for the link to the PR. I have not looked at the PR myself yet. From what you say, @franzpoeschel, does this mean that there will be "favored" backends for openPMD (and "disfavored" ones that cannot handle all data that openPMD's standard describes)? Clearly some are more useful in some situations than others (e.g. I would not want to try the JSON backend on TB-sized datasets).
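A small self-contained test of the behaviour described above, using h5py rather than the raw H5Dwrite API (file name and sizes are made up for the demo): only chunks touched by a write get storage on disk, and the uncovered parts of a partially covered chunk read back as the fill value.

import h5py

with h5py.File("chunk_demo.h5", "w") as f:
    # 100^3 dataset with 13^3 chunks; the chunk size does not divide the write extent
    dset = f.create_dataset("grid", shape=(100, 100, 100),
                            chunks=(13, 13, 13), dtype="f8", fillvalue=0.0)
    dset[:50, :50, :50] = 1.0            # write only one corner of the domain

with h5py.File("chunk_demo.h5", "r") as f:
    dset = f["grid"]
    # expect 4**3 = 64 allocated chunks out of 8**3 = 512 (needs HDF5 >= 1.10.5)
    print("chunks with storage:", dset.id.get_num_chunks())
    print(dset[0, 0, 0])     # 1.0: written value
    print(dset[50, 0, 0])    # 0.0: fill value inside a partially covered chunk
    print(dset[99, 99, 99])  # 0.0: chunk never written, no storage on disk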
Thank you for the details!
The general answer to this is that we try to be independent of backends and create workflows where fast substitution of one backend for another is possible. Given your explanation that HDF5 fundamentally has the capability of storing sparse datasets, this should mean that the feature can be extended to HDF5 with some work, as I suggested in openPMD/openPMD-standard#252 (comment) earlier today. However, it means that someone will have to put in the work at some point ;) Mesh refinement has been mainly driven by @ax3l, so I'll ping him for his perspective on this.
I'm closing this because this needs to be resolved in openPMD's HDF5 backend.
The attached parfile, when run with CarpetX git hash f05b7a4 using 1 MPI rank, produces an output file amrex3Lev.it00000000.h5 in which some coordinates for lev0 and lev1 are incorrectly stored as 0. Since HDF5 is used as the backend, this can be easily verified using h5dump. GitHub forces gz, sorry.
amrex3Lev.par.gz
amrex3Lev.it00000000.h5.gz
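For reference, a hedged alternative to h5dump for reproducing the observation (the dataset path is taken from the Julia snippet in the comments; adjust the level and component as needed):

import numpy as np
import h5py

with h5py.File("amrex3Lev.it00000000.h5", "r") as f:
    coords = f["data/0/meshes/coordinates_cell_coords_lev01/coordinates_ccoordx"][()]
    print("min/max:", coords.min(), coords.max())
    print("fraction of exact zeros:", np.mean(coords == 0.0))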