
WIP: 109 layer descriptor shape parameters #110

Draft: wants to merge 27 commits into master
Conversation

brian-r-calder (Collaborator):
Issue #109: layers do not currently have separate shape parameters (except the Geometadata layer), but should, in order for VR refinements to work appropriately. This pull request adds that functionality, fixes a number of bugs in the VR implementation, and addresses other deprecation issues (e.g., snprintf() rather than sprintf()).

In order to allow layers that have dimensions different from the main grid (specifically the VRRefinements and VRNode layers), this PR adds support for dimensions in the LayerDescriptor class and propagates that support to the derived classes.

VR data in field BAG files appears to contain 2D arrays rather than the 1D arrays the specification calls for; the code has been modified to follow this field convention, rather than the standard, so that source data can be read. The deprecated sprintf() has been replaced with snprintf() (sketched below).

WIP debugging to confirm the location of the memory leak causing heap-bombs elsewhere (committed purely to allow publication and other support activities).
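As a hedged illustration of the sprintf()-to-snprintf() change (the function, buffer name, and format string here are made up for the example, not taken from the PR):

```cpp
#include <cstdio>

// snprintf() takes the destination buffer size and truncates rather than
// overflowing, which is why compilers now warn on (or reject) sprintf().
void buildInternalPath(char (&path)[64], const char* layerName)
{
    // old, unsafe form: sprintf(path, "/BAG_root/%s", layerName);
    std::snprintf(path, sizeof(path), "/BAG_root/%s", layerName);
}
```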
brian-r-calder added the bug (Something isn't working) and enhancement (New feature or request) labels on Jun 25, 2024
```cpp
hsize_t dims[2];
int ndims = h5dataSet->getSpace().getSimpleExtentDims(dims, nullptr);
if (ndims != 2) {
    throw InvalidVRRefinementDimensions{};
}
```
Collaborator:
@brian-r-calder Here we are checking that there are two dimensions; if there aren't, an InvalidVRRefinementDimensions exception is thrown. However, the message from InvalidVRRefinementDimensions currently reads: "The variable resolution refinement layer is not 1 dimensional.", which is confusing. Can you clarify?

Collaborator:

@brian-r-calder Relatedly, test_bag_valuetable.cpp is failing in test case TEST_CASE("test georeferenced metadata layer readVR/writeVR", "[georefMetadatalayer][readvr][writevr]") because the test VR dataset being used is one-dimensional, which triggers the above exception. Does the test need to be updated, or the code?

Collaborator (Author):

This is a case of adapting to the reality of BAG files in the field. Although the specification says that VR refinements is a one-dimensional array, and should be written into the HDF5 file as a one-dimensional dataset, BAG files retrieved from NCEI (from NOAA surveys) do not have this form. Although not entirely proven, I suspect that this is because the GDAL drivers are not following the specification as written.

We have a couple of options here: we can attempt to get the GDAL drivers re-written and then re-write every BAG file in the US national archive (and presumably everywhere else that BAG files are created using GDAL -- meaning everything that uses CARIS software, most likely); or we can adapt the reference library to match the field experience. Given the level of complexity in the former, I've chosen to pursue the latter.

To fix this, I've changed the exception to say that the dimensions are "inconsistent with specification", but this also means that we'll have to change the test to reflect this.
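For illustration, the change might look something like this (a sketch assuming the exception follows a plain what()-override pattern; the exact wording is mine, not the commit's):

```cpp
#include <exception>

struct InvalidVRRefinementDimensions final : virtual std::exception
{
    const char* what() const noexcept override
    {
        // Neutral on rank: covers both the 1D layout the spec calls for
        // and the 2D layout found in field BAGs.
        return "The variable resolution refinement layer's dimensions are "
               "inconsistent with the specification.";
    }
};
```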

Collaborator:

@brian-r-calder The check for ndims != 2 causes an InvalidVRRefinementDimensions exception when our tests try to read back VR data previously written by the test (see here), which is triggered here. In this case, because the BAG library created the VRRefinements layer, ndims=1.

I think we need to be able to handle either ndims=1 or ndims=2.
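To make the suggestion concrete, a minimal sketch of accepting either rank, reusing the names from the snippet above (the choice to map the 1D layout onto a single row is my assumption, not code from this PR):

```cpp
hsize_t dims[2] = {0, 0};
const int ndims = h5dataSet->getSpace().getSimpleExtentDims(dims, nullptr);

hsize_t numRows = 0, numColumns = 0;
if (ndims == 1)
{
    // Spec-conformant layout: treat the 1D dataset as a single row.
    numRows = 1;
    numColumns = dims[0];
}
else if (ndims == 2)
{
    // Layout observed in BAGs from the field (e.g., the NCEI archive).
    numRows = dims[0];
    numColumns = dims[1];
}
else
    throw InvalidVRRefinementDimensions{};
```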

Collaborator (Author):

I think that might be less useful than making the code write ndims=2 rather than ndims=1 --- the goal with BAG has always been to have as few options like this as possible, so as to keep the code simple. Since we're rather Overtaken By Events here (the archive at NCEI already contains BAGs with ndims=2, thanks to incautious vendors not checking the specification), I'd advocate for switching everything to ndims=2, including the specification.
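For reference, a sketch of what always writing a rank-2, 1-by-N dataspace could look like via the HDF5 C++ API (the dataset path and element type are simplified for illustration; the real refinements dataset uses a compound type):

```cpp
#include <H5Cpp.h>

// Write the logically 1D refinements as a rank-2, 1 x N dataset, matching
// the convention already present in field BAGs.
H5::DataSet createRefinements(H5::H5File& file, hsize_t numRefinements)
{
    const hsize_t dims[2] = {1, numRefinements};
    const H5::DataSpace fileSpace{2, dims};

    return file.createDataSet("/BAG_root/varres_refinements",
        H5::PredType::NATIVE_FLOAT, fileSpace);
}
```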

```diff
 uint32_t numRows = 0, numColumns = 0;
-std::tie(numRows, numColumns) = pDataset->getDescriptor().getDims();
+std::tie(numRows, numColumns) = m_pLayerDescriptor->getDims();

 if (columnEnd >= numColumns || rowEnd >= numRows)
```
Collaborator (@selimnairb), Nov 22, 2024:

@brian-r-calder This branch is being taken when running test_bag_vrmetadata.cpp, specifically when the test attempts to read back the VR metadata. Is the test wrong, or is there a mistake in Layer::read()?

Collaborator (Author):

I don't see a branch here, but the code is correct --- we have to read the dimensions specifically from the layer rather than from the dataset, since we need to allow each layer to have its own dimensions.

Collaborator:

What I meant was that if (columnEnd >= numColumns || rowEnd >= numRows) evaluates to true when Layer::read() is called from test_bag_vrmetadata.cpp:452. Is the VR layer being created by the test incorrect?

selimnairb and others added 9 commits December 6, 2024 10:58
When reporting exceptions for VR refinements not having the right number of dimensions, we have both 1D and 2D arrays to handle (the 2D case exists only because we're patching around a bug in other software that creates BAGs without using this code and does not follow the specification correctly). The reporting of InvalidVRRefinementDimensions therefore has to be neutral on the number of dimensions and report that the number is inconsistent.
… in addition to XML file on filesystem, to provide diagnostic info when tests crash
… common usage that is counter to the BAG spec (i.e. 1D)
…to read back data after it was written to disk, in accordance with other tests (such as 'test vr metadata create open' in the same test file)
```diff
@@ -30,6 +30,7 @@ LayerDescriptor::LayerDescriptor(
     std::string internalPath,
     std::string name,
     LayerType type,
+    uint32_t rows, uint32_t cols,
```
Collaborator (@selimnairb), Jan 3, 2025:

@brian-r-calder We are defining rows and cols as uint32_t but these are always set from hsize_t values. hsize_t is a typedef of uint64_t, so an implicit narrowing is taking place. This doesn't seem to be a problem for GCC on Linux, but it fails by default in Visual Studio, and is a potential source of error.

Any objections to making these uint64_t?
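A self-contained sketch of the narrowing in question (hsize_t reproduced here as HDF5 defines it, a typedef of uint64_t; the struct is illustrative, not the real LayerDescriptor):

```cpp
#include <cstdint>

using hsize_t = std::uint64_t;  // as typedef'd by HDF5

struct Dims
{
    std::uint32_t rows;
    std::uint32_t cols;
};

Dims fromDataspace(hsize_t r, hsize_t c)
{
    // return {r, c};  // ill-formed: narrowing conversion in braced init,
    //                 // rejected under MSVC's default settings
    return {static_cast<std::uint32_t>(r),   // compiles everywhere, but
            static_cast<std::uint32_t>(c)};  // silently truncates large dims
}
```

Making the members uint64_t would make the braced initialization well-formed with no casts at all.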

Collaborator (Author):

No, that should be fine with most modern implementations. It was historically uint32_t because not all systems had moved to first-class uint64_t support at the time, I think.

Collaborator:

Okay, I'll make this change.

…than uint32_t like elsewhere in the API) because dimensions are initialized from HDF5 hsize_t values, which are uint64_t.
…ying uint64_t to maintain compatibility with the rest of the BAG API, which assumes rows and cols are uint32_t.
…s other implementations do) rather than 1D (as called for in BAG spec)
```diff
@@ -354,7 +367,14 @@ void VRNode::writeProxy(
         throw DatasetNotFound{};

     auto pDataset = this->getDataset().lock();
+    // TODO: Confirm that this is what we want --- this resets the dimensions of the
```
Collaborator:

@brian-r-calder The code in this branch needs to be updated to account for there now being two dimensions in the file data space (see here for example). However, having this modify the main BAG dataset dimensions seems wrong. I'm not sure what the right answer is though.

```diff
@@ -321,7 +336,14 @@ void VRRefinements::writeProxy(
         throw DatasetNotFound{};

     auto pDataset = this->getDataset().lock();
+    // TODO: Confirm that this is what we want --- this resets the dimensions of the
```
Collaborator:

@brian-r-calder The code in this branch needs to be updated to account for there now being two dimensions in the file data space (see here for example). However, having this modify the main BAG dataset dimensions seems wrong. I'm not sure what the right answer is though.

```diff
@@ -55,7 +58,7 @@ std::shared_ptr<VRNodeDescriptor> VRNodeDescriptor::create(
     int compressionLevel)
 {
     return std::shared_ptr<VRNodeDescriptor>(
-        new VRNodeDescriptor{dataset.getNextId(), chunkSize,
+        new VRNodeDescriptor{dataset.getNextId(), 1, 1, chunkSize,
```
Collaborator:

@brian-r-calder I changed this to 1 row and 1 column to get the existing tests to pass, but it seems like this should start out as 0, 0 and only be changed if a node is added. What do you think?

```diff
@@ -54,7 +56,7 @@ std::shared_ptr<VRRefinementsDescriptor> VRRefinementsDescriptor::create(
     int compressionLevel)
 {
     return std::shared_ptr<VRRefinementsDescriptor>(
-        new VRRefinementsDescriptor{dataset.getNextId(), chunkSize,
+        new VRRefinementsDescriptor{dataset.getNextId(), 1, 1, chunkSize,
```
Collaborator:

@brian-r-calder I changed this to 1 row and 1 column to get the existing tests to pass, but it seems like this should start out as 0, 0 and only be changed if a refinement is added. What do you think?
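A hypothetical sketch of the 0, 0 alternative being proposed (the setDims() call is assumed for illustration; I have not checked the descriptor's actual setter name):

```cpp
// Descriptor starts with no rows or columns...
return std::shared_ptr<VRRefinementsDescriptor>(
    new VRRefinementsDescriptor{dataset.getNextId(), 0, 0, chunkSize,
                                compressionLevel});

// ...and a write path would grow the recorded extent as refinements land:
// descriptor.setDims(1, columnEnd + 1);  // assumed setter, 1 x N layout
```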
