Should the Geographic Bounding Box allow optional coordinates and multiple boxes in the XML? #7108
Comments
@pdurbin @jggautier I've had email contact with the DDI Alliance. Wendy Thomas, the chair of the technical committee, responded that the Codebook only supports one Geographic Bounding Box. It's true that the Codebook schema allows multiple boxes, each nested in a sumDscr element, but the DDI Alliance is in the process of reviewing the DDI specs to address problems like this. It has also been added to their issue tracker so their working group can address it. I'm also going to join the group to add a Dataverse perspective. To me it seems that there's a green light to make a PR that adds only one geoBndBox element to the XML. Do you guys agree? Otherwise we'd have to wait until at least September to discuss this in the Geospatial Discovery Working Group. This might also settle the question of multiple boxes being allowed in the UI: #7091.
I agree, especially about not recording in the DDI XML export when there are fewer than four fields filled. To help answer "Do researchers define multiple boxes?", in the 35 Dataverse installations whose metadata I've collected so far, when people haven't filled out all four bounding box fields it's almost always because they used two of the fields for lat long points instead (there are other GitHub issues about adding fields for recording other types of geographic metadata, like lat long points, and fixing this metadata in Harvard Dataverse). About the multiple boxes, in those 35 installations there are eight datasets whose latest versions have multiple sets of bounding box fields where each set has all four fields filled. In at least one case, this was done because the data of two studies was put into one dataset. I've heard from researchers that putting the data of multiple studies into one dataset is just more convenient than creating two datasets, especially if both studies are associated with one journal article. I don't know if that's the case for all eight datasets. Maybe something the geospatial and/or metadata working groups can look into. So would the PR you're thinking of change the DDI XML export so that only the first set of bounding boxes is included in the XML and only if all four fields are filled? (Surprisingly, the DDI Codebook xsd allows anything to be entered in those fields.) I think it should be noted that this will also affect the DDI HTML Codebook export. (The PR for that HTML export at #6081 makes me think that the HTML is generated from the DDI XML. 
One HTML codebook with multiple complete bounding boxes is at https://dataverse.harvard.edu/api/datasets/export?exporter=html&persistentId=doi:10.7910/DVN/6R8F7U) If this is something that needs to be fixed before the working group can consider it, I think it's fine to make this change, especially since it seems easy to adjust later, assuming that it's easy for installations to recreate or reexport the DDI Codebook XML and DDI Codebook HTML of existing datasets.
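To make the proposed export rule concrete (only the first set of bounding box fields, and only if all four are filled), here is a minimal Python sketch. It is not the Dataverse exporter, which is written in Java; the input keys `west`, `east`, `south` and `north` and both function names are hypothetical. The child element names `westBL`, `eastBL`, `southBL` and `northBL` are the ones used by the DDI Codebook schema.

```python
import xml.etree.ElementTree as ET

# (hypothetical input key, DDI Codebook child element name)
DDI_FIELDS = [
    ("west", "westBL"),
    ("east", "eastBL"),
    ("south", "southBL"),
    ("north", "northBL"),
]


def _filled(value):
    """True if a field holds something other than None or blank text."""
    return value is not None and str(value).strip() != ""


def first_complete_box(boxes):
    """Return the first box with all four coordinates filled, or None."""
    for box in boxes:
        if all(_filled(box.get(key)) for key, _ in DDI_FIELDS):
            return box
    return None


def geo_bnd_box_element(boxes):
    """Build a single geoBndBox element, or None if no box is complete."""
    box = first_complete_box(boxes)
    if box is None:
        return None
    elem = ET.Element("geoBndBox")
    for key, tag in DDI_FIELDS:
        ET.SubElement(elem, tag).text = str(box[key]).strip()
    return elem
```

Incomplete sets (for example, two fields used as a lat/long point) are skipped rather than exported, and any further complete sets after the first are ignored.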
I can definitely understand that, but then there should be one bounding box which encompasses both studies geographically.
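As a rough illustration of what "one bounding box which encompasses both studies" means, here is a small Python sketch. The function name and dictionary keys are hypothetical, coordinates are assumed to be decimal degrees given as numbers, and no box is assumed to cross the antimeridian (±180° longitude), where a plain min/max would give the wrong answer.

```python
def enclosing_box(boxes):
    """Return the smallest box that contains every box in `boxes`.

    Assumes numeric decimal degrees and no antimeridian crossing.
    """
    if not boxes:
        raise ValueError("at least one bounding box is required")
    return {
        "west": min(b["west"] for b in boxes),
        "east": max(b["east"] for b in boxes),
        "south": min(b["south"] for b in boxes),
        "north": max(b["north"] for b in boxes),
    }
```

The merged box can be much larger than either study area when the areas are far apart, which may be part of why depositors entered separate boxes.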
Yes.
I'll check if I need to fix anything for that.
Great! If their issue tracker is public, please link to the issue here.
Fantastic! Thank you! 🎉 I'm hoping you'll find some other Dataverse people there, such as @stevenmce
I'm torn between saying, "Sure, go ahead" and "Maybe wait until the GDCC geospatial working group has met." The problem is, I have no idea when they will meet. So I guess you should go ahead if you feel like it. @jggautier seems to be on board. 😄
Hi all,
Issue at DDI Alliance: https://ddi-alliance.atlassian.net/browse/DDICODE-70
Just updating this issue: In the ticket at https://ddi-alliance.atlassian.net/browse/DDICODE-70 Wendy reiterated that only one bounding box should be included per dataset and suggested that the DDI documentation about the field could be edited "to explain the point of having a single Bounding Box." Wendy also noted that bound polygons might be the best option when depositors need to define individual areas. Maybe the geospatial and/or metadata working groups can consider editing the tooltip with the Bounding Box explanation that makes its way into the DDI Codebook documentation, explore why depositors added multiple bounding boxes, and consider adding a repeatable bound polygon field. If Dataverse's geospatial metadatablock is changed so that the Geographic Bounding Box field is non-repeatable, what will happen if someone tries to edit a dataset that currently has multiple Geographic Bounding Boxes? (I think the same question applies for several metadata model changes that need to be made in existing Dataverse-based repositories. I'm trying to test this scenario but it might take some time. Still learning how to change Dataverse's configurations.)
@JingMa87 is no longer working on the issues he opened: #7412 (comment) We should check with the DANS crew to see if any of them are still interested in this: @PaulBoon @janvanmansum @WittenbergM @LauraHuisintveld It looks like @stevenmce commented above. Hi!
There's some robust discussion in the issue above as well.
@pdurbin : sorry for the delay. This issue is not a priority for DANS.
There's some related work being documented at #9547
To focus on the most important features and bugs, we are closing issues created before 2020 (version 5.0) that are not new feature requests with the label 'Type: Feature'. If you created this issue and you feel the team should revisit this decision, please reopen the issue and leave a comment.
When the issue occurs: When harvesting a dataset with Geospatial Metadata
Which page it occurs on: It occurs when harvesting, so the Dashboard
To whom it occurs: People who harvest Geospatial Metadata
Version: 4.20
Context
In issue #3648 we addressed the problem that Dataverse doesn't always produce valid DDI. One of the issues is that DDI doesn't allow optional coordinates and is unclear about multiple geographic bounding boxes. This problem can be fixed by filtering out DDI-invalid entries when making the XML. There's also an issue open for similar changes in the UI: #7091
All four points are mandatory
DDI requires a bounding box element to contain exactly one occurrence of each of the four points, because minOccurs and maxOccurs both default to 1 when they are not specified.
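The "exactly one occurrence of each point" rule can be checked on an already-generated geoBndBox element. The following Python sketch is only an illustration with hypothetical function names. It compares local element names so it works with or without an XML namespace, and it checks occurrence counts only, not whether the values are sensible coordinates.

```python
import xml.etree.ElementTree as ET
from collections import Counter

REQUIRED_POINTS = ("westBL", "eastBL", "southBL", "northBL")


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def has_all_four_points(geo_bnd_box):
    """True if each required point occurs exactly once as a direct child."""
    counts = Counter(local_name(child.tag) for child in geo_bnd_box)
    return all(counts[name] == 1 for name in REQUIRED_POINTS)
```

An exporter could use a check like this to drop a geoBndBox that would make the DDI XML invalid.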
Unlimited boxes?
On the one hand, according to the DDI textual description, the geoBndBox element can logically occur only once, since it is the single box that contains all locations. On the other hand, the technical spec allows the geoBndBox element to occur multiple times as part of the sumDscr element. Does it make sense, then, to allow an unlimited number of boxes? Do researchers define multiple boxes? We will settle this in the Geospatial Discovery Working Group.