Should the Geographic Bounding Box allow optional coordinates and multiple boxes in the XML? #7108

Closed
JingMa87 opened this issue Jul 22, 2020 · 12 comments

Comments

@JingMa87
Contributor

JingMa87 commented Jul 22, 2020

When the issue occurs: When harvesting a dataset with Geospatial Metadata
Which page it occurs on: It occurs when harvesting, so the Dashboard
To whom it occurs: People who harvest Geospatial Metadata
Version: 4.20

Context
In issue #3648 we addressed the problem that Dataverse doesn't always produce valid DDI. One of the issues is that DDI doesn't allow optional coordinates and is unclear about multiple geographic bounding boxes. This problem can be fixed by filtering out DDI-invalid entries when making the XML. There's also an issue open for similar changes in the UI: #7091

All four points are mandatory
DDI requires a bounding box element to have exactly one occurrence of every point. The default minOccurs and maxOccurs is 1 when not specified.

image
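As a rough illustration of the "all four points" rule, here is a minimal Python sketch that checks whether a box has every coordinate filled in. The `westBL`, `eastBL`, `southBL` and `northBL` names follow the DDI Codebook `geoBndBox` children; the dictionary shape and the `is_complete_box` helper are assumptions made for this example and are not taken from the Dataverse code base.

```python
from typing import Mapping, Optional

# Child elements of geoBndBox in DDI Codebook; each must occur exactly once.
REQUIRED_POINTS = ("westBL", "eastBL", "southBL", "northBL")


def is_complete_box(box: Mapping[str, Optional[str]]) -> bool:
    """Return True only if all four points are present and non-blank.

    Hypothetical helper: `box` maps point names to raw string values.
    """
    return all((box.get(name) or "").strip() for name in REQUIRED_POINTS)
```

A box with only two of the four fields filled, which is the lat/long-point misuse described later in this thread, would be rejected by this check.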

Unlimited boxes?
On the one hand, the geoBndBox element can logically occur only once according to the DDI textual description, since it is the largest box that contains all locations. On the other hand, the technical spec says that the geoBndBox element can occur multiple times within the sumDscr element. Does it logically make sense to allow an unlimited number of boxes, then? Do researchers define multiple boxes? We will settle this in the Geospatial Discovery Working Group.

image
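To make the "multiple boxes" case concrete, the following sketch counts `geoBndBox` elements in an exported fragment. The fragment is hypothetical and namespace-free for brevity (a real DDI export declares an XML namespace, so the lookup would need adjusting), and the coordinate values are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free fragment with two bounding boxes in one sumDscr.
FRAGMENT = """
<stdyInfo>
  <sumDscr>
    <geoBndBox>
      <westBL>3.3</westBL><eastBL>7.2</eastBL>
      <southBL>50.7</southBL><northBL>53.6</northBL>
    </geoBndBox>
    <geoBndBox>
      <westBL>5.9</westBL><eastBL>15.0</eastBL>
      <southBL>47.3</southBL><northBL>55.1</northBL>
    </geoBndBox>
  </sumDscr>
</stdyInfo>
"""


def count_boxes(xml_text: str) -> int:
    """Count geoBndBox elements anywhere below the root element."""
    root = ET.fromstring(xml_text)
    return len(root.findall(".//geoBndBox"))
```

A count above one would flag exports that carry more boxes than the textual description of the element intends.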

@JingMa87
Contributor Author

JingMa87 commented Jul 29, 2020

@pdurbin @jggautier I've had email contact with the DDI Alliance. Wendy Thomas, the chair of the technical committee, responded that the Codebook only supports one Geographic Bounding Box. It's true that the Codebook allows multiple boxes that are each nested in a sumDscr element, but the DDI Alliance are basically in the process of reviewing the DDI specs to address problems like this. It has also been added to their issue tracker so their working group can address it. I'm also going to join the group to add a Dataverse perspective. To me it seems that there's green light to make a PR for only adding one geoBndBox element to the XML, do you guys agree? Otherwise we'd have to wait until at least September to discuss this in the Geospatial Discovery Working Group. Also, the question of whether the UI should allow multiple boxes (#7091) might now be settled.

@jggautier
Contributor

jggautier commented Jul 29, 2020

I agree, especially about not recording in the DDI XML export when there are fewer than four fields filled.

To help answer "Do researchers define multiple boxes?", in the 35 Dataverse installations whose metadata I've collected so far, when people haven't filled out all four bounding box fields it's almost always because they used two of the fields for lat long points instead (there are other GitHub issues about adding fields for recording other types of geographic metadata, like lat long points, and fixing this metadata in Harvard Dataverse).

About the multiple boxes, in those 35 installations there are eight datasets whose latest versions have multiple sets of bounding box fields where each set has all four fields filled. In at least one case, this was done because the data of two studies was put into one dataset. I've heard from researchers that putting the data of multiple studies into one dataset is just more convenient than creating two datasets, especially if both studies are associated with one journal article. I don't know if that's the case for all eight datasets. Maybe something the geospatial and/or metadata working groups can look into.

So would the PR you're thinking of change the DDI XML export so that only the first set of bounding boxes is included in the XML and only if all four fields are filled? (Surprisingly, the DDI Codebook xsd allows anything to be entered in those fields.)

I think it should be noted that this will also affect the DDI HTML Codebook export. (The PR for that HTML export at #6081 makes me think that the HTML is generated from the DDI XML. One HTML codebook with multiple complete bounding boxes is at https://dataverse.harvard.edu/api/datasets/export?exporter=html&persistentId=doi:10.7910/DVN/6R8F7U)
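For anyone who wants to compare the HTML and XML exports of the same dataset, here is a small sketch that builds export URLs in the shape of the link above. The `export_url` helper is hypothetical, and using `ddi` as the exporter name for the XML export is an assumption to verify against your installation.

```python
from urllib.parse import urlencode


def export_url(base_url: str, persistent_id: str, exporter: str) -> str:
    """Build a dataset export URL; ':' and '/' in the DOI are left unescaped."""
    query = urlencode(
        {"exporter": exporter, "persistentId": persistent_id}, safe=":/"
    )
    return f"{base_url}/api/datasets/export?{query}"


# Example: the HTML codebook linked above, and the assumed XML counterpart.
html_url = export_url("https://dataverse.harvard.edu", "doi:10.7910/DVN/6R8F7U", "html")
ddi_url = export_url("https://dataverse.harvard.edu", "doi:10.7910/DVN/6R8F7U", "ddi")
```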

If this is something that needs to be fixed before the working group can consider it, I think it's fine to make this change, especially since it seems easy to adjust later assuming that it's easy for installations to recreate or reexport the DDI Codebook XML and DDI Codebook HTML of existing datasets.

@JingMa87
Contributor Author

> I've heard from researchers that putting the data of multiple studies into one dataset is just more convenient than creating two datasets, especially if both studies are associated with one journal article.

I can definitely understand that, but then there should be one bounding box which encompasses both studies geographically.
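A minimal sketch of what "one bounding box which encompasses both studies" could mean in practice: take the outermost value on each side. The `Box` type, the `enclosing_box` name and the sample coordinates are illustrative assumptions, and the sketch does not handle boxes that cross the antimeridian.

```python
from typing import Iterable, NamedTuple


class Box(NamedTuple):
    west: float
    east: float
    south: float
    north: float


def enclosing_box(boxes: Iterable[Box]) -> Box:
    """Return the smallest box containing every input box.

    Assumes west <= east for every box (no antimeridian crossing).
    """
    boxes = list(boxes)
    if not boxes:
        raise ValueError("need at least one box")
    return Box(
        west=min(b.west for b in boxes),
        east=max(b.east for b in boxes),
        south=min(b.south for b in boxes),
        north=max(b.north for b in boxes),
    )
```

For two far-apart study areas the enclosing box can cover a lot of territory that neither study touches, which is one trade-off of collapsing to a single box.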

> So would the PR you're thinking of change the DDI XML export so that only the first set of bounding boxes is included in the XML and only if all four fields are filled?

Yes.
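To spell out that behavior, here is a rough sketch, not the actual exporter code. It assumes one reading of the rule: only the first box is considered, and it is written out only when all four of its fields are filled. The input shape and the `first_box_element` name are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Mapping, Optional, Sequence

# Child elements of geoBndBox, in the order they are written out.
POINTS = ("westBL", "eastBL", "southBL", "northBL")


def first_box_element(
    boxes: Sequence[Mapping[str, str]],
) -> Optional[ET.Element]:
    """Return a geoBndBox element for the first box, or None.

    Assumption: later boxes are ignored, and an incomplete first box
    means no geoBndBox is emitted at all.
    """
    if not boxes:
        return None
    first = boxes[0]
    if not all((first.get(p) or "").strip() for p in POINTS):
        return None
    element = ET.Element("geoBndBox")
    for p in POINTS:
        ET.SubElement(element, p).text = first[p].strip()
    return element
```

The other reading, emitting the first complete box even when it is not the first one entered, would only need a loop over `boxes` instead of looking at `boxes[0]`.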

> I think it should be noted that this will also affect the DDI HTML Codebook export.

I'll check if I need to fix anything for that.

@pdurbin
Member

pdurbin commented Jul 30, 2020

> the DDI Alliance are basically in the process of reviewing the DDI specs to address problems like this. It has also been added to their issue tracker so their working group can address it.

Great! If their issue tracker is public, please link to the issue here.

> I'm also going to join the group to add a Dataverse perspective.

Fantastic! Thank you! 🎉 I'm hoping you'll find some other Dataverse people there, such as @stevenmce

> To me it seems that there's green light to make a PR

I'm torn between saying, "Sure, go ahead" and "Maybe wait until the GDCC geospatial working group has met." The problem is, I have no idea when they will meet. So I guess you should go ahead if you feel like it. @jggautier seems to be on board. 😄

@stevenmce

Hi all,
I'm involved in the DDI codebook review, as is @jggautier and others. Would be good to have the DataverseNL/CESSDA perspective in there.
Cheers,
Steve

@JingMa87
Contributor Author

JingMa87 commented Jul 31, 2020

Issue at DDI Alliance: https://ddi-alliance.atlassian.net/browse/DDICODE-70

@JingMa87
Contributor Author

JingMa87 commented Aug 3, 2020

@pdurbin Actually, it's better to wait for this fix until #7135 is merged, because the code base overlaps.

@jggautier
Contributor

jggautier commented Aug 25, 2020

Just updating this issue: In the ticket at https://ddi-alliance.atlassian.net/browse/DDICODE-70 Wendy reiterated that only one bounding box should be included per dataset and suggested that the DDI documentation about the field could be edited "to explain the point of having a single Bounding Box."

Wendy also noted that bound polygons might be the best option when depositors need to define individual areas.

Maybe the geospatial and/or metadata working groups can consider editing the tooltip with the Bounding Box explanation that makes its way into the DDI Codebook documentation, explore why depositors added multiple bounding boxes, and consider adding a repeatable bound polygon field.

If Dataverse's geospatial metadatablock is changed so that the Geographic Bounding Box field is non-repeatable, what will happen if someone tries to edit a dataset that currently has multiple Geographic Bounding Boxes? (I think the same question applies for several metadata model changes that need to be made in existing Dataverse-based repositories. I'm trying to test this scenario but it might take some time. Still learning how to change Dataverse's configurations.)

@pdurbin
Member

pdurbin commented Oct 12, 2022

@JingMa87 is no longer working on the issues he opened: #7412 (comment)

We should check with the DANS crew to see if any of them are still interested in this: @PaulBoon @janvanmansum @WittenbergM @LauraHuisintveld

It looks like @stevenmce commented above. Hi!

> Issue at DDI Alliance: https://ddi-alliance.atlassian.net/browse/DDICODE-70

There's some robust discussion in the issue above as well.

@janvanmansum
Contributor

@pdurbin : sorry for the delay. This issue is not a priority for DANS.

@mreekie mreekie removed the NIH OTA: 1.2.2 2 | 1.2.2 | Define use cases for DDI-CDI support | 5 prdOwnThis is an item synched from the produ... label Nov 8, 2022
@mreekie mreekie removed the epic.8701 label Mar 10, 2023
@pdurbin pdurbin added Type: Suggestion an idea User Role: API User Makes use of APIs labels Oct 9, 2023
@jggautier
Contributor

There's some related work being documented at #9547

@cmbz

cmbz commented Aug 20, 2024

To focus on the most important features and bugs, we are closing issues created before 2020 (version 5.0) that are not new feature requests with the label 'Type: Feature'.

If you created this issue and you feel the team should revisit this decision, please reopen the issue and leave a comment.

@cmbz cmbz closed this as completed Aug 20, 2024
7 participants