Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3648 Change "DVN" to default value "producer" and only write the firs… #7094

Closed
wants to merge 73 commits into from
Closed

3648 Change "DVN" to default value "producer" and only write the firs… #7094

wants to merge 73 commits into from

Conversation

JingMa87
Copy link
Contributor

@JingMa87 JingMa87 commented Jul 16, 2020

…t geoBndBox element with four geo coordinates.

What this PR does / why we need it: Improves the DDI XML according to the XSD

Which issue(s) this PR closes: 3648 (doesn't close it, but it's a partial fix)

Closes nothing yet, it's an improvement

Special notes for your reviewer: It's an improvement in the DDI XML response, but more is needed in the future.

Suggestions on how to test this: Add multiple Geographic Bounding Boxes to your dataset, one with only 3 coordinates, and two with four coordinates. Only the first box with four coordinates should be in the DDI XML.

Does this PR introduce a user interface change? If mockups are available, please link/include them here: No

Is there a release notes update needed for this change?: Yes

Additional documentation:

landreev and others added 30 commits May 29, 2020 10:50
…ate the

external "custom download" service. Everything else is done by an outside
standalone program (a java program with its own pom file). (#6505)
still working on the documentation, so will need to check it in later.
added some info to the documentation explaining how the zipper does its thing. (#6505)
(fixed merge conflicts w/develop - mostly the POST handling added for the /api/access/datafiles/ API)
…tom/service/download/ZipDownloadService.java

Co-authored-by: Philip Durbin <[email protected]>
@mheppler
Copy link
Contributor

@jggautier if you believe that issue should remain open, you can edit the PR description to remove the "Closes" keyword.

I also added a comment references this PR to other older DDI XML issues I found in the backlog.

kcondon added 3 commits July 20, 2020 14:58
Add "download all" buttons (including size of dataset) to dataset page
make ec2-create-instance.sh downloadable #7099
@JingMa87
Copy link
Contributor Author

@jggautier if you believe that issue should remain open, you can edit the PR description to remove the "Closes" keyword.

I also added a comment references this PR to other older DDI XML issues I found in the backlog.

I removed the Closes keyword and will take a look at the issues you referenced.

Copy link
Member

@pdurbin pdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sending this to QA because I think there is incremental improvement in this pull request but here are a few additional thoughts:

  • This pull request might be better as two smaller ones. One would remove "DVN" from "source". The other would change the behavior of the bounding box for geospatial data. (There could be separate issues for these as well.)
  • It would be good to test the DDI before and after with a tool that does XML schema validation. I assume that there will be fewer failures in this pull request compared to develop. I haven't done this myself and I'm relying on what @JingMa87 has coded up and comments by @jggautier in terms of the geospatial changes being correct. It probably makes sense to someday put the XML schema validation into a test so we can continue to get it passing and keep it passing.
  • This pull request changes "source" from "DVN" to "producer" but it seems like "archive" is another option. I don't feel strongly about this but some DDI experts that are part of the Dataverse community might.

@@ -328,7 +328,7 @@ private static void writeDocDescElement (XMLStreamWriter xmlw, DatasetDTO datase

private static void writeVersionStatement(XMLStreamWriter xmlw, DatasetVersionDTO datasetVersionDTO) throws XMLStreamException{
xmlw.writeStartElement("verStmt");
writeAttribute(xmlw,"source","DVN");
writeAttribute(xmlw,"source","producer");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The string "DVN_3_0" appears in if (!SOURCE_DVN_3_0.equals(xmlr.getAttributeValue(null, "source"))) { in ImportDDIServiceBean. It's not an exact match so I don't think this will break dataset import from DDI but I thought I'd at least mention it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it seems like the "DVN_3_0" is a legacy value and will only still be used if an old Dataverse instance hasn't upgraded yet.

@djbrooke
Copy link
Contributor

@JingMa87 @jggautier before we move this to QA, do you have any thoughts about @pdurbin's comments?

@djbrooke djbrooke added this to the Dataverse 5 milestone Jul 21, 2020
@djbrooke djbrooke removed this from the Dataverse 5 milestone Jul 21, 2020
@pdurbin
Copy link
Member

pdurbin commented Jul 21, 2020

One last think to mention is that I brought up in tech hours today the idea of changing "source" from "DVN" to something more reasonable. No one objected and it sounds like it's a well known problem that "DVN" doesn't validate.

@jggautier
Copy link
Contributor

jggautier commented Jul 21, 2020

I agree about creating a new issue for the bounding box changes. Hopefully the <verStmt> change is easy to make. The bounding box change needs some more discussion I think. @JingMa87, I can create a new issue for the bounding box change if you'd like.

So this PR can be the PR for only the <verStmt> change, and changes related to the bounding boxes can be removed?

I would use "archive" instead of "producer," mainly because that's what's used for the "distrbtr source". But also in other places in the DDI, when source = "producer", that means the information is coming from the depositor(s) and when source = "archive", that means the information is coming from the repository. The metadata document version number is automatically generated by the repository, so maybe that's another argument for using "archive".

So right now the version statement for the metadata document looks like this:

<verStmt source="DVN">
    <version date="2020-07-21" type="RELEASED">1</version>
</verStmt>

"DVN" isn't a valid value for "source". So @JingMa87 proposed changing it to producer:

<verStmt source="producer">
    <version date="2020-07-21" type="RELEASED">1</version>
</verStmt>

Again, the distribution statement already uses "archive":

<distStmt>
    <distrbtr source="archive">QDR Main Collection</distrbtr>
    <distDate>2020-07-06</distDate>
</distStmt>

So I think the version statement should also use "archive":

<verStmt source="archive">
    <version date="2020-07-21" type="RELEASED">1</version>
</verStmt>

What's lost in this change is the "DVN". In the current DDI export, I'm not sure what's being expressed by <verStmt source="DVN">. Is it that the source of the version statement is the Dataverse software itself, as opposed to the repository? Is this a holdover from when there was just one "Dataverse Network"? I don't think this information is being used for anything and it's fine that it's lost in the change from this PR.

@JingMa87
Copy link
Contributor Author

JingMa87 commented Jul 22, 2020

JingMa87 added 3 commits July 22, 2020 06:11
…t geoBndBox element with four geo coordinates.
…ix-ddi-export

# Conflicts:
#	src/main/java/edu/harvard/iq/dataverse/export/ddi/DdiExportUtil.java
#	src/test/java/edu/harvard/iq/dataverse/export/ddi/dataset-finch1.xml
#	src/test/java/edu/harvard/iq/dataverse/export/ddi/exportfull.xml
@JingMa87
Copy link
Contributor Author

@pdurbin How come other people's commits are in this PR? Did I use a wrong command?

@jggautier
Copy link
Contributor

jggautier commented Jul 22, 2020

Thanks @JingMa87. About validating the XML with the codebook and pre-validation errors, when I test a codebook XML export of a dataset that has only the metadata required fields, the only other error I get (besides the "DVN" error) is about the use of <setAvail/> <useStmt/>. Is that what you meant by pre-validation errors?

@pdurbin
Copy link
Member

pdurbin commented Jul 22, 2020

@JingMa87 I don't know. Git is hard. 😄 I'm pretty quick to make new branches at any sign of trouble. And it looks like both @jggautier and I would prefer the "source" change to be made in a dedicated branch/pull request.

@JingMa87
Copy link
Contributor Author

JingMa87 commented Jul 24, 2020

@jggautier I actually meant that the imports that the XSD uses were faulty and prevented the validation from happening. But I fixed these in the meantime and all is well. Maybe I'll incorporate an XSD validation in the source code sometime in the future.
@pdurbin I made a separate issue for the DVN problem #7128 and will submit a PR soon. While validating the XML I came across another problem, namely the order of some of the elements. This is easy to fix and the issue can be found here #7127. I'll do it in the near future.

@JingMa87 JingMa87 closed this Jul 24, 2020
@JingMa87 JingMa87 deleted the 3648-fix-ddi-export branch July 24, 2020 20:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.