Export dataset metadata as DDI #2579

Closed
pdurbin opened this issue Sep 23, 2015 · 8 comments

@pdurbin
Member

pdurbin commented Sep 23, 2015

Exporting dataset metadata as DDI is a subtask of the main export issue (#907) which lists other non-DDI formats as well.

See also DRAFT - Dataset Metadata Export Functional Requirements Document (FRD).

@pdurbin pdurbin self-assigned this Sep 23, 2015
@pdurbin pdurbin added this to the 4.4 milestone Sep 23, 2015
pdurbin added a commit that referenced this issue Sep 30, 2015
Also add "citation" to DatasetVersionDTO since we need it to export DDI.
@mercecrosas mercecrosas modified the milestones: 4.4, In Review Nov 30, 2015
@pdurbin
Member Author

pdurbin commented Jan 5, 2016

@sekmiller I'll go through this in person, but here's a summary of where this issue stands.

The goal generally is for the DDI output to be complete, accurate, and valid against an XML schema. You might find http://www.ddialliance.org/resources/markup-examples helpful.

I would advise starting by opening DdiExportUtilTest.java in NetBeans and clicking "Test File" (under "Run"). This should show "Tests run: 2, Failures: 0" and some (incomplete) DDI output for two datasets, one that has files and one that doesn't. My approach has been to have these tests not rely on a running database. They're unit tests in the spirit of the doc by @michbarsinai at 9f91b93 and #2746.
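To illustrate the "no running database" idea, here is a minimal sketch of an export function that takes a plain in-memory object and returns DDI-like XML. The class `DdiExportSketch` and the `DatasetSketch` holder are hypothetical stand-ins, not the real `DdiExportUtil` or `DatasetVersionDTO`, and the output is far too small to be valid against the DDI schema.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class DdiExportSketch {

    /** Hypothetical in-memory stand-in for a dataset DTO (assumption, not the real class). */
    static final class DatasetSketch {
        final String title;
        final String citation;

        DatasetSketch(String title, String citation) {
            this.title = title;
            this.citation = citation;
        }
    }

    /** Builds a tiny codeBook/stdyDscr/citation fragment from in-memory data only. */
    static String toDdi(DatasetSketch ds) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartElement("codeBook");
        xml.writeStartElement("stdyDscr");
        xml.writeStartElement("citation");
        xml.writeStartElement("titlStmt");
        xml.writeStartElement("titl");
        xml.writeCharacters(ds.title);
        xml.writeEndElement(); // titl
        xml.writeEndElement(); // titlStmt
        xml.writeStartElement("biblCit");
        xml.writeCharacters(ds.citation);
        xml.writeEndElement(); // biblCit
        xml.writeEndElement(); // citation
        xml.writeEndElement(); // stdyDscr
        xml.writeEndElement(); // codeBook
        xml.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        DatasetSketch ds = new DatasetSketch("Example Dataset", "Doe, J. (2016). Example Dataset.");
        System.out.println(toDdi(ds));
    }
}
```

Because `toDdi` only depends on its argument, a unit test can construct the input by hand and assert on the returned string without any persistence layer.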

Next I would play around with the integration tests of the API I've written using REST Assured. Open DatasetsIT.java and do "Test File". As before, you should see "Tests run: 2, Failures: 0". One of the two tests is called "testGetDdi" and it's testing that the API endpoint to export datasets as DDI is disabled. We'll leave it disabled until we're ready to ship DDI support, but I would recommend enabling it with curl -X PUT -d true http://localhost:8080/api/admin/settings/:DdiExportEnabled so you can play with it and make that "testGetDdi" integration test real. I can help you get these tests running and will try to cover some of this in my talk on REST Assured next week. :)
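For anyone who would rather flip the setting from Java than from curl, the same PUT can be sketched with the JDK's built-in HTTP client (Java 11 or later). The class and method names here are made up for illustration; only the URL and the body `true` come from the curl command above. Note that `curl -d` also sends a form Content-Type header, which this sketch does not, and whether the endpoint cares has not been checked here.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class EnableDdiExportSketch {

    /** Builds the PUT request that mirrors the curl command; does not send it. */
    static HttpRequest buildEnableRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/admin/settings/:DdiExportEnabled"))
                .PUT(HttpRequest.BodyPublishers.ofString("true"))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = buildEnableRequest("http://localhost:8080");
        System.out.println(request.method() + " " + request.uri());

        // Only contact the server when explicitly asked, so the sketch is safe to run anywhere.
        if (args.length > 0 && args[0].equals("--send")) {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }
}
```

Run it with `--send` against a local installation to actually change the setting; without the flag it only prints the request it would make.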

The next integration test to try is in BatchIT.java. If you haven't already, run the curl command above to enable DDI support. Open the file and do "Run Test". The "ensureDdiExportIsSuperuserOnlyForNow" test will fail if DDI support is enabled but that's OK. The important thing to focus on is the DDI being exported by the "roundTripDdi" test. In this test, a bona fide DDI file is imported into Dataverse, persisted to the database, and then exported as DDI, hence the round trip. By the time the DDI export feature is complete, the input file and the output DDI should match.
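One way to express "the input file and the output DDI should match" is a structural comparison that ignores indentation. The following is a rough sketch using only the JDK's DOM API; `RoundTripCompareSketch` and `sameXml` are hypothetical names, and this is not how "roundTripDdi" currently checks its output.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class RoundTripCompareSketch {

    /** Parses XML and drops whitespace-only text nodes so indentation does not matter. */
    static Element parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setIgnoringComments(true);
        factory.setCoalescing(true); // treat CDATA sections as plain text
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        doc.normalize(); // merge adjacent text nodes
        stripBlankText(doc.getDocumentElement());
        return doc.getDocumentElement();
    }

    /** Recursively removes text nodes that contain only whitespace. */
    static void stripBlankText(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripBlankText(child);
            }
            child = next;
        }
    }

    /** True when both documents have the same elements, attributes, and text content. */
    static boolean sameXml(String a, String b) throws Exception {
        return parse(a).isEqualNode(parse(b));
    }

    public static void main(String[] args) throws Exception {
        String input = "<codeBook><titl>A</titl></codeBook>";
        String output = "<codeBook>\n  <titl>A</titl>\n</codeBook>";
        System.out.println(sameXml(input, output));
    }
}
```

The comparison is still strict about element order and about whitespace inside text values, so a real round-trip check may need to be more forgiving than this.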

I hope this helps you get started! You'll find TODOs scattered in the code for stuff that isn't done. And I'm sure there's more stuff in my head that I will try to pass along.

Also, here are some questions I have about this issue that are top of mind:

  • DDI versions: Currently, the code exports DDI version 2.0 because it's based on code from DVN 3.x. The FRD says that DDI version 2.5 should be supported. Does this mean that only DDI 2.5 should be supported? Please note that while DDI 2.0 is currently being validated against an XML schema, I've had trouble getting this to work with DDI 2.5.
  • For DDI export via API, should the same access rules apply as for JSON export? Right now you don't need an API token to download JSON of a published dataset.
  • Is the FRD complete? (I suspect that it isn't.) Please follow up with @scolapasta et al. about requirements.

@pdurbin pdurbin assigned sekmiller and unassigned pdurbin Jan 5, 2016
@posixeleni
Contributor

Regarding DDI 2.5 vs 2.0:

  1. There is documentation on how to update to the newer version of the codebook: http://www.ddialliance.org/Specification/DDI-Codebook/2.5/Updating_existing_DDI_Codebook_Instances_to_2-5.pdf
  2. With regard to supporting 2.0 or not: the DDI-Codebook development line is backward compatible, meaning that instances compliant with DDI versions 1 – 2.1 will also be compliant with version 2.5. So although our export is not 2.0 it can still be supported with 2.5.

Also, when we talk about JSON export, do we mean that the native Dataverse metadata will be exported as JSON, or will we have JSON formats for DDI and Dublin Core?


Eleni Castro
Research Coordinator, Data Curation and Outreach
IQSS, Harvard University
617-496-0703
http://www.iq.harvard.edu/people/eleni-castro
http://orcid.org/0000-0001-9767-8536

Got Data? Check out the Dataverse Project. http://dataverse.org/

@pdurbin
Member Author

pdurbin commented Jan 5, 2016

@posixeleni when I asked "For DDI export via API, should the same access rules apply as for JSON export?" I was referring to the existing native JSON export we shipped in #422 as part of Dataverse 4.0. If there are JSON representations of DDI or Dublin Core I'm not aware of them. #2608 is the issue to track for having a button in the GUI to export datasets as native Dataverse JSON.

@scolapasta scolapasta modified the milestones: Not Assigned to a Release, 4.4 Jan 28, 2016
@pdurbin
Member Author

pdurbin commented Jun 9, 2016

Please note that while DDI 2.0 is currently being validated against an XML schema, I've had trouble getting this to work with DDI 2.5.

@sekmiller ran into similar trouble. This morning we did some testing with msv.jar as documented at http://guides.dataverse.org/en/4.3.1/developers/tools.html#msv . In short, DDI 2.0 is defined in a single file but DDI 2.5 seems to be spread across multiple files.

In the code base we have the DDI 2.0 schema at https://github.com/IQSS/dataverse/blob/v4.3.1/src/test/java/edu/harvard/iq/dataverse/export/ddi/Version2-0.xsd and the tests that use it are at https://github.com/IQSS/dataverse/blob/v4.3.1/src/test/java/edu/harvard/iq/dataverse/util/xml/XmlValidatorTest.java
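For reference, validating a document against an XSD with the JDK's `javax.xml.validation` API looks roughly like the sketch below. The schema here is a toy inline one, not `Version2-0.xsd`, and `XsdValidationSketch` is a made-up class, not the project's `XmlValidator`.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class XsdValidationSketch {

    // Toy schema for illustration only; it is NOT the DDI 2.0 or 2.5 schema.
    static final String TOY_XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
          + "<xs:element name=\"codeBook\">"
          + "<xs:complexType><xs:sequence>"
          + "<xs:element name=\"titl\" type=\"xs:string\"/>"
          + "</xs:sequence></xs:complexType>"
          + "</xs:element></xs:schema>";

    /** Returns true if xml validates against xsd; a broken schema still throws SAXException. */
    static boolean isValid(String xsd, String xml) throws SAXException, IOException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The document did not conform to the schema.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid(TOY_XSD, "<codeBook><titl>A</titl></codeBook>"));
        System.out.println(isValid(TOY_XSD, "<codeBook><other/></codeBook>"));
    }
}
```

`SchemaFactory` also has a `newSchema(Source[])` overload that accepts several schema files at once. Whether that is enough to get past the multi-file DDI 2.5 trouble described above is not something this sketch establishes.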

mheppler added a commit that referenced this issue Jun 14, 2016
@mheppler
Contributor

Cleaned up styling and layout of the Export Metadata button in the Metadata tab of the dataset page.

@djbrooke djbrooke added ready and removed ready labels Jul 26, 2016
@djbrooke djbrooke assigned kcondon and unassigned sekmiller Aug 15, 2016
@djbrooke djbrooke assigned djbrooke and unassigned kcondon Aug 17, 2016
@djbrooke
Contributor

@kcondon gave me some good info about how to test this. Going to give it a shot.

@djbrooke
Contributor

Going to close this. Will open a related issue for the logic of how we handle "fundag," but that's an existing issue and unrelated to Export.

@pdurbin
Member Author

pdurbin commented Nov 1, 2016

Will open a related issue for the logic of how we handle "fundag,"

Here's the existing issue: #3323
