As a data repository, I need to harvest additional metadata in OAI_DC records #4176
There appear to be several things going on:
The "dc:description" field: It looks like in 3.0 we used to populate this field with BOTH the content of the "description" field from the dataset (study) metadata AND the citation, while in 4.* we only export the description. Interestingly enough, you are the first person to notice this (this may be because most of our users harvest DDI?). I don't know whether it was dropped on purpose, or for some specific reason. We'll have this reviewed by those on our team who normally handle all things metadata and exports (@jggautier ?). But technically, putting the citation text back there would be a trivial fix.
The "dc:relation" field: Similarly, it looks like back in 3.0 we were packing several different metadata elements into this field as well. In 4.* we only export the contents of the "relatedDatasets" metadata field as "dc:relation". (In your case, in the example above, which 3.0 metadata field did that text come from?) Again, we'll need to review this; it may just be a matter of using this field to export some extra 4.* metadata fields.
The "dc:rights" field: OK, this is definitely a bug on our part; we simply dropped it from our OAI_DC exports, seemingly by mistake. Note that we still export it as part of "DCTERMS", the "extended DC". In case this is already confusing: in 4.0 we export the metadata in TWO different DC formats: the original 15-field Dublin Core used in OAI harvesting ("OAI_DC"), and "DCTERMS", the extended DC with the 15 original + 40 (?) extra fields total. DCTERMS is the format you get when you go to the "Metadata" tab on your dataset page, click "Export Metadata", then "Dublin Core". (Please try this with the dataset above; you should get a record with a "dcterms:rights" field in it.)
The "dc:coverage" field: This is also one of the 15 base DC fields. Yes, in 3.0 we used this field for both time and geographic coverage. In 4.* we appear to have switched to exporting these values as "temporal" and "spatial" fields. However, both of those are extended, DCTERMS-only fields, which is why they are missing from the harvestable OAI_DC records. |
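The OAI_DC versus DCTERMS distinction can be illustrated as a projection of a DCTERMS-style record onto the 15 simple Dublin Core elements. This is a hypothetical simplification, not Dataverse's exporter: the record is assumed to be a list of (term, value) pairs, and `to_oai_dc` and `fold_refinements` are illustrative names. It shows why "temporal" and "spatial" drop out of OAI_DC, and how folding them back into dc:coverage (DCMI defines both as refinements of coverage) would restore what 3.0 exported.

```python
# The 15 elements of the simple Dublin Core element set, which is all the
# oai_dc metadata format allows.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}

# DCTERMS-only terms mapped to the base element they refine (per DCMI).
REFINES = {"temporal": "coverage", "spatial": "coverage"}


def to_oai_dc(dcterms_record, fold_refinements=False):
    """Project (term, value) pairs from a DCTERMS-style record onto simple DC.

    With fold_refinements=False, DCTERMS-only terms are silently dropped.
    With True, temporal and spatial values are re-emitted as coverage.
    Input order is preserved.
    """
    result = []
    for term, value in dcterms_record:
        if term in DC_ELEMENTS:
            result.append((term, value))
        elif fold_refinements and term in REFINES:
            result.append((REFINES[term], value))
    return result
```

For example, a record with ("temporal", "1990-2000") keeps that value only when `fold_refinements=True`, where it comes back as ("coverage", "1990-2000").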
I'm looking at other repositories, reading up on best practices and have emailed a few people about them, but right now I think dc:rights, dc:coverage and dc:relation should be re-added in the ways described below.
Description for dataset citation
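Putting the citation back could look roughly like the following sketch, which emits the dataset description and the citation as two separate dc:description elements. It assumes both are already available as plain strings; `description_elements` and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize as dc:description


def description_elements(description, citation):
    """Emit one dc:description for the abstract and a second for the citation."""
    elements = []
    for text in (description, citation):
        if text and text.strip():  # skip missing or blank values
            el = ET.Element(f"{{{DC_NS}}}description")
            el.text = text.strip()
            elements.append(el)
    return elements


for el in description_elements("An abstract.", 'Doe, Jane, 2017, "Example Dataset"'):
    print(ET.tostring(el, encoding="unicode"))
```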
Rights
or
Text in the Terms of Access, Availability Status and Contact for Access fields should also each go in its own dc:rights element, starting with the name of the field.
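A minimal sketch of that rule, assuming the three field values arrive in a dictionary keyed by their labels. The `rights_elements` name and the ": " separator between label and text are assumptions; the proposal only says each element should start with the field name.

```python
# Field labels from the proposal; the dict-based input is an assumption.
RIGHTS_FIELDS = ("Terms of Access", "Availability Status", "Contact for Access")


def rights_elements(fields):
    """Return one dc:rights string per non-empty field, prefixed with its label."""
    result = []
    for label in RIGHTS_FIELDS:
        text = (fields.get(label) or "").strip()
        if text:  # empty fields produce no element
            result.append(f"{label}: {text}")
    return result
```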
These dc:rights elements would be in addition to the Dataverse fields mapped to dc:rights in this Dataverse 3 crosswalk.
Coverage for geospatial
Each geographic location should be in its own element.
I removed the prepended "Country/Nation" text: all fields in a compound field describe one geographic location and should be concatenated in one dc:coverage element, so if a city, state and/or "Other" value is included with the country/nation, starting the element with "Country/Nation" won't make sense.
From what I've seen, the fields should be ordered from most specific to most general, left to right: City, State, Country/Nation, Other. (The Dublin Core metadata you can export from a dataset page, which uses dcterms, puts the text of each field in a compound field in its own element.) I think this should be changed so that all fields in the compound field are concatenated in one element.
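The concatenation rule can be sketched as one dc:coverage string per geographic location, with the non-empty parts ordered City, State, Country/Nation, Other. The dictionary keys, the ", " separator and the name `coverage_elements` are assumptions for illustration, not Dataverse field names.

```python
# Most specific to most general, as proposed (keys are hypothetical).
GEO_ORDER = ("city", "state", "country", "other")


def coverage_elements(locations):
    """Return one dc:coverage string per geographic location.

    Each location is one entry of the compound geographic-coverage field,
    given here as a dict; empty parts are skipped and the rest joined by ", ".
    """
    result = []
    for location in locations:
        parts = [(location.get(key) or "").strip() for key in GEO_ORDER]
        joined = ", ".join(p for p in parts if p)
        if joined:
            result.append(joined)
    return result
```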
Coverage for dates
Relation
In the example OAI_DC record above, the dc:relation text came from this dataset's "Other References" field in Dataverse 3, a field that also exists in Dataverse 4. Related Publication is a compound field in 4.x (judging by this Dataverse 3.x metadata crosswalk, it was a single free-text field in 3.x). I think all of its subfields should go in one dc:relation element.
Text in the other three fields should go in their own dc:relation elements.
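A sketch of the proposed dc:relation mapping, assuming each Related Publication entry arrives as a dictionary with hypothetical keys `citation`, `id_type`, `id_number` and `url`, and that the values of the other fields arrive as plain strings. The joining format (space-separated, identifier written as `type:number`) is an illustrative choice.

```python
def _clean(value):
    """Treat None and whitespace-only strings as empty."""
    return (value or "").strip()


def publication_relation(pub):
    """Concatenate the parts of one Related Publication entry into one string."""
    parts = [_clean(pub.get("citation"))]
    id_type, id_number = _clean(pub.get("id_type")), _clean(pub.get("id_number"))
    if id_number:
        # Prefix the identifier with its type when one is given.
        parts.append(f"{id_type}:{id_number}" if id_type else id_number)
    parts.append(_clean(pub.get("url")))
    return " ".join(p for p in parts if p)


def relation_elements(publications, other_values=()):
    """One dc:relation per publication entry, then one per other field value."""
    values = [publication_relation(pub) for pub in publications]
    values.extend(_clean(v) for v in other_values)
    return [v for v in values if v]
```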
|
@landreev @jggautier good details on this - thank you. Can one of you leave a comment with a short list of the specific changes we should make so that we can get this estimated and into a sprint? |
Here's as short and as (hopefully not unnecessarily) specific as I can get it. In the simplified Dublin Core export:
|
I'm adding dc:rights in the following pull request: Please feel free to leave a review to tell me if you think I'm adding it correctly! 😅 |
Thanks @pdurbin. I'm spinning up an AWS instance of your branch so I can learn more about how dc:rights "is mapped (when available) to terms of use, restrictions, and license". |
2024/09/09: Keeping open for now. |
The pull request at #10737 won't add dc:rights. @pdurbin and I figured it would be better to tackle dc:rights in other efforts. And it's related to #8129 and #5920. In a comment in this GitHub issue back in 2017 I wrote about other information that we might consider adding to the OAI_DC records. Our next steps might be to:
|
We migrated from 3.0 to 4.7, and when we try to retrieve OAI data using
oai?verb=GetRecord&metadataPrefix=oai_dc&identifier=[identifierID]
we are missing some elements. As shown in the screenshots, the following elements, which existed in 3.0, are not present in the new version:
· dc:relation
· dc:description (citation)
· dc:coverage (time and geographic)
· dc:rights
We wonder whether there is something we should configure to get the same response, so that we don't affect the external systems that harvest our metadata as it is currently exposed by our Dataverse 3.0.
[Screenshots: v4.7 and v3.0 responses]
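To check a server for the elements listed above, one option is to build the same GetRecord request and scan the returned oai_dc record for dc:* elements. This is a hedged sketch using only the Python standard library; `get_record_url`, `missing_dc_elements` and the sample values are hypothetical. It only checks presence, so it cannot tell whether dc:description also carries the citation.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
EXPECTED = ("relation", "description", "coverage", "rights")


def get_record_url(base_url, identifier):
    """Build an OAI-PMH GetRecord request for the oai_dc format."""
    query = urlencode({"verb": "GetRecord", "metadataPrefix": "oai_dc",
                       "identifier": identifier})
    return f"{base_url}?{query}"


def missing_dc_elements(response_xml):
    """Return which of the expected dc:* elements are absent from a response."""
    root = ET.fromstring(response_xml)
    prefix = "{" + DC_NS + "}"
    present = {el.tag[len(prefix):] for el in root.iter()
               if isinstance(el.tag, str) and el.tag.startswith(prefix)}
    return [name for name in EXPECTED if name not in present]


SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
 <GetRecord><record><metadata>
  <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:title>Example</dc:title>
   <dc:description>An example study.</dc:description>
  </oai_dc:dc>
 </metadata></record></GetRecord>
</OAI-PMH>"""

print(missing_dc_elements(SAMPLE))
```

Against a live server, the URL from `get_record_url` can be fetched with `urllib.request.urlopen` and the response bytes passed straight to `missing_dc_elements`, since `ET.fromstring` accepts bytes as well as str.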