Handle dc:creator in resource map properly #116

Closed
gothub opened this issue Jul 8, 2020 · 2 comments
@gothub
Contributor

gothub commented Jul 8, 2020

When a resource map is updated via uploadDataPackage, the RDF triple containing dc:creator is not handled properly.
Here is an example using PISCO resource maps: resourceMap_marine_ltm.9.2 (created by the DataONE Java client library) is the existing resource map, and resourceMap_marine_ltm.9.3 (created by R dataone) is the improperly serialized update.
resourceMap_marine_ltm.9.2:

  <rdf:Description rdf:about="https://cn.dataone.org/cn/v1/resolve/resourceMap_marine_ltm.9.2">
    <dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2018-03-01T12:39:18.598-08:00</dcterms:modified>
    <ore:describes rdf:resource="https://cn.dataone.org/cn/v1/resolve/resourceMap_marine_ltm.9.2#aggregation"/>
    <rdf:type rdf:resource="http://www.openarchives.org/ore/terms/ResourceMap"/>
    <dc:creator rdf:nodeID="A0"/>
    <dcterms:identifier rdf:datatype="http://www.w3.org/2001/XMLSchema#string">resourceMap_marine_ltm.9.2</dcterms:identifier>
  </rdf:Description>

resourceMap_marine_ltm.9.3 (just the dc:creator triple):

  <rdf:Description rdf:about="resourceMap_marine_ltm.9.2">
    <dc:creator rdf:nodeID="r1593203265r10816r1"/>
  </rdf:Description>

This latter triple causes DataONE indexing (Java Jena) to throw an exception because the subject must be a URI, but the DataONE resolve URL has been improperly stripped out. The triple should instead be:

  <rdf:Description rdf:about="https://cn.dataone.org/cn/v1/resolve/resourceMap_marine_ltm.9.3">
    <dc:creator rdf:nodeID="r1593203265r10816r1"/>
  </rdf:Description>
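
As an illustration of the problem described above, here is a minimal Python sketch that flags rdf:about subjects that are not absolute URIs. It is not part of the R or Java client; the function name relative_subjects and the wrapping rdf:RDF sample are assumptions made for this example, and it assumes the document declares no xml:base against which a relative subject could be resolved.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def relative_subjects(rdf_xml):
    """Return the rdf:about values that lack a URI scheme (hypothetical helper)."""
    root = ET.fromstring(rdf_xml)
    bad = []
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        about = desc.get(f"{{{RDF_NS}}}about")
        # Blank nodes use rdf:nodeID and have no rdf:about, so they are skipped.
        if about is not None and not urlparse(about).scheme:
            bad.append(about)
    return bad


# Sample assembled from the two snippets in this issue (wrapper element assumed).
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="resourceMap_marine_ltm.9.2">
    <dc:creator rdf:nodeID="r1593203265r10816r1"/>
  </rdf:Description>
  <rdf:Description rdf:about="https://cn.dataone.org/cn/v1/resolve/resourceMap_marine_ltm.9.3">
    <dc:creator rdf:nodeID="r1593203265r10816r1"/>
  </rdf:Description>
</rdf:RDF>"""

print(relative_subjects(SAMPLE))  # ['resourceMap_marine_ltm.9.2']
```

Only the first subject is reported, since it has no scheme and therefore cannot be treated as an absolute URI.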

First of all, the Java client library is using 'dc:creator', which should be 'dcterms:creator'.
The best solution is to remove the triple with 'dc:creator', as the R client already puts dcterms:creator in.

When the R client creates the triples for dcterms:creator, it replaces the blank node that the original dc:creator pointed to with its own. Here are the original and the new:
resourceMap_marine_ltm.9.2 (from Java client):

  <rdf:Description rdf:nodeID="A0">
    <foaf:name rdf:datatype="http://www.w3.org/2001/XMLSchema#string">DataONE Java Client Library</foaf:name>
    <rdf:type rdf:resource="http://purl.org/dc/terms/Agent"/>
  </rdf:Description>

resourceMap_marine_ltm.9.3 (from R dataone):

  <rdf:Description rdf:about="https://cn.dataone.org/cn/v2/resolve/resourceMap_marine_ltm.9.3">
    <dcterms:creator rdf:nodeID="_287a3ffd-f1db-46e0-840e-7625eb96918b"/>
  </rdf:Description>
  <rdf:Description rdf:nodeID="_287a3ffd-f1db-46e0-840e-7625eb96918b">
    <rdf:type rdf:resource="http://purl.org/dc/terms/Agent"/>
  </rdf:Description>
  <rdf:Description rdf:nodeID="_287a3ffd-f1db-46e0-840e-7625eb96918b">
    <foaf:name rdf:datatype="http://www.w3.org/2001/XMLSchema#string">DataONE R Client</foaf:name>
  </rdf:Description>

... so this doesn't need to change.

All that needs to happen is for the R dataone client (datapack) to drop the dc:creator triple.
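
The actual fix belongs in the R datapack package; the following Python sketch only illustrates the idea of dropping the dc:creator triple while leaving dcterms:creator untouched. The function name drop_dc11_creator and the wrapping rdf:RDF sample are assumptions for this example. It does not clean up a blank node that the removed triple pointed to.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC11_NS = "http://purl.org/dc/elements/1.1/"
DCTERMS_NS = "http://purl.org/dc/terms/"


def drop_dc11_creator(rdf_xml):
    """Remove dc:creator (elements/1.1) properties from RDF/XML (hypothetical helper)."""
    root = ET.fromstring(rdf_xml)
    for desc in list(root.findall(f"{{{RDF_NS}}}Description")):
        removed = desc.findall(f"{{{DC11_NS}}}creator")
        for creator in removed:
            desc.remove(creator)
        # A description left with no properties asserts no triples, so drop it too.
        if removed and len(desc) == 0:
            root.remove(desc)
    return ET.tostring(root, encoding="unicode")


# Sample assembled from snippets in this issue (wrapper element assumed).
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="resourceMap_marine_ltm.9.2">
    <dc:creator rdf:nodeID="r1593203265r10816r1"/>
  </rdf:Description>
  <rdf:Description rdf:about="https://cn.dataone.org/cn/v2/resolve/resourceMap_marine_ltm.9.3">
    <dcterms:creator rdf:nodeID="_287a3ffd-f1db-46e0-840e-7625eb96918b"/>
  </rdf:Description>
</rdf:RDF>"""

print(drop_dc11_creator(SAMPLE))
```

ElementTree rewrites namespace prefixes on output (for example ns0, ns1) unless they are registered with ET.register_namespace; the triples themselves are unaffected.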

@gothub gothub self-assigned this Jul 8, 2020
@gothub gothub added the bug label Jul 8, 2020
@gothub gothub added this to the 1.3.3 milestone Jul 8, 2020
@gothub
Contributor Author

gothub commented Aug 26, 2020

Fixed in commit eed92cc

@gothub gothub closed this as completed Aug 26, 2020
@gothub gothub modified the milestones: 1.3.3, 1.4.0 Oct 22, 2020
@mbjones
Member

mbjones commented Jul 2, 2021

For posterity, we should be using the dcterms vocabulary defined by the http://purl.org/dc/terms/ namespace, and not the historical element set namespace (aka dc11, defined at http://purl.org/dc/elements/1.1/) at all. This is because 1) the dcterms terms are the more modern definition and include all of the elements and more, and 2) where there are identical concepts in terms and elements, the terms concept is defined as a subproperty of the element concept. So, for example, dcterms:creator rdfs:subPropertyOf dc11:creator. Inferencing agents can therefore use terms concepts anywhere a dc11:creator is expected, and a query for dc11:creator will resolve both. The opposite is not true: a query for dcterms:creator will not match triples that use dc11:creator. StackOverflow has a great summary of these two namespaces: https://stackoverflow.com/a/47523514/4200841

That said, our parsers should be robust enough not to balk when additional properties from any other namespace are encountered. So there is likely an indexer bug here as well, in that the indexer is overly sensitive to the presence of extra information.
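
To make the one-directional subproperty relationship between dcterms:creator and dc11:creator concrete, here is a toy Python sketch of subproperty-aware matching. It is not how Jena or the DataONE indexer works; the subject and agent labels and the helpers matches and query are invented for illustration.

```python
DCTERMS_CREATOR = "http://purl.org/dc/terms/creator"
DC11_CREATOR = "http://purl.org/dc/elements/1.1/creator"

# rdfs:subPropertyOf, child -> parent.
SUBPROPERTY_OF = {DCTERMS_CREATOR: DC11_CREATOR}


def matches(queried, asserted):
    """True if a triple using `asserted` satisfies a query for `queried`."""
    prop = asserted
    while prop is not None:
        if prop == queried:
            return True
        # Walk up to the superproperty, if any.
        prop = SUBPROPERTY_OF.get(prop)
    return False


# Hypothetical triples: (subject, predicate, object).
TRIPLES = [
    ("resmap-A", DCTERMS_CREATOR, "agent-1"),
    ("resmap-B", DC11_CREATOR, "agent-2"),
]


def query(prop):
    """Return the subjects whose predicate is `prop` or a subproperty of it."""
    return [s for s, p, _ in TRIPLES if matches(prop, p)]


print(query(DC11_CREATOR))     # ['resmap-A', 'resmap-B']
print(query(DCTERMS_CREATOR))  # ['resmap-A']
```

A query for the dc11 property finds both triples, while a query for the dcterms property finds only the triple that uses dcterms:creator.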
