Skip to content

Commit

Permalink
Merge branch 'develop' into 10853-6.4-release-notes #10853
Browse files Browse the repository at this point in the history
  • Loading branch information
pdurbin committed Sep 23, 2024
2 parents b9f1c85 + edae760 commit 5e175c6
Show file tree
Hide file tree
Showing 44 changed files with 2,527 additions and 769 deletions.
2 changes: 2 additions & 0 deletions conf/solr/schema.xml
Original file line number Diff line number Diff line change
Expand Up @@ -352,6 +352,7 @@
<field name="productionPlace" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="publication" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="publicationCitation" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="publicationRelationType" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="publicationIDNumber" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="publicationIDType" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="publicationURL" type="text_en" multiValued="true" stored="true" indexed="true"/>
Expand Down Expand Up @@ -593,6 +594,7 @@
<copyField source="productionPlace" dest="_text_" maxChars="3000"/>
<copyField source="publication" dest="_text_" maxChars="3000"/>
<copyField source="publicationCitation" dest="_text_" maxChars="3000"/>
<copyField source="publicationRelationType" dest="_text_" maxChars="3000"/>
<copyField source="publicationIDNumber" dest="_text_" maxChars="3000"/>
<copyField source="publicationIDType" dest="_text_" maxChars="3000"/>
<copyField source="publicationURL" dest="_text_" maxChars="3000"/>
Expand Down
41 changes: 41 additions & 0 deletions doc/release-notes/10632-DataCiteXMLandRelationType.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
### Enhanced DataCite Metadata, Relation Type

A new field has been added to the citation metadatablock to allow entry of the "Relation Type" between a "Related Publication" and a dataset. The Relation Type is currently limited to the most common 6 values recommended by DataCite: isCitedBy, Cites, IsSupplementTo, IsSupplementedBy, IsReferencedBy, and References. For existing datasets where no "Relation Type" has been specified, "IsSupplementTo" is assumed.

Dataverse now supports the DataCite v4.5 schema. Additional metadata, including metadata about Related Publications, and files in the dataset are now being sent to DataCite and improvements to how PIDs (ORCID, ROR, DOIs, etc.), license/terms, geospatial, and other metadata is represented have been made. The enhanced metadata will automatically be sent when datasets are created and published and is available in the DataCite XML export after publication.

The additions are in rough alignment with the OpenAIRE XML export, but there are some minor differences in addition to the Relation Type addition, including an update to the DataCite 4.5 schema. For details see https://github.com/IQSS/dataverse/pull/10632 and https://github.com/IQSS/dataverse/pull/10615 and the [design document](https://docs.google.com/document/d/1JzDo9UOIy9dVvaHvtIbOI8tFU6bWdfDfuQvWWpC0tkA/edit?usp=sharing) referenced there.

Multiple backward incompatible changes and bug fixes have been made to API calls (3 of the four of which were not documented) related to updating PID target urls and metadata at the provider service:
- [Update Target URL for a Published Dataset at the PID provider](https://guides.dataverse.org/en/latest/admin/dataverses-datasets.html#update-target-url-for-a-published-dataset-at-the-pid-provider)
- [Update Target URL for all Published Datasets at the PID provider](https://guides.dataverse.org/en/latest/admin/dataverses-datasets.html#update-target-url-for-all-published-datasets-at-the-pid-provider)
- [Update Metadata for a Published Dataset at the PID provider](https://guides.dataverse.org/en/latest/admin/dataverses-datasets.html#update-metadata-for-a-published-dataset-at-the-pid-provider)
- [Update Metadata for all Published Datasets at the PID provider](https://guides.dataverse.org/en/latest/admin/dataverses-datasets.html#update-metadata-for-all-published-datasets-at-the-pid-provider)

Upgrade instructions
--------------------

The Solr schema has to be updated via the normal mechanism to add the new "relationType" field.

The citation metadatablock has to be reinstalled using the standard instructions.

With these two changes, the "Relation Type" fields will be available and creation/publication of datasets will result in the expanded XML being sent to DataCite.

To update existing datasets (and files using DataCite DOIs):

Exports can be updated by running `curl http://localhost:8080/api/admin/metadata/reExportAll`

Entries at DataCite for published datasets can be updated by a superuser using an API call (newly documented):

`curl -X POST -H 'X-Dataverse-key:<key>' http://localhost:8080/api/datasets/modifyRegistrationPIDMetadataAll`

This will loop through all published datasets (and released files with PIDs). As long as the loop completes, the call will return a 200/OK response. Any PIDs for which the update fails can be found using

`grep 'Failure for id' server.log`

Failures may occur if PIDs were never registered, or if they were never made findable. Any such cases can be fixed manually in DataCite Fabrica or using the [Reserve a PID](https://guides.dataverse.org/en/latest/api/native-api.html#reserve-a-pid) API call and the newly documented `/api/datasets/<id>/modifyRegistration` call respectively. See https://guides.dataverse.org/en/latest/admin/dataverses-datasets.html#send-dataset-metadata-to-pid-provider. Please reach out with any questions.

PIDs can also be updated by a superuser on a per-dataset basis using

`curl -X POST -H 'X-Dataverse-key:<key>' http://localhost:8080/api/datasets/<id>/modifyRegistrationMetadata`

35 changes: 32 additions & 3 deletions doc/sphinx-guides/source/admin/dataverses-datasets.rst
Original file line number Diff line number Diff line change
Expand Up @@ -195,12 +195,41 @@ Mints a new identifier for a dataset previously registered with a handle. Only a
.. _send-metadata-to-pid-provider:

Send Dataset metadata to PID provider
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Update Target URL for a Published Dataset at the PID provider
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Forces update to metadata provided to the PID provider of a published dataset. Only accessible to superusers. ::
Forces update to the target URL provided to the PID provider of a published dataset and assures the PID is findable.
Only accessible to superusers. ::

curl -H "X-Dataverse-key: $API_TOKEN" -X POST http://$SERVER/api/datasets/$dataset-id/modifyRegistration
Update Target URL for all Published Datasets at the PID provider
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Forces update to the target URL provided to the PID provider of all published datasets and assures the PID is findable.
Only accessible to superusers. ::

curl -H "X-Dataverse-key: $API_TOKEN" -X POST http://$SERVER/api/datasets/modifyRegistrationAll
Update Metadata for a Published Dataset at the PID provider
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Checks to see that the PID metadata for a published dataset (and any released files in it using file PIDs)
is up-to-date at the provider and updates the metadata if necessary.
Only accessible to superusers. ::

curl -H "X-Dataverse-key: $API_TOKEN" -X POST http://$SERVER/api/datasets/$dataset-id/modifyRegistrationMetadata
Update Metadata for all Published Datasets at the PID provider
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Checks to see that the PID metadata is up-to-date at the provider for all published datasets
(and any released files in them using file PIDs) and updates the metadata if necessary.
Only accessible to superusers. ::

curl -H "X-Dataverse-key: $API_TOKEN" -X POST http://$SERVER/api/datasets/modifyRegistrationPIDMetadataAll
The call returns 200/OK as long as the call completes. Any errors for individual datasets are reported in the log.

Check for Unreserved PIDs and Reserve Them
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Expand Down
7 changes: 7 additions & 0 deletions doc/sphinx-guides/source/api/changelog.rst
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,13 @@ This API changelog is experimental and we would love feedback on its usefulness.
:local:
:depth: 1

v6.4
----

- **/api/datasets/$dataset-id/modifyRegistration**: Changed from GET to POST
- **/api/datasets/modifyRegistrationPIDMetadataAll**: Changed from GET to POST


v6.3
----

Expand Down
41 changes: 40 additions & 1 deletion doc/sphinx-guides/source/developers/version-control.rst
Original file line number Diff line number Diff line change
Expand Up @@ -97,10 +97,12 @@ In the issue you can simply leave a comment to say you're working on it.

If you tell us your GitHub username we are happy to add you to the "read only" team at https://github.com/orgs/IQSS/teams/dataverse-readonly/members so that we can assign the issue to you while you're working on it. You can also tell us if you'd like to be added to the `Dataverse Community Contributors spreadsheet <https://docs.google.com/spreadsheets/d/1o9DD-MQ0WkrYaEFTD5rF_NtyL8aUISgURsAXSL7Budk/edit?usp=sharing>`_.

.. _create-branch-for-pr:

Create a New Branch Off the develop Branch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Always create your feature branch from the latest code in develop, pulling the latest code if necessary. As mentioned above, your branch should have a name like "3728-doc-apipolicy-fix" that starts with the issue number you are addressing (e.g. `#3728 <https://github.com/IQSS/dataverse/issues/3728>`_) and ends with a short, descriptive name. Dashes ("-") and underscores ("_") in your branch name are ok, but please try to avoid other special characters such as ampersands ("&") that have special meaning in Unix shells.
Always create your feature branch from the latest code in develop, pulling the latest code if necessary. As mentioned above, your branch should have a name like "3728-doc-apipolicy-fix" that starts with the issue number you are addressing (e.g. `#3728 <https://github.com/IQSS/dataverse/issues/3728>`_) and ends with a short, descriptive name. Dashes ("-") and underscores ("_") in your branch name are ok, but please try to avoid other special characters such as ampersands ("&") that have special meaning in Unix shells. Please do not call your branch "develop" as it can cause maintainers :ref:`trouble <develop-into-develop>`.

Commit Your Change to Your New Branch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Expand Down Expand Up @@ -299,3 +301,40 @@ GitHub documents how to make changes to a fork at https://help.github.com/articl
vim path/to/file.txt
git commit
git push OdumInstitute 4709-postgresql_96
.. _develop-into-develop:

Handing a Pull Request from a "develop" Branch
----------------------------------------------

Note: this is something only maintainers of Dataverse need to worry about, typically.

From time to time a pull request comes in from a fork of Dataverse that uses "develop" as the branch behind the PR. (We've started asking contributors not to do this. See :ref:`create-branch-for-pr`.) This is problematic because the "develop" branch is the main integration branch for the project. (See :ref:`develop-branch`.)

If the PR is perfect and can be merged as-is, no problem. Just merge it. However, if you would like to push commits to the PR, you are likely to run into trouble with multiple "develop" branches locally.

The following is a list of commands oriented toward the simple task of merging the latest from the "develop" branch into the PR but the same technique can be used to push other commits to the PR as well. In this example the PR is coming from username "coder123" on GitHub. At a high level, what we're doing is working in a safe place (/tmp) away from our normal copy of the repo. We clone the main repo from IQSS, check out coder123's version of "develop" (called "dev2" or "false develop"), merge the real "develop" into it, and push to the PR.

If there's a better way to do this, please get in touch!

.. code-block:: bash
# do all this in /tmp away from your normal code
cd /tmp
git clone [email protected]:IQSS/dataverse.git
cd dataverse
git remote add coder123 [email protected]:coder123/dataverse.git
git fetch coder123
# check out coder123's "develop" to a branch with a different name ("dev2")
git checkout coder123/develop -b dev2
# merge IQSS "develop" into coder123's "develop" ("dev2")
git merge origin/develop
# delete the IQSS "develop" branch locally (!)
git branch -d develop
# checkout "dev2" (false "develop") as "develop" for now
git checkout -b develop
# push the false "develop" to coder123's fork (to the PR)
git push coder123 develop
cd ..
# delete the tmp space (done! \o/)
rm -rf /tmp/dataverse
12 changes: 8 additions & 4 deletions doc/sphinx-guides/source/installation/config.rst
Original file line number Diff line number Diff line change
Expand Up @@ -232,6 +232,10 @@ Dataverse can be configured with one or more PID providers, each of which can mi
to manage an authority/shoulder combination, aka a "prefix" (PermaLinks also support custom separator characters as part of the prefix),
along with an optional list of individual PIDs (with different authority/shoulders) than can be managed with that account.

Dataverse automatically manages assigning PIDs and making them findable when datasets are published. There are also :ref:`API calls that
allow updating the PID target URLs and metadata of already-published datasets manually if needed <send-metadata-to-pid-provider>`, e.g. if a Dataverse instance is
moved to a new URL or when the software is updated to generate additional metadata or address schema changes at the PID service.

Testing PID Providers
+++++++++++++++++++++

Expand All @@ -246,11 +250,11 @@ configure the credentials as described below.

Alternately, you may wish to configure other providers for testing:

- EZID is available to University of California scholars and researchers. Testing can be done using the authority 10.5072 and shoulder FK2 with the "apitest" account (contact EZID for credentials) or an institutional account. Configuration in Dataverse is then analogous to using DataCite.
- EZID is available to University of California scholars and researchers. Testing can be done using the authority 10.5072 and shoulder FK2 with the "apitest" account (contact EZID for credentials) or an institutional account. Configuration in Dataverse is then analogous to using DataCite.

- The PermaLink provider, like the FAKE DOI provider, does not involve an external account.
Unlike the Fake DOI provider, the PermaLink provider creates PIDs that begin with "perma:", making it clearer that they are not DOIs,
and that do resolve to the local dataset/file page in Dataverse, making them useful for some production use cases. See :ref:`permalinks` and (for the FAKE DOI provider) the :doc:`/developers/dev-environment` section of the Developer Guide.
- The PermaLink provider, like the FAKE DOI provider, does not involve an external account.
Unlike the Fake DOI provider, the PermaLink provider creates PIDs that begin with "perma:", making it clearer that they are not DOIs,
and that do resolve to the local dataset/file page in Dataverse, making them useful for some production use cases. See :ref:`permalinks` and (for the FAKE DOI provider) the :doc:`/developers/dev-environment` section of the Developer Guide.

Provider-specific configuration is described below.

Expand Down
6 changes: 6 additions & 0 deletions scripts/api/data/dataset-create-new-all-default-fields.json
Original file line number Diff line number Diff line change
Expand Up @@ -331,6 +331,12 @@
"typeClass": "compound",
"value": [
{
"publicationRelationType" : {
"typeName" : "publicationRelationType",
"multiple" : false,
"typeClass" : "controlledVocabulary",
"value" : "IsSupplementTo"
},
"publicationCitation": {
"typeName": "publicationCitation",
"multiple": false,
Expand Down
Loading

0 comments on commit 5e175c6

Please sign in to comment.