Incorrect Handling of ROR Identifiers/Unique Identifiers in Dataset Metadata #11149
Comments
FWIW:
If https://ror.org/ is removed from the pattern, which is what's needed to recognize the non-URL form, we should be aware of the impact on fields such as author affiliation or funding agency (places where we don't have a separate id type field), in code like dataverse/src/main/java/edu/harvard/iq/dataverse/pidproviders/doi/XmlMetadataTemplate.java, lines 1531 to 1533 in 4373753.
Not sure what the best approach is, especially given that we're trying to get ext vocab scripts in to avoid typos/variations in identifiers. Just removing the https://ror.org/ from the template instead might be a quick compromise: it would fix the URL form and leave any non-URL entry as a plain string, which would be OK if/when we update to use the script for ROR entry for an author.
In this issue's PR, starting at #11118 (comment), there's more conversation about how this should work.
I'm marking the following pull request as fixing this issue. That PR adjusted the code so that there are now two regexes: one for just the ROR identifier and one for the full ROR URL. The latter is used when exporting the "Datacite" format.
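For readers following along, the two-regex idea can be sketched roughly as below. This is only an illustration in Python, not the code from the PR: the names `ROR_ID_PATTERN`, `ROR_URL_PATTERN`, `is_ror_id`, and `is_ror_url` are hypothetical, and the character class is an assumption based on the publicly documented ROR ID shape (a leading `0`, six base32 characters, two check digits).

```python
import re

# Assumed patterns for illustration; the regexes in the actual PR may differ.
# Bare identifier, e.g. 03vek6s52
ROR_ID_PATTERN = re.compile(r"0[a-hj-km-np-tv-z0-9]{6}[0-9]{2}")
# Full URL form, e.g. https://ror.org/03vek6s52
ROR_URL_PATTERN = re.compile(r"https://ror\.org/0[a-hj-km-np-tv-z0-9]{6}[0-9]{2}")


def is_ror_id(value: str) -> bool:
    """Return True if the whole string is a bare ROR identifier."""
    return ROR_ID_PATTERN.fullmatch(value) is not None


def is_ror_url(value: str) -> bool:
    """Return True if the whole string is a full ROR URL."""
    return ROR_URL_PATTERN.fullmatch(value) is not None
```

Keeping the two checks separate means a caller can decide per context which form it accepts, for example requiring the full URL form on export while accepting either form on entry.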
Description
While testing pull request #11118, an issue was identified with the handling of ROR (Research Organization Registry) identifiers in Dataverse. Specifically, the system generates incorrect links when saving and displaying ROR identifiers in the dataset metadata.
This raises the broader question of how all identifiers should be handled. Specifically, we need to determine whether it is sufficient to ask for just the unique identifier or whether the entire URL should be requested for many of these identifier types. Factors to consider include the consistency and accuracy of data, ease of implementation, system compatibility, and user experience. Using only the unique identifier might streamline data entry and reduce redundancy, but it could introduce challenges in cases where context or full URL information is required for processing. On the other hand, requiring the entire URL could ensure completeness and facilitate integration with systems that rely on full URLs, but may add complexity and potential for errors during data entry. Establishing a clear guideline will help maintain uniformity and efficiency across all identifier types.
Steps to Reproduce
Follow the steps outlined in PR #11118.
Create a new dataset and proceed to the "Author" section:
Under Identifier Type, select ROR.
Enter a valid ROR URL, e.g., https://ror.org/03vek6s52.
Save the dataset.
Navigate to the Metadata tab and click on the displayed ROR URL.
Observed Behavior
The ROR URL redirects to an invalid link:
https://ror.org/https://ror.org/03vek6s52.
This results in a 404 Page Not Found error due to duplication of the domain (ror.org).
Test with only the ROR Identifier (e.g., 03vek6s52):
Enter just the identifier (without the full URL).
Save the dataset and navigate to the Metadata tab.
The ROR Identifier is displayed as plain text and is not hyperlinked.
Expected Behavior
When a valid ROR URL is entered, the metadata tab should display and link to the correct ROR page, e.g., https://ror.org/03vek6s52.
When only the ROR Identifier is provided, the system should construct a valid URL (https://ror.org/{identifier}) and display it as a clickable hyperlink in the metadata tab.
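The expected behavior above amounts to normalizing both input forms to a single URL before rendering the link. A minimal sketch in Python, assuming a hypothetical helper `to_ror_url` and the same assumed ROR ID pattern (this is not Dataverse's actual implementation):

```python
import re
from typing import Optional

ROR_BASE_URL = "https://ror.org/"
# Assumed shape of a bare ROR identifier, e.g. 03vek6s52
ROR_ID_PATTERN = re.compile(r"0[a-hj-km-np-tv-z0-9]{6}[0-9]{2}")


def to_ror_url(value: str) -> Optional[str]:
    """Return the canonical ROR URL for a bare ID or a full URL, else None."""
    candidate = value.strip()
    # Strip the base URL once if the user entered the full URL form,
    # so the prefix is never added twice.
    if candidate.startswith(ROR_BASE_URL):
        candidate = candidate[len(ROR_BASE_URL):]
    if ROR_ID_PATTERN.fullmatch(candidate):
        return ROR_BASE_URL + candidate
    # Not recognizable as a ROR identifier; caller can show it as plain text.
    return None
```

With this shape, `https://ror.org/03vek6s52` and `03vek6s52` both yield the same link, and an unrecognized value can still be displayed as a plain string rather than a broken hyperlink.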