Incorrect Handling of ROR Identifiers/Unique Identifiers in Dataset Metadata #11149

Closed
ofahimIQSS opened this issue Jan 10, 2025 · 3 comments · Fixed by #11118
Labels
Type: Bug a defect
@ofahimIQSS
Contributor

ofahimIQSS commented Jan 10, 2025

Description
While testing pull request #11118, an issue was identified with the handling of ROR (Research Organization Registry) identifiers in Dataverse. Specifically, the system generates incorrect links when saving and displaying ROR identifiers in the dataset metadata.

This raises the broader question of how all identifiers should be handled. Specifically, we need to decide whether asking for just the unique identifier is sufficient, or whether the full URL should be required for many of these identifier types. Factors to consider include consistency and accuracy of the data, ease of implementation, system compatibility, and user experience. Accepting only the unique identifier might streamline data entry and reduce redundancy, but it could cause problems where context or the full URL is required for processing. Requiring the full URL would ensure completeness and ease integration with systems that rely on full URLs, but it may add complexity and more room for error during data entry. A clear guideline will help keep all identifier types uniform and efficient to work with.

Steps to Reproduce
Follow the steps outlined in PR #11118.
Create a new dataset and proceed to the "Author" section:
Under Identifier Type, select ROR.
Enter a valid ROR URL, e.g., https://ror.org/03vek6s52.
Save the dataset.
Navigate to the Metadata tab and click on the displayed ROR URL.
Observed Behavior
The displayed ROR link points to an invalid URL:
https://ror.org/https://ror.org/03vek6s52
This results in a 404 Page Not Found error because the ror.org prefix is duplicated.

Test with only the ROR Identifier (e.g., 03vek6s52):

Enter just the identifier (without the full URL).
Save the dataset and navigate to the Metadata tab.
The ROR Identifier is displayed as plain text and is not hyperlinked.
Expected Behavior
When a valid ROR URL is entered, the metadata tab should display and link to the correct ROR page, e.g., https://ror.org/03vek6s52.
When only the ROR Identifier is provided, the system should construct a valid URL (https://ror.org/{identifier}) and display it as a clickable hyperlink in the metadata tab.
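
The expected behavior above can be sketched roughly as follows. This is only an illustration, not the Dataverse implementation: the class name `RorLinkSketch` and the method `toUrl` are hypothetical, and the sketch only checks for the URL prefix without validating the identifier itself.

```java
final class RorLinkSketch {
    // Assumed canonical prefix, taken from the example URL in this issue.
    static final String ROR_PREFIX = "https://ror.org/";

    /** Returns a ROR URL for either a full URL or a bare identifier (hypothetical helper). */
    static String toUrl(String input) {
        String value = input.trim();
        // Already a full URL: return it unchanged instead of prefixing it again.
        if (value.startsWith(ROR_PREFIX)) {
            return value;
        }
        // Bare identifier: build the URL from the prefix.
        return ROR_PREFIX + value;
    }

    public static void main(String[] args) {
        System.out.println(toUrl("https://ror.org/03vek6s52"));
        System.out.println(toUrl("03vek6s52"));
    }
}
```

Both calls in `main` should yield the same link, which is the outcome described for the Metadata tab.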

Screen.Recording.2025-01-10.at.3.00.12.PM.mov
@ofahimIQSS ofahimIQSS added the Type: Bug a defect label Jan 10, 2025
@pdurbin pdurbin removed their assignment Jan 10, 2025
@ofahimIQSS ofahimIQSS changed the title Incorrect Handling of ROR Identifiers in Dataset Metadata Incorrect Handling of ROR Identifiers/Unique Identifiers in Dataset Metadata Jan 10, 2025
@qqmyers
Member

qqmyers commented Jan 10, 2025

FWIW:

ROR("ROR", "https://ror.org/%s", "^(https:\\/\\/ror.org\\/)0[a-hj-km-np-tv-z|0-9]{6}[0-9]{2}$");
has a bug in that the https://ror.org/ shouldn't be in both the template and the pattern.
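
A small standalone reproduction of that interaction, for illustration. The template and pattern strings are copied from the line quoted above; the class `RorTemplateBugDemo` and its two helpers are hypothetical, and it is an assumption here that validation is a full regex match and that the link is built with `String.format`.

```java
import java.util.regex.Pattern;

final class RorTemplateBugDemo {
    // Template and pattern copied from the enum constant quoted in this comment.
    static final String TEMPLATE = "https://ror.org/%s";
    static final Pattern PATTERN = Pattern.compile(
            "^(https:\\/\\/ror.org\\/)0[a-hj-km-np-tv-z|0-9]{6}[0-9]{2}$");

    // Assumption: validation is a full match against the pattern.
    static boolean isValid(String value) {
        return PATTERN.matcher(value).matches();
    }

    // Assumption: the link is built by formatting the stored value into the template.
    static String format(String value) {
        return String.format(TEMPLATE, value);
    }

    public static void main(String[] args) {
        String url = "https://ror.org/03vek6s52";
        String bare = "03vek6s52";
        System.out.println(isValid(url));   // the URL form passes validation
        System.out.println(isValid(bare));  // the bare identifier does not
        System.out.println(format(url));    // the prefix ends up in the link twice
    }
}
```

Under those assumptions, only the URL form is accepted, and formatting it produces the doubled `https://ror.org/https://ror.org/03vek6s52` link reported in the issue.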

If https://ror.org/ is removed from the pattern, which is what's needed to recognize the non-URL form, we should be aware of the impact on fields such as author affiliation or funding agency (places where we don't have a separate ID type field), in code like

ExternalIdentifier externalIdentifier = ExternalIdentifier.ROR;
if (externalIdentifier.isValidIdentifier(funder)) {
    isROR = true;
}

The external vocab script stores the URL form and expects it to be recognized there. If ROR needs to be recognized without the URL, it might be easiest to have both lax and strict ROR recognizers, but being able to recognize both forms at the same time might be a nice upgrade (for ROR and other IDs).
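
One way the lax/strict idea could look, as a rough sketch rather than a proposal for the actual `ExternalIdentifier` code. All names here (`RorRecognizerSketch`, `isStrict`, `isLax`, `bareId`) are hypothetical. The identifier part follows the quoted pattern, except that the `|` inside the character class is dropped (there it matches a literal pipe, which is assumed to be unintended) and the dot in `ror.org` is escaped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RorRecognizerSketch {
    // Bare identifier shape, based on the quoted pattern without the stray '|'.
    static final String ID = "0[a-hj-km-np-tv-z0-9]{6}[0-9]{2}";

    // Strict: only the full URL form is accepted.
    static final Pattern STRICT = Pattern.compile("^https://ror\\.org/(" + ID + ")$");

    // Lax: the URL prefix is optional, so both forms are accepted.
    static final Pattern LAX = Pattern.compile("^(?:https://ror\\.org/)?(" + ID + ")$");

    static boolean isStrict(String value) {
        return STRICT.matcher(value).matches();
    }

    static boolean isLax(String value) {
        return LAX.matcher(value).matches();
    }

    /** Returns the bare identifier from either form, or null if the value is not recognized. */
    static String bareId(String value) {
        Matcher m = LAX.matcher(value);
        return m.matches() ? m.group(1) : null;
    }
}
```

Fields without a separate ID type, such as funding agency, could keep using the strict check so that only the stored URL form is treated as a ROR, while the author identifier field could use the lax check.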

Not sure what the best approach is, especially given that we're trying to get external vocabulary scripts in to avoid typos and variations in identifiers. Just removing https://ror.org/ from the template instead might be a quick compromise: it would fix the URL form and leave any non-URL entry as a plain string, which would be OK if/when we update to use the script for ROR entry for an author.
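
A sketch of what that compromise might amount to, under the assumption that a value is only turned into a link when it passes the pattern check. The class `RorCompromiseSketch` and the method `linkFor` are hypothetical; the pattern is the one quoted earlier, left unchanged.

```java
import java.util.regex.Pattern;

final class RorCompromiseSketch {
    // Compromise: the template no longer adds the domain; the pattern stays as it was.
    static final String TEMPLATE = "%s";
    static final Pattern PATTERN = Pattern.compile(
            "^(https:\\/\\/ror.org\\/)0[a-hj-km-np-tv-z|0-9]{6}[0-9]{2}$");

    /** Returns the link target, or null when the value would be shown as plain text. */
    static String linkFor(String value) {
        // Assumption: only values that match the pattern are rendered as links.
        if (!PATTERN.matcher(value).matches()) {
            return null;
        }
        return String.format(TEMPLATE, value);
    }
}
```

With this, the URL form links to itself and a bare identifier such as `03vek6s52` stays plain text, matching the trade-off described above.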

@jggautier
Contributor

In this issue's PR, starting at #11118 (comment), there's more conversation about how this should work.

@pdurbin
Member

pdurbin commented Jan 22, 2025

I'm marking the following pull request as fixing this issue:

In that PR, the code was adjusted so that there are now two regexes: one for just the ROR identifier and one for the full ROR URL. The latter is used when exporting the "Datacite" format.

@pdurbin pdurbin added this to the 6.6 milestone Jan 22, 2025