add ROR as an Author Identifier Type #11118

Merged
merged 21 commits into from
Jan 22, 2025

Conversation

pdurbin
Member

@pdurbin pdurbin commented Dec 19, 2024

What this PR does / why we need it:

This PR adds ROR as an Author Identifier Type.

Which issue(s) this PR closes:

Special notes for your reviewer:

As of 1fe95d9 a second regex for ROR was added. Now we have "ROR" to match just the identifier and "ROR_FULL_URL" to match the full ROR URL. The latter is used when exporting the "Datacite" format.
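To illustrate the difference between the two patterns, here is a minimal Python sketch. The regular expressions are assumptions modelled on the publicly documented shape of a ROR ID (a leading 0, six base32 characters, two checksum digits); they are not copied from this PR's code, and the helper names are hypothetical.

```python
import re

# Assumed patterns, NOT the ones from the PR.
# A ROR ID is taken here to be: "0", six Crockford base32 characters
# (no i, l, o, u), then two checksum digits.
ROR = re.compile(r"0[a-hj-km-np-tv-z0-9]{6}[0-9]{2}")
ROR_FULL_URL = re.compile(r"https://ror\.org/0[a-hj-km-np-tv-z0-9]{6}[0-9]{2}")


def is_bare_ror(value: str) -> bool:
    """True when value is only the identifier, e.g. 03vek6s52."""
    return ROR.fullmatch(value) is not None


def is_full_ror_url(value: str) -> bool:
    """True when value is the complete URL, e.g. https://ror.org/03vek6s52."""
    return ROR_FULL_URL.fullmatch(value) is not None
```

Keeping the two checks separate means the form field can accept only the bare identifier while an exporter can still verify the full URL it emits.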

I have a question. How can a ROR uniquely identify an author? If I use the ROR for Harvard ( https://ror.org/03vek6s52 ), how can someone know which of the thousands of Harvard affiliates I am? 🤔

Screenshot 2024-12-19 at 3 32 22 PM

Also, is it ok that ROR has been added to the bottom of the list? That's what we've always done before. Here's how it looks:

Screenshot 2024-12-19 at 3 31 46 PM

Never mind. This was fixed in ddb6bc9. ROR is now after ORCID:

Screenshot 2024-12-20 at 11 50 35 AM

Suggestions on how to test this:

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Yes, see screenshots above.

Is there a release notes update needed for this change?:

Yes, included.

Additional documentation:

Preview at https://dataverse-guide--11118.org.readthedocs.build/en/11118/user/dataset-management.html#adding-a-new-dataset

@pdurbin pdurbin added labels Type: Feature (a feature request), Feature: Metadata, GREI 2 Consistent Metadata, FY25 Sprint 12 (2024-12-04 - 2024-12-18) Dec 19, 2024
@pdurbin pdurbin requested review from jggautier and cmbz December 19, 2024 20:43
@qqmyers
Member

qqmyers commented Dec 19, 2024

The intent is to allow Organizations to be authors. As you say, it would not make sense to use a ROR for a person. Eventually an external vocab script would start by letting you select Person or Org, but until then, perhaps @jggautier might want changes to the title or help tip to make things clearer?

@pdurbin
Member Author

pdurbin commented Dec 19, 2024

@qqmyers yes, good idea. I do think the tooltip should be edited to account for organizations as authors.

@jggautier do you want me to take a whack at that? Or do you want to suggest something?

@jggautier
Contributor

jggautier commented Dec 19, 2024

Hey @pdurbin

I have a question. How can a ROR uniquely identify an author? If I use the ROR for Harvard ( https://ror.org/03vek6s52 ), how can someone know which of the thousands of Harvard affiliates I am? 🤔

Hmmm, a ROR can uniquely identify an author when that author is an organization, such as Harvard University.

When you ask "how can someone know which of the thousands of Harvard affiliates I am?", this makes me think that you think that the Identifier Type and Identifier fields are supposed to be used for the identifier of the author's organizational affiliation, in your example Harvard University. Is that right?

It's telling that you think that, because others might, too. During testing after implementation I'll make sure we watch out for this.

Taking your example, we're expecting that if a depositor selects ROR as the Identifier Type, that means that the author is Harvard University, not that the author's affiliation is Harvard University.

Also, is it ok that ROR has been added to the bottom of the list? That's what we've always done before.

Could you write a bit more about why we've always done that? I imagine there are some cases where we think it'll be better to list options in the list alphabetically or by how often we see people selecting them. Or in this case, since we want to encourage the use of ROR over other types of identifiers used for organizations, in the mockup at #11075 I added it below ORCID.

@pdurbin
Member Author

pdurbin commented Dec 19, 2024

@jggautier I think we wrote at the same time. 😄 Do you want me to try editing the tooltip? Right now it only applies to people, not organizations:

Screenshot 2024-12-19 at 3 32 22 PM

@jggautier
Contributor

Ah, I'm not sure how the Identifier Type field's tooltip applies to people and not organizations. Could you write why you think that? Or are you referring to another tooltip?

@pdurbin
Member Author

pdurbin commented Dec 19, 2024

I'm saying we could change...

"The type of identifier that uniquely identifies the author (e.g. ORCID, ISNI)"

...to something like this:

"The type of identifier that uniquely identifies the author as a person (e.g. ORCID, ISNI) or organization (e.g. ROR)"

@jggautier
Contributor

jggautier commented Dec 19, 2024

Ah, I see. So you think that we should note in the tooltip that the author can be a person or an organization, because other folks might not realize that they can add an identifier for an organizational author? I think we should keep an eye on this as we get more feedback and that we shouldn't change the tooltip in this PR.

And kind of related, in our metadata text guidelines we say not to "include examples in the tooltip text for dropdown fields, since the dropdown fields already include all options". That's the 13th thing on the list. I'm surprised now that this wasn't done when we applied the guidelines to most of the fields in the citation metadata block a few years back as part of #8127.

@coveralls

coveralls commented Dec 20, 2024

Coverage Status

coverage: 22.763% (+0.01%) from 22.751%
when pulling d94a7a2 on 11075-ror
into 69ebed2 on develop.

@pdurbin
Member Author

pdurbin commented Dec 20, 2024

I'd say I go in thinking "Author... what is an author?... it's a person... like the author of a book... or an author of an article. I see that my name is auto-populated as the author."

"The next field, Affiliation, makes sense. What's next?" I think.

I get down to "Identifier Type" and see "The type of identifier that uniquely identifies the author (e.g. ORCID, ISNI)".

I think, "ORCID, yes, this will uniquely identify an author." (Again, I'm thinking of an author as a person.)

Then I see "ROR" in the list and I think "a ROR cannot uniquely identify a person. Why is ROR in this list of identifier types? Is this a bug?"

The "Author", "Name", and "Affiliation" tooltips all help reduce confusion by stating that an author can be an organization but the "Identifier Type" tooltip does not. So, I'm simply suggesting we fix this, not necessarily in this pull request, but at some point.

@jggautier
Contributor

Thanks for sharing more of your thinking @pdurbin.

I've wondered too whether managers of other Dataverse installations have felt or learned that their users feel that "Author" always means "person" and never "organization," and whether that's part of the reason why some installations use the word "Creator" instead. We've seen something similar with the Related Publication field label, so we adjusted the tooltip there a while back.

I definitely plan to help with learning about and clearing up the confusion.

@pdurbin
Member Author

pdurbin commented Dec 20, 2024

Also, is it ok that ROR has been added to the bottom of the list? That's what we've always done before.

Could you write a bit more about why we've always done that? I imagine there are some cases where we think it'll be better to list options in the list alphabetically or by how often we see people selecting them. Or in this case, since we want to encourage the use of ROR over other types of identifiers used for organizations, in the mockup at #11075 I added it below ORCID.

Yes, I noticed that. I wasn't sure if it was safe to simply change the displayOrder or not. This is what I had in this PR originally, with ROR at the bottom (displayOrder=8):

Screenshot 2024-12-20 at 11 11 18 AM

However, I just went through an upgrade scenario and it looks like reloading a metadata block caused new displayOrder values to be saved without having an ill effect on older data. (I think I may have known this once but wasn't sure. Perhaps we should document it.)

Here's how I tested it. First, while running the "develop" branch, I added a couple authors with ORCID (first) and ISNI (second):

Screenshot 2024-12-20 at 11 48 34 AM

Then I edited citation.tsv to put ROR after ORCID as shown in the mockup. Since it worked (see below) I went ahead and pushed this change: 029c247

Then I reloaded citation.tsv:

curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file citation.tsv
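For anyone scripting this reload step, the same request can be sketched in Python using only the standard library. The URL and content type are taken from the curl command above; the function names are hypothetical, and `reload_block` assumes a Dataverse instance is listening on localhost.

```python
import urllib.request

# Endpoint copied from the curl command above.
LOAD_URL = "http://localhost:8080/api/admin/datasetfield/load"


def build_load_request(tsv_bytes: bytes) -> urllib.request.Request:
    """Prepare (but do not send) the POST that uploads a metadata block."""
    return urllib.request.Request(
        LOAD_URL,
        data=tsv_bytes,
        headers={"Content-type": "text/tab-separated-values"},
        method="POST",
    )


def reload_block(path: str) -> int:
    """Send the TSV file at path and return the HTTP status code."""
    with open(path, "rb") as handle:
        request = build_load_request(handle.read())
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling `reload_block("citation.tsv")` would be the rough equivalent of the curl invocation.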

Now ROR appears second, right after ORCID:

Screenshot 2024-12-20 at 11 50 35 AM

I checked the database and (as expected) ROR has a higher ID than the other Author Identifiers (since it was added later), but the sorting is what we want:

dataverse=# select * from controlledvocabularyvalue where datasetfieldtype_id = 11 order by displayorder;
  id  | displayorder | identifier |   strvalue   | datasetfieldtype_id 
------+--------------+------------+--------------+---------------------
   52 |            0 |            | ORCID        |                  11
 8569 |            1 |            | ROR          |                  11
   53 |            2 |            | ISNI         |                  11
   54 |            3 |            | LCNA         |                  11
   55 |            4 |            | VIAF         |                  11
   56 |            5 |            | GND          |                  11
   57 |            6 |            | DAI          |                  11
   58 |            7 |            | ResearcherID |                  11
   59 |            8 |            | ScopusID     |                  11
(9 rows)
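The point of the query above is that the dropdown order follows displayorder, not the row id. A small Python sketch using the same nine rows makes the contrast explicit; the tuple layout and function names are only for illustration.

```python
# (id, displayorder, strvalue) rows copied from the query output above.
ROWS = [
    (52, 0, "ORCID"),
    (8569, 1, "ROR"),
    (53, 2, "ISNI"),
    (54, 3, "LCNA"),
    (55, 4, "VIAF"),
    (56, 5, "GND"),
    (57, 6, "DAI"),
    (58, 7, "ResearcherID"),
    (59, 8, "ScopusID"),
]


def by_display_order(rows):
    """Names sorted the way the query sorts them (ORDER BY displayorder)."""
    return [name for _id, _order, name in sorted(rows, key=lambda r: r[1])]


def by_id(rows):
    """Names sorted by primary key, i.e. by insertion order."""
    return [name for _id, _order, name in sorted(rows, key=lambda r: r[0])]
```

Sorting by id would push ROR to the end because it was inserted last, while sorting by displayorder places it directly after ORCID.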

So, I think we're good. We'll see what people say in review and QA. 😅

Again, I'm happy to document the behavior of reloading the metadata blocks to say you can reorder controlled vocabulary values. I was happily surprised. 😄

@jggautier
Contributor

jggautier commented Jan 2, 2025

Writing in the guides that we're able to reorder controlled vocabulary values sounds good to me. I've always assumed it was possible just by changing the number in the displayOrder column, but saying so explicitly could help others who assumed it wasn't possible or weren't sure. Let me know if I can help document that :)

@jggautier jggautier removed their request for review January 2, 2025 19:49
@cmbz cmbz added the FY25 Sprint 14 FY25 Sprint 14 (2025-01-02 - 2025-01-15) label Jan 2, 2025
@pdurbin
Member Author

pdurbin commented Jan 7, 2025

@jggautier at standup this morning @qqmyers asked if you're ready for this to move forward and I said you are. If that's not right please pull this PR out of Ready for Review. Thanks!

@cmbz cmbz added the FY25 Sprint 15 FY25 Sprint 15 (2025-01-15 - 2025-01-29) label Jan 15, 2025

@pdurbin pdurbin removed their assignment Jan 21, 2025

@pdurbin
Member Author

pdurbin commented Jan 22, 2025

@jggautier right, the established pattern is that just the identifier (e.g. 03vek6s52) should be entered under Author Identifier rather than the full URL (e.g. https://ror.org/03vek6s52).

I added a note about this to the User Guide: b9d0146 . Here's a preview:

Screenshot 2025-01-22 at 10 35 20 AM

I've adjusted the code in this PR and have written tests to clarify this. I also marked the following issue to be closed when this PR is merged (the ROR link is working now, as shown in the screenshot below):

When testing this PR, please also check the "Datacite" exporter because I had to touch that code. It should work the same as before, giving output like this...

<nameIdentifier nameIdentifierScheme="ORCID" schemeURI="https://orcid.org">https://orcid.org/0000-0002-9528-9470</nameIdentifier>

...when you enter an ORCID properly, as discussed above and shown in the screenshot below.

In addition, you should now see output like this for RORs:

<nameIdentifier nameIdentifierScheme="ROR" schemeURI="https://ror.org">https://ror.org/03vek6s52</nameIdentifier>

I'm providing a screenshot below for how this looks in context.

Screenshot 2025-01-22 at 10 04 49 AM
Screenshot 2025-01-22 at 10 04 58 AM
Screenshot 2025-01-22 at 10 05 24 AM
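The export behavior described in this comment (a bare identifier is stored, and the full URL appears in the DataCite nameIdentifier element) can be sketched in Python as follows. This is an illustration only, not the exporter's actual code: the function name and the lookup table are hypothetical, and only the two schemes shown in the sample output are covered.

```python
import xml.etree.ElementTree as ET

# Scheme URIs as they appear in the sample output above.
SCHEME_URIS = {
    "ORCID": "https://orcid.org",
    "ROR": "https://ror.org",
}


def name_identifier(scheme: str, bare_id: str) -> ET.Element:
    """Build a nameIdentifier element from a scheme and a bare identifier."""
    scheme_uri = SCHEME_URIS[scheme]
    element = ET.Element(
        "nameIdentifier",
        {"nameIdentifierScheme": scheme, "schemeURI": scheme_uri},
    )
    # The exported value is the full URL, even though only the
    # identifier itself is entered in the form.
    element.text = f"{scheme_uri}/{bare_id}"
    return element
```

For example, `ET.tostring(name_identifier("ROR", "03vek6s52"), encoding="unicode")` serializes the element for comparison with the sample output.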

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:11075-ror
ghcr.io/gdcc/configbaker:11075-ror

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

Member

@qqmyers qqmyers left a comment

Looks fine

@ofahimIQSS
Contributor

Testing Passed - Merging PR


Labels
Feature: Metadata, FY25 Sprint 12 (2024-12-04 - 2024-12-18), FY25 Sprint 14 (2025-01-02 - 2025-01-15), FY25 Sprint 15 (2025-01-15 - 2025-01-29), GREI 2 Consistent Metadata, Size: 3 (a percentage of a sprint; 2.1 hours), Type: Feature (a feature request)
Projects
Status: Done 🧹
6 participants