For harvesting, allow advanced feature of using listRecords vs ListIdentifiers #10936

Closed
scolapasta opened this issue Oct 18, 2024 · 4 comments
Labels
GREI 3 Search and Browse NIH CAFE Issues related to and/or funded by the NIH CAFE project

Comments

@scolapasta
Contributor

Currently, harvesting uses ListIdentifiers and then calls GetRecord for each record.

However, the DataCite OAI only supports sets via ListRecords. So we want to allow a harvesting client (on our side) to use ListRecords so we can harvest Sets from DataCite.

(This would go away if DataCite begins to support sets with ListIdentifiers.)
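
For readers less familiar with OAI-PMH, the difference between the two flows can be sketched as the requests each one sends. This is only an illustration, not Dataverse's harvesting code: the base URL, the set name, and the helper names are placeholders, and `oai_dc` is used only because it is the metadata format every OAI-PMH server is required to support.

```python
from urllib.parse import urlencode

# Placeholder endpoint; substitute the real OAI-PMH base URL of the server.
BASE_URL = "https://oai.example.org/oai"


def oai_url(verb, **params):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    return f"{BASE_URL}?{urlencode({'verb': verb, **params})}"


def two_step_requests(identifiers, metadata_prefix="oai_dc"):
    """Current flow: one ListIdentifiers call, then one GetRecord per identifier."""
    requests = [oai_url("ListIdentifiers", metadataPrefix=metadata_prefix)]
    requests += [
        oai_url("GetRecord", identifier=i, metadataPrefix=metadata_prefix)
        for i in identifiers
    ]
    return requests


def list_records_request(set_spec, metadata_prefix="oai_dc"):
    """Proposed flow: a single ListRecords call restricted to one set."""
    return oai_url("ListRecords", metadataPrefix=metadata_prefix, set=set_spec)
```

With N records in a set, the first flow issues N + 1 requests (ignoring paging), while the second returns headers and metadata together in the ListRecords response pages.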

@scolapasta
Contributor Author

As a quick "hack" we could just take the list of identifiers from ListRecords and then still call GetRecord for each identifier. This, of course, would mean fetching each record's metadata from DataCite twice, but it would be a fairly quick thing to add.
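
A minimal sketch of that quick approach, assuming a ListRecords response page is already available as a string: only the header identifiers are pulled out, and the metadata that arrived with them is discarded. The function name is hypothetical; the namespace is the standard OAI-PMH 2.0 one.

```python
import xml.etree.ElementTree as ET

# Standard OAI-PMH 2.0 response namespace.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def identifiers_from_list_records(xml_text):
    """Return the header identifiers found in one ListRecords response page."""
    root = ET.fromstring(xml_text)
    return [
        el.text.strip()
        for el in root.iterfind(".//oai:record/oai:header/oai:identifier", OAI_NS)
        if el.text
    ]
```

Each returned identifier would then be passed to the existing per-record GetRecord call, which is where the duplicate download comes from.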

The more complete solution would be to use the data from ListRecords as well. In this case, we would have to make sure to handle the case where one (or more) records fail to harvest while still allowing the others to succeed, which is the reason we currently split the calls up via GetRecord in the first place.
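
One way the per-record failure handling could look, sketched under assumptions: `import_record` is a hypothetical callback standing in for whatever the harvester does with a single record, paging via resumption tokens is left out, and records marked as deleted are simply skipped.

```python
import xml.etree.ElementTree as ET

# Standard OAI-PMH 2.0 response namespace.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def harvest_page(xml_text, import_record):
    """Import every record in one ListRecords page, isolating failures.

    Returns (imported, failed): a list of imported identifiers and a dict
    mapping each failed identifier to its error message.
    """
    root = ET.fromstring(xml_text)
    imported, failed = [], {}
    for record in root.iterfind(".//oai:record", OAI_NS):
        header = record.find("oai:header", OAI_NS)
        if header is None or header.get("status") == "deleted":
            continue  # nothing to import for deleted or malformed entries
        identifier = header.findtext(
            "oai:identifier", default="", namespaces=OAI_NS
        ).strip()
        metadata = record.find("oai:metadata", OAI_NS)
        try:
            if metadata is None:
                raise ValueError("record has no <metadata> element")
            import_record(identifier, metadata)  # hypothetical import step
            imported.append(identifier)
        except Exception as exc:  # one bad record must not abort the batch
            failed[identifier] = str(exc)
    return imported, failed
```

Catching the error inside the loop keeps the property the GetRecord split provides today: a single bad record is reported, and the remaining records in the page are still processed.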

@scolapasta scolapasta moved this to SPRINT- NEEDS SIZING in IQSS Dataverse Project Oct 18, 2024
@scolapasta scolapasta added the Size: 10 A percentage of a sprint. 7 hours. label Oct 18, 2024
@scolapasta scolapasta moved this from SPRINT- NEEDS SIZING to This Sprint 🏃‍♀️ 🏃 in IQSS Dataverse Project Oct 18, 2024
@scolapasta
Contributor Author

Sized as a 10 with the idea that we would start with the quick fix; we should resize appropriately if we decide we want to use the 2nd approach instead.

@landreev
Contributor

I just proposed in a direct Slack message to close this issue, since it duplicates one of the two items under #10909. The two items under #10909 can of course be implemented separately, but since being able to harvest from DataCite appears to be an urgent/ASAP task, splitting them does not really offer any benefit. Testing the two separately would also be a bit convoluted and could create unnecessary overhead.

Also, I am not super comfortable with the ListRecords -> GetRecord solution. The common sense test here is that I would be upset if somebody were harvesting from my servers like that. I would rather implement it properly. It's not going to be a 10 for sure, but should not be an insane amount of work either.

@cmbz cmbz added GREI 3 Search and Browse NIH CAFE Issues related to and/or funded by the NIH CAFE project labels Oct 20, 2024
@landreev landreev removed the Size: 10 A percentage of a sprint. 7 hours. label Oct 22, 2024
@landreev
Contributor

Closing as a duplicate of #10909.

@github-project-automation github-project-automation bot moved this from This Sprint 🏃‍♀️ 🏃 to Done 🧹 in IQSS Dataverse Project Oct 22, 2024