For harvesting, allow advanced feature of using listRecords vs ListIdentifiers #10936

scolapasta · 2024-10-18T15:13:28Z

Currently, harvesting uses ListIdentifiers and then calls GetRecord for each record.
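For readers less familiar with OAI-PMH, the current two-step flow can be sketched as below. This is an illustrative Python sketch, not Dataverse's actual (Java) harvester code. The function names are hypothetical. The verbs (ListIdentifiers, GetRecord), their arguments, and the OAI-PMH 2.0 XML namespace come from the protocol specification. Only one response page is parsed, and following the resumptionToken to later pages is left to the caller.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Standard OAI-PMH 2.0 response namespace.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_identifiers_page(xml_text):
    """Return (identifiers, resumption_token) from one ListIdentifiers response page.

    Hypothetical helper: step one of the current flow only collects headers.
    """
    root = ET.fromstring(xml_text)
    identifiers = [
        (el.text or "").strip()
        for el in root.findall("oai:ListIdentifiers/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find("oai:ListIdentifiers/oai:resumptionToken", OAI_NS)
    # An empty <resumptionToken/> (or none at all) means this was the last page.
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return identifiers, token


def get_record_url(base_url, identifier, metadata_prefix):
    """Build the per-record GetRecord request issued for each identifier (step two)."""
    return base_url + "?" + urlencode(
        {"verb": "GetRecord", "identifier": identifier, "metadataPrefix": metadata_prefix}
    )
```

Because every record costs its own GetRecord request, a failure while fetching or importing one record stays isolated to that record.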

However, DataCite's OAI server supports sets only via ListRecords. So we want to allow a harvesting client (on our side) to use ListRecords, so that we can harvest sets from DataCite.
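A per-client switch between the two request types might look like the sketch below. The `use_list_records` option and both function names are hypothetical, not an existing Dataverse setting. The verbs, the `set` and `metadataPrefix` arguments, and the rule that `resumptionToken` is an exclusive argument (sent only alongside `verb`) are standard OAI-PMH 2.0. The base URL used in any test of this is a placeholder, not DataCite's real endpoint.

```python
from urllib.parse import urlencode


def initial_harvest_url(base_url, metadata_prefix, set_spec=None, use_list_records=False):
    """Build the first request of a harvest.

    use_list_records is a hypothetical client option: when True, the client asks
    for full records via ListRecords (which the issue says DataCite requires for
    sets); otherwise it keeps the existing ListIdentifiers flow.
    """
    params = {
        "verb": "ListRecords" if use_list_records else "ListIdentifiers",
        "metadataPrefix": metadata_prefix,
    }
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def resume_url(base_url, verb, resumption_token):
    """Request the next page; OAI-PMH allows only verb plus resumptionToken here."""
    return base_url + "?" + urlencode({"verb": verb, "resumptionToken": resumption_token})
```

Keeping the existing ListIdentifiers path as the default means clients that do not opt in behave exactly as before.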

(This would no longer be needed if DataCite began supporting sets with ListIdentifiers.)

scolapasta · 2024-10-18T15:15:49Z

As a quick "hack", we could just get the list of identifiers from ListRecords and then still call GetRecord for each identifier. This would, of course, mean retrieving the same metadata twice from DataCite, but it would be fairly quick to add.
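The quick fix amounts to reading only the headers out of a ListRecords page, as in this minimal sketch. The function name is hypothetical and this is not Dataverse code. The XPath-style paths follow the standard OAI-PMH 2.0 response layout (`ListRecords/record/header/identifier`).

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def identifiers_from_list_records(xml_text):
    """Quick-fix approach: keep only the header identifiers from a ListRecords page.

    The <metadata> payload already present in the response is ignored, and each
    identifier is later re-fetched with GetRecord, so every record's metadata
    is downloaded twice.
    """
    root = ET.fromstring(xml_text)
    return [
        (el.text or "").strip()
        for el in root.findall("oai:ListRecords/oai:record/oai:header/oai:identifier", OAI_NS)
    ]
```

The resulting list can then be fed into the existing per-record GetRecord loop unchanged, which is what makes this option cheap to implement.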

The more complete solution would be to also use the metadata returned by ListRecords. In that case, we would have to make sure to handle the case where one or more records fail to harvest while still allowing the others to succeed; that is the reason we currently split the calls up via GetRecord in the first place.
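The per-record failure isolation described above can be sketched as follows, under stated assumptions: `import_record` is a hypothetical callback standing in for whatever per-record import step the harvester uses, and records whose header carries the standard OAI-PMH `status="deleted"` attribute are collected separately because they have no `<metadata>`. This illustrates the idea only and is not Dataverse's implementation.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def harvest_list_records_page(xml_text, import_record):
    """Import every <record> of one ListRecords page independently.

    import_record(identifier, metadata_element) is a hypothetical callback.
    An exception it raises is recorded for that record only, and the loop
    moves on, so one bad record does not abort the rest of the page.
    Returns (succeeded_ids, failed, deleted_ids); failed maps identifier -> error text.
    """
    root = ET.fromstring(xml_text)
    succeeded, failed, deleted = [], {}, []
    for record in root.findall("oai:ListRecords/oai:record", OAI_NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=OAI_NS
        ).strip()
        header = record.find("oai:header", OAI_NS)
        if header is not None and header.get("status") == "deleted":
            # Deleted records carry no <metadata>; leave them to the caller.
            deleted.append(identifier)
            continue
        try:
            import_record(identifier, record.find("oai:metadata", OAI_NS))
            succeeded.append(identifier)
        except Exception as exc:  # isolate the failure to this single record
            failed[identifier] = str(exc)
    return succeeded, failed, deleted
```

Compared with the GetRecord-per-record flow, this keeps one request per page while still reporting failures at record granularity.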

scolapasta · 2024-10-18T15:42:07Z

Sized as a 10 with the idea that we would start with the quick fix; we should resize appropriately if we decide we want to use the 2nd approach instead.

landreev · 2024-10-18T20:47:51Z

I just proposed in a direct Slack message to close this issue, since it duplicates half of #10909. The two items under #10909 can of course be implemented separately, but since being able to harvest from DataCite appears to be an urgent/ASAP deliverable, splitting them does not really offer any benefit. Testing the two separately, however, would be somewhat convoluted and could create unnecessary overhead.

Also, I am not super comfortable with the ListRecords -> GetRecord solution. The common sense test here is that I would be upset if somebody were harvesting from my servers like that. I would rather implement it properly. It's not going to be a 10 for sure, but should not be an insane amount of work either.

landreev · 2024-10-22T13:14:09Z

Closing as a duplicate of #10909.

scolapasta added this to IQSS Dataverse Project Oct 18, 2024

scolapasta moved this to SPRINT- NEEDS SIZING in IQSS Dataverse Project Oct 18, 2024

scolapasta mentioned this issue Oct 18, 2024

Allow Harvesting to use arbitrary sets #10937

Closed

scolapasta added the "Size: 10" label (A percentage of a sprint. 7 hours.) Oct 18, 2024

scolapasta moved this from SPRINT- NEEDS SIZING to This Sprint 🏃‍♀️ 🏃 in IQSS Dataverse Project Oct 18, 2024

scolapasta mentioned this issue Oct 18, 2024

Add support for OAI-harvesting from DataCite #10909

Open

cmbz added the "GREI 3 Search and Browse" and "NIH CAFE" (Issues related to and/or funded by the NIH CAFE project) labels Oct 20, 2024

This was referenced Oct 20, 2024

Project: NIH CAFE IQSS/dataverse-pm#161

Open

GREI 3: HDV Task - Improve OAI-PMH Harvesting IQSS/dataverse-pm#171

Open

landreev removed the "Size: 10" label (A percentage of a sprint. 7 hours.) Oct 22, 2024

landreev closed this as completed Oct 22, 2024

github-project-automation bot moved this from This Sprint 🏃‍♀️ 🏃 to Done 🧹 in IQSS Dataverse Project Oct 22, 2024
