
Harvesting: processing oai_dc:identifiers without Global Id protocol prefix: defaults to hdl in a potentially very inefficient way #10429

Open
landreev opened this issue Mar 26, 2024 · 0 comments


landreev commented Mar 26, 2024

[the paragraph that follows needs to be edited some more, as it still contains some inaccuracies; will do once I have a sec. But, to emphasize, this is a fairly exotic/uncommon issue]

Not a very common problem, but it came up with SRDA in #7624: their OAI server supplies record identifiers like 10.6141/TW-SRDA-AN010012-1, i.e., without the doi: prefix. This is a valid DOI, and resolving it, as in https://doi.org/10.6141/TW-SRDA-AN010012-1, works. However, our code appears to default to hdl: (!), which of course doesn't work. We just need to make it configurable at the client level which protocol to default to when the prefix is not supplied.

Upon further investigation:
It does not just default to handles; ImportGenericServiceBean has this code:

for (String otherId : otherIds) {
    try {
        HandleResolver hr = new HandleResolver();
        hr.resolveHandle(otherId);
        return HandlenetServiceBean.HDL_PROTOCOL + ":" + otherId;
    } catch (HandleException e) {
        logger.fine("Not a valid handle: " + e.toString());
    }
}

i.e., it will try to resolve every "other identifier" (that is, any identifier that cannot be unambiguously recognized as either a DOI or a Handle) as a handle. In IQSS production, however, this makes harvesting impossible, because each of the resolution attempts above takes a long time before timing out, probably because of some firewall.

We want to be able to turn the lookup mechanism off (but keep it in place by default, since somebody must have included it for a reason) by specifying either "doi:" or "hdl:" as the prefix to default to. This should be straightforward if we make it API-only functionality. (Once again, this is a fairly exotic condition; it never happens when harvesting from another Dataverse.)
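
To make the proposal concrete, here is a rough sketch of what the per-client setting could look like. Everything in it is hypothetical: the class, the `DefaultProtocol` enum and the `toGlobalId` method do not exist in Dataverse, and the `Predicate` is only a stand-in for the `HandleResolver.resolveHandle()` call quoted above, so that the sketch compiles without the Handle.net library. `LOOKUP` is meant to preserve the current behavior as the default.

```java
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical sketch; none of these names exist in the Dataverse code base. */
final class BareIdentifierSketch {

    /** Assumed per-client setting: how to treat an identifier with no protocol prefix. */
    enum DefaultProtocol { DOI, HDL, LOOKUP }

    /**
     * Turns a prefix-less identifier into a global id string.
     * handleLookup stands in for the existing handle resolution attempt
     * and is only invoked in LOOKUP mode.
     */
    static Optional<String> toGlobalId(String otherId,
                                       DefaultProtocol mode,
                                       Predicate<String> handleLookup) {
        switch (mode) {
            case DOI:
                // No network call: trust the client configuration.
                return Optional.of("doi:" + otherId);
            case HDL:
                // No network call either.
                return Optional.of("hdl:" + otherId);
            default:
                // LOOKUP: current behavior, accept the id only if it resolves as a handle.
                return handleLookup.test(otherId)
                        ? Optional.of("hdl:" + otherId)
                        : Optional.empty();
        }
    }
}
```

With the SRDA client configured as `DefaultProtocol.DOI`, the identifier 10.6141/TW-SRDA-AN010012-1 would be mapped to doi:10.6141/TW-SRDA-AN010012-1 without any resolution attempt, so the slow timeouts would be avoided entirely for that client.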

landreev changed the title from "Harvesting: processing oai_dc:identifiers without Global Id protocol prefix: defaults to hdl; potentially very inefficient" to "Harvesting: processing oai_dc:identifiers without Global Id protocol prefix: defaults to hdl in a potentially very inefficient way" on Mar 26, 2024
DS-INRAE moved this to ⚠️ Needed/Important in Recherche Data Gouv on Jul 31, 2024