4964 harvesting issues #6686

Merged
kcondon merged 8 commits into develop from 4964-harvesting-issues
Feb 27, 2020

landreev
Contributor

@landreev landreev commented Feb 25, 2020

What this PR does / why we need it:
This addresses a few harvesting issues from #4964:

  • Most important one: the create/edit dialog on the harvesting clients page is now going to actively push the user to select a sensible "archive type" for the remote repository. It no longer defaults to "Dataverse", and it's not going to allow the user to leave it blank. Dataverse needs to know this "archive type" in order to be able to generate redirect links for harvested objects.
  • For datasets harvested from "Generic OAI Archives", much improved redirect links. Prod. example: https://dataverse.harvard.edu/dataverse/srda_harvested - clicking on the titles in the search cards is currently redirecting to a bland OAI page on datacite.org. With the fix in this PR, it'll change to redirecting to the proper study pages via the DOI resolver.
  • Some improved labels/user messages. ("DC" is dropped from the "Generic OAI archive" label, per Julian's suggestion).
  • A somewhat limited fix for the exotic, but annoying problem discovered as we were working on the issue: (bear with me!)
    The SRDA dataverse (link above) is configured to harvest from oai.datacite.org. The archive has ~2200 sets in total, and it takes forever to retrieve the whole list. The site only gives them out in chunks of 50, and as of yesterday each chunk was taking 40-50+ seconds (!), so retrieving the whole list would take ~30 min. In its current form, the harvesting client setup page needs the full set list before it can proceed to the next step and offer the user a choice of which set to harvest. Obviously, we can't make the user wait 30 min., and something times out before it's done anyway. This means it is impossible to set up harvesting from oai.datacite.org, and impossible to edit an already configured client. (Julian must have set it up originally back when there were either fewer sets, or they weren't taking as long to retrieve.)
    The best workaround I could think of: I'm now checking on the progress, and if it has taken longer than a minute to retrieve the first 100 sets, I proceed to the next step, show the user the truncated list of sets, and warn them about what happened:
    [Screenshot, 2020-02-25: the truncated-list warning shown in the harvesting client dialog]

They may still not be able to select the set they want; but they may be able to set up the client with the default set, for example, and then configure the client with the set they want via the API; or they may be able to edit the schedule of an existing client... better than what's going on now.
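The workaround described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the function names, the `fetch_page` callback, and the fake server are all hypothetical, and the exact stopping rule (at least 100 sets collected and more than 60 seconds elapsed) is an assumption based on the description. The estimate helper just reproduces the arithmetic behind the "~30 min." figure.

```python
import math
import time
from typing import Callable, List, Optional, Tuple

# Hypothetical signature: fetch_page(token) returns (set names, next token);
# a next token of None means the server has no more pages.
FetchPage = Callable[[Optional[str]], Tuple[List[str], Optional[str]]]


def estimate_full_listing_minutes(total_sets: int, page_size: int,
                                  seconds_per_page: float) -> float:
    """Rough time to page through the whole set list, in minutes."""
    pages = math.ceil(total_sets / page_size)
    return pages * seconds_per_page / 60


def list_sets_with_deadline(fetch_page: FetchPage,
                            min_sets: int = 100,
                            max_seconds: float = 60.0,
                            clock: Callable[[], float] = time.monotonic
                            ) -> Tuple[List[str], bool]:
    """Collect sets page by page; stop early if the server is too slow.

    Returns (sets, truncated). Assumed rule: give up once at least
    `min_sets` sets are in hand and more than `max_seconds` have passed.
    """
    start = clock()
    sets: List[str] = []
    token: Optional[str] = None
    while True:
        names, token = fetch_page(token)
        sets.extend(names)
        if token is None:
            return sets, False  # full list retrieved
        if len(sets) >= min_sets and clock() - start > max_seconds:
            return sets, True   # too slow: hand back the partial list


def make_fake_server(total: int, page_size: int, seconds_per_page: float):
    """Simulated slow OAI server with a fake clock, for illustration."""
    now = [0.0]

    def clock() -> float:
        return now[0]

    def fetch_page(token: Optional[str]):
        offset = 0 if token is None else int(token)
        now[0] += seconds_per_page  # pretend the request took this long
        names = [f"set{i}" for i in range(offset, min(offset + page_size, total))]
        nxt = offset + page_size
        return names, (str(nxt) if nxt < total else None)

    return fetch_page, clock


if __name__ == "__main__":
    print(estimate_full_listing_minutes(2200, 50, 45.0), "minutes for the full list")
    fetch_page, clock = make_fake_server(2200, 50, 45.0)
    sets, truncated = list_sets_with_deadline(fetch_page, clock=clock)
    print(len(sets), "sets retrieved; truncated:", truncated)
```

With 2200 sets in pages of 50, that is 44 requests; at 40 to 50 seconds each the full listing takes roughly 29 to 37 minutes, which is why the dialog cannot wait for it.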

If anyone can think of a better solution, please open a new issue for it. But I don't think it's worth investing too much energy into this. I'm not aware of another OAI archive where this is a problem.
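For the redirect change in the second bullet above, the general idea of sending the user through the DOI resolver can be sketched as below. This is a minimal illustration under assumptions, not the Dataverse implementation: the function name, the accepted identifier prefixes, and the example DOI are hypothetical. The only fixed fact used is that `https://doi.org/` followed by a DOI resolves to the object's landing page.

```python
from typing import Optional
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"

# Assumed forms a harvested identifier might take; the real metadata may differ.
_KNOWN_PREFIXES = ("doi:", "https://doi.org/", "http://dx.doi.org/")


def doi_redirect_url(identifier: str) -> Optional[str]:
    """Return a resolver URL for a DOI-like identifier, or None if it is not a DOI."""
    ident = identifier.strip()
    lowered = ident.lower()
    for prefix in _KNOWN_PREFIXES:
        if lowered.startswith(prefix):
            ident = ident[len(prefix):]
            break
    # A DOI starts with the "10." directory indicator and has a "/" before the suffix.
    if not ident.startswith("10.") or "/" not in ident:
        return None
    # Percent-encode anything unsafe in a URL path, but keep the "/" separators.
    return DOI_RESOLVER + quote(ident, safe="/")


if __name__ == "__main__":
    # Hypothetical identifier, for illustration only.
    print(doi_redirect_url("doi:10.5072/FK2/EXAMPLE"))
```

Returning `None` for anything that is not recognizably a DOI leaves the caller free to fall back to whatever link it used before.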

Which issue(s) this PR closes:

Closes #4964

Special notes for your reviewer:

Suggestions on how to test this:

Does this PR introduce a user interface change?:

Is there a release notes update needed for this change?:

Additional documentation:

coveralls commented Feb 25, 2020

Coverage Status

Coverage decreased (-0.01%) to 19.453% when pulling 641c73f on 4964-harvesting-issues into 74b499a on develop.

Member

@pdurbin pdurbin left a comment

The code looks good enough for QA to me.

@TaniaSchlatter @mheppler I did add a couple comments about messaging. Tiny things, really, but you might want to take a look.

@@ -503,6 +504,7 @@ harvestclients.newClientDialog.step4=Step 4 of 4 - Display
harvestclients.newClientDialog.harvestingStyle=Archive Type
harvestclients.newClientDialog.harvestingStyle.tip=Type of remote archive.
harvestclients.newClientDialog.harvestingStyle.helptext=Select the archive type that best describes this remote server in order to properly apply formatting rules and styles to the harvested metadata as they are shown in the search results. Note that improperly selecting the type of the remote archive can result in incomplete entries in the search results, and a failure to redirect the user to the archival source of the data.
harvestclients.newClientDialog.harvestingStyle.required=Please select one of the values from the menu
Member

Suggested change
- harvestclients.newClientDialog.harvestingStyle.required=Please select one of the values from the menu
+ harvestclients.newClientDialog.harvestingStyle.required=Please select one of the values from the menu.

I didn't run the code to see how the GUI looks but perhaps this sentence should end with a period? Apologies if I'm mistaken and some other string is concatenated or something.

Contributor Author

ok, done.

@@ -482,6 +482,7 @@ harvestclients.newClientDialog.oaiSets.tip=Harvesting sets offered by this OAI s
harvestclients.newClientDialog.oaiSets.noset=None
harvestclients.newClientDialog.oaiSets.helptext=Selecting "none" will harvest the default set, as defined by the server. Often this will be the entire body of content across all sub-sets.
harvestclients.newClientDialog.oaiSets.helptext.noset=This OAI server does not support named sets. The entire body of content offered by the server will be harvested.
harvestclients.newClientDialog.oaiSets.listTruncated=Please note that the remote server was taking too long to return the full list of available OAI sets, so the list was truncated! Please select a set from the current list (or select the "no set" option), and try again later if you need to change it.
Member

My only thought here is that we could replace the exclamation mark with a period. This is sort of a "voice" thing. I'm not sure what the voice for the Dataverse software is or should be.

Contributor Author

PLEASE NOTE THAT THE REMOTE SERVER WAS TAKING TOO LONG!!!!

caps_on

Contributor Author

@landreev landreev Feb 27, 2020

OK, DONE!!!!!!

@pdurbin pdurbin removed their assignment Feb 27, 2020
Member

pdurbin commented Feb 27, 2020

> If anyone can think of a better solution

Nope. At least, I can't think of anything that doesn't significantly increase the scope of this issue. Seems like a great workaround. Kudos to @landreev .

@mheppler mheppler removed their assignment Feb 27, 2020
@kcondon kcondon self-assigned this Feb 27, 2020
@kcondon kcondon merged commit 1ec2260 into develop Feb 27, 2020
@kcondon kcondon deleted the 4964-harvesting-issues branch February 27, 2020 21:31
@djbrooke djbrooke added this to the 4.20 milestone Feb 27, 2020
Successfully merging this pull request may close these issues.

Harvesting - Broken dataset title links for non-Dataverse/OAI-PMH repositories
6 participants