i224 Strip whitespaces from URNs when harvesting #44

Merged
merged 1 commit into from
Oct 26, 2022

@bkiahstroud (Contributor) commented Oct 24, 2022

Ref https://github.com/harvard-lts/CURIOSity/issues/224

Before

[Screenshot: Slavery, Abolition, Emancipation and Freedom - Harvard Curiosity, 2022-10-24 at 4:58:57 PM]

After

[Screenshot: Slavery, Abolition, Emancipation and Freedom - Harvard Curiosity, 2022-10-24 at 4:59:17 PM]

Testing Instructions

  1. Log in as an admin
  2. Create a new exhibit
  3. Harvest the saef OAI set (using the curiosity.yml mapping file)
  4. When the harvest completes, verify that no errors of the form bad URI(is not URI?): "https://nrs.harvard.edu/URN-3:FHCL.HOUGH:101211924 " were raised (note the trailing space inside the quotes)
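The error in step 4 is the message Ruby's standard URI parser raises when a string is not a valid URI; here the culprit is the trailing space in the harvested URN link. The sketch below illustrates the idea behind the fix. It is not the PR's actual diff, and the helper name urn_to_uri is hypothetical.

```ruby
require "uri"

# Hypothetical helper (not the PR's actual code): strip surrounding
# whitespace from a harvested URN link before parsing it as a URI.
def urn_to_uri(raw)
  URI.parse(raw.to_s.strip)
end

raw = "https://nrs.harvard.edu/URN-3:FHCL.HOUGH:101211924 " # trailing space

begin
  URI.parse(raw) # the unstripped value is rejected
rescue URI::InvalidURIError => e
  puts "rejected: #{e.message}"
end

uri = urn_to_uri(raw)
puts uri.host # nrs.harvard.edu
puts uri.path # /URN-3:FHCL.HOUGH:101211924
```

Stripping at the point where the value is turned into a URI also covers leading spaces and stray newlines from the source metadata.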

@dl-maura (Contributor) left a comment

No bad URI messages anymore.

@dl-maura dl-maura merged commit ad73ebe into main Oct 26, 2022
@dl-maura dl-maura deleted the i224-urns-with-whitespaces-fail-to-harvest branch October 26, 2022 21:49
phil-plencner-hl added a commit that referenced this pull request Oct 31, 2022
* rename spotlight_oaipmh_harvesters to spotlight_harvesters

* add :type column to harvesters table

The :type column will allow us to use Single Table Inheritance for the OAIPMH and Solr harvesters.

The migration also includes a data migration to add a value for :type for all existing harvesters.

* create Harvester and SolrHarvester models

Sets up Harvester as the "base" class for the two harvest types. Migrates some common logic from OaipmhHarvester to Harvester in line with this.

* repurpose OaipmhHarvesterController to handle Solr harvests as well, add HarvestType

These changes are strongly inspired by changes found in the job_entry branch, originally authored by @ives1227

* update routes to use new "base" harvester model

* port SolrHarvester logic from job_entry branch (e905c7c)

* rework HarvesterController for clarity

* add db column for solr harvest mapping filenames

* add solr option to harvester form

Builds UI that allows creation of SolrHarvesters. Combines changes from the deleted form partial with the one found in e905c7c

* locales from e905c7c

Now that solr harvests are an option, this generalizes language on the harvester form to not just refer to MODS harvests

* extract common logic into base harvester model

* bring over default solr mapping file from job_entry branch

* bring over unmodified logic relevant to solr harvests from the job_entry branch

* rename SolrHarvestingItem to SolrHarvestingParser to better reflect its purpose

* move logic from SolrHarvestingBuilder into SolrHarvester

In Spotlight v3.3.0, the builder pattern is no longer used. This first step is extracting the existing logic (untouched so far) so we can delete the builder.

* create SolrUpload model

This will be the SQL copy of items harvested by the SolrHarvester

* generalize #get_mapping_file method, remove most references to "resource" from SolrHarvester

* persist harvested solr data in SolrUploads

* rename #get_harvests to #solr_harvests

* simplify getting data from solr

* properly connect to solr url

* align harvester method names

* add job progress total logic to solr harvester

* finalize updating job tracker logic

* remove unused cursor/schedule logic

Previously, every "row" of solr data was harvested in a separate job. Because JobTrackers in Spotlight v3.3.0 work one tracker per job, the whole solr harvest now runs in a single PerformHarvestJob.

* rearrange methods in SolrHarvester

* extract single item harvesting logic to SolrHarvester#harvest_item

* generalize common error handling logic

* add @sidecar_id tracking to SolrHarvester

* fix namespace errors, simplify unique_id_field logic, remove unused #get_unique_id_field_name method

* override SolrUpload#compound_id

This allows us to connect URNs with the correct SolrDocuments

* custom metadata gets indexed on initial solr harvest

* fix updating existing solr harvest items' metadata

* upgraded everything let's see how it goes (#42)

* account for missing trailing "/" in base_url

* fix undefined error_msg bug

* Update jbsd.yml (#45)

* strip whitespaces from URNs when harvesting (#44)

* Update email template (#46)

* Update email per VV

* Update en.yml

* Fixing bug with set name in title. Fixing typo.

* Change the subject line

Co-authored-by: Maura C <[email protected]>

* Updating version to 3.0.0-beta.12. (#47)

* implement cursor search for solr harvesters

Replaces paginated search, which was failing due to the Solr server's restrictions

* Update solr_harvester.rb

* Update solr_harvester.rb

* add config file to declare the unique key for each Solr set

This is meant to be a temporary solution due to data structure inconsistencies between the Solr sets; specifically, the fact that they currently use different fields as the unique key

* Bumping version to 3.0.0-beta.13.

Co-authored-by: dl-maura <[email protected]>
Co-authored-by: Phil Plencner <[email protected]>
Co-authored-by: Phil Plencner <[email protected]>
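The "implement cursor search for solr harvesters" commit replaces paginated search, but the log does not include the code. As a sketch of the general technique (Solr's cursorMark deep paging, not necessarily the harvester's implementation): the first request sends cursorMark=*, the sort must include the collection's unique key, and paging stops when Solr returns the same nextCursorMark it was sent. Because of that sort requirement, a harvester using this approach needs to know each set's unique key field. The names each_solr_doc, fetch_page, DOCS, and fake_solr are hypothetical; fake_solr is an in-memory stand-in for a Solr server.

```ruby
# Sketch of Solr cursorMark deep paging. `fetch_page` is a hypothetical
# callable: it takes a Hash of /select params and returns the parsed
# JSON response as a Hash.
def each_solr_doc(fetch_page, unique_key, rows: 100)
  cursor = "*" # Solr's required starting cursor mark
  loop do
    params = {
      "q" => "*:*",
      "rows" => rows,
      # cursorMark requires a sort that includes the uniqueKey field
      "sort" => "#{unique_key} asc",
      "cursorMark" => cursor
    }
    response = fetch_page.call(params)
    response.fetch("response").fetch("docs").each { |doc| yield doc }
    next_cursor = response.fetch("nextCursorMark")
    break if next_cursor == cursor # unchanged mark means no more results
    cursor = next_cursor
  end
end

# Usage with an in-memory stand-in for Solr. It ignores "sort" because
# DOCS is already ordered by id, and uses the last id as the cursor mark.
DOCS = (1..5).map { |i| { "id" => format("doc%02d", i) } }

fake_solr = lambda do |params|
  mark = params["cursorMark"]
  start = mark == "*" ? 0 : (DOCS.index { |d| d["id"] > mark } || DOCS.size)
  page = DOCS[start, params["rows"]] || []
  { "response" => { "docs" => page },
    "nextCursorMark" => page.empty? ? mark : page.last["id"] }
end

ids = []
each_solr_doc(fake_solr, "id", rows: 2) { |doc| ids << doc["id"] }
p ids # ["doc01", "doc02", "doc03", "doc04", "doc05"]
```

Unlike start/rows pagination, each request only resumes from the last sort value, so the server never has to skip over earlier results.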
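The "account for missing trailing "/" in base_url" commit gives no detail about how base_url is used. One common way a missing slash causes trouble (an assumption here, not confirmed by the log) is relative URL resolution: under RFC 3986 merge rules, joining a relative path onto a base without a trailing "/" replaces the base's last path segment. The helper solr_endpoint and the example.org URLs are hypothetical.

```ruby
require "uri"

# Hypothetical helper: guarantee a trailing "/" on the configured base URL so
# URI.join resolves the relative path beneath it (RFC 3986 merge rules
# otherwise drop the base's last path segment).
def solr_endpoint(base_url, path)
  base = base_url.end_with?("/") ? base_url : "#{base_url}/"
  URI.join(base, path).to_s
end

puts URI.join("https://example.org/solr/collection", "select")
# https://example.org/solr/select  (last segment "collection" replaced)
puts solr_endpoint("https://example.org/solr/collection", "select")
# https://example.org/solr/collection/select
```

The relative path must not start with "/", since an absolute path would replace the base path entirely regardless of the trailing slash.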