Exhibit-specific fields work with harvesting #35

Merged


bkiahstroud
Contributor

@bkiahstroud bkiahstroud commented Sep 15, 2022

Ref https://github.com/harvard-lts/CURIOSity/issues/170

Summary

Fixes a bug where exhibit-specific fields were not getting populated with data when harvesting.

Testing Instructions

Follow Vanessa's steps here.

Expected behavior: the exhibit-specific fields populate with data when harvesting.

Explanation

Short version

The incoming metadata was not being formatted in the way that Spotlight expects, leading to the bug. Exhibit-specific fields and default "configured" fields are now formatted as required.

Long version

I made several discoveries while troubleshooting this issue:

1. As far as I can tell, the #data in a Spotlight::Resources::OaipmhUpload has no bearing on the data that gets persisted in Solr

This is partially expected: the #data of the resource's associated sidecar (Spotlight::SolrDocumentSidecar) is what goes to Solr. But updating the resource's #data after initial creation does not automatically update its sidecar; the sidecar's #data has to be updated manually as well.

Example scenarios

Resource creation, works as expected

1. Initialize a resource with data
2. Save the resource
3. The resource's sidecar is created at save time with the resource's data 
4. The sidecar's data gets indexed into Solr

Resource update (attempt to update data in Solr through resource), does not work as expected

1. Find existing resource
2. Update resource's #data 
3. Run #save_and_index on resource 
4. Resource updated successfully 
5. Resource's sidecar doesn't change 
6. Unchanged sidecar gets indexed 
7. No data changed in Solr 

Resource update (attempt to update data in Solr through resource's sidecar), succeeds

1. Find existing resource 
2. Update resource's #data
3. Update resource's sidecar's #data
4. Run #save_and_index on resource 
5. Updated sidecar gets indexed 
6. Updated data gets persisted in Solr

This is further supported by the fact that if you edit a resource in the UI, the edit "works" but the resource itself remains unchanged; the resource's sidecar is what changes!

It feels like the resource only gets used at create time and is vestigial after that.
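The three scenarios above can be sketched in plain Ruby. The classes below (ToyResource, ToySidecar) and the in-memory `solr` hash are hypothetical stand-ins, not the real Spotlight classes or a real Solr connection; they only mimic the behavior I observed, so treat this as an illustration rather than a description of Spotlight's internals.

```ruby
# Toy stand-ins, NOT the real Spotlight classes: they only mimic the behavior
# described above so the three scenarios can be run in plain Ruby.
class ToySidecar
  attr_accessor :data

  def initialize(data)
    @data = data
  end
end

class ToyResource
  attr_accessor :data
  attr_reader :sidecar

  def initialize(data)
    @data = data
    @sidecar = nil
  end

  # The sidecar is built from the resource's data only on the first save.
  def save
    @sidecar ||= ToySidecar.new(@data.dup)
    true
  end

  # "Indexing" copies the sidecar's data (never the resource's) into the index.
  def save_and_index(index)
    save
    index.replace(@sidecar.data)
  end
end

solr = {} # hypothetical in-memory stand-in for the Solr index

# Scenario 1: creation. The sidecar is created from the resource's data.
resource = ToyResource.new({ 'custom-field' => 'Hello world' })
resource.save_and_index(solr)
created = solr.dup

# Scenario 2: updating only the resource leaves the index unchanged.
resource.data = { 'custom-field' => 'Updated' }
resource.save_and_index(solr)
after_resource_update = solr.dup

# Scenario 3: updating the sidecar as well is what changes the index.
resource.sidecar.data = { 'custom-field' => 'Updated' }
resource.save_and_index(solr)
after_sidecar_update = solr.dup
```

In this toy model, `after_resource_update` still holds the original value, and only `after_sidecar_update` reflects the change, which matches what I saw when troubleshooting.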

2. OaipmhUpload#data and SolrDocumentSidecar#data expect different formats

Example:

# Correct OaipmhUpload#data format
{
  'full_title_tesim' => 'Title',
  'custom-field' => 'Hello world'
}

# Correct SolrDocumentSidecar#data format
{
  'configured_fields' => {
    'full_title_tesim' => 'Title'
  },
  'custom-field' => 'Hello world'
}
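As a rough illustration of the difference between the two formats above, here is a minimal sketch that converts one into the other. The helper name `to_sidecar_format` and the idea of passing in a list of configured field names are assumptions made for this example; this is not the code in this PR.

```ruby
# Hypothetical helper: splits a flat OaipmhUpload-style hash into the nested
# shape shown above. The caller supplies which keys count as configured fields.
def to_sidecar_format(flat_data, configured_field_names)
  configured, custom = flat_data.partition do |key, _value|
    configured_field_names.include?(key)
  end

  # Configured fields are nested under 'configured_fields';
  # everything else stays at the top level.
  { 'configured_fields' => configured.to_h }.merge(custom.to_h)
end

flat = { 'full_title_tesim' => 'Title', 'custom-field' => 'Hello world' }
nested = to_sidecar_format(flat, ['full_title_tesim'])
```

Here `nested` ends up with 'full_title_tesim' under 'configured_fields' and 'custom-field' at the top level.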

3. Exhibit-specific fields ("custom fields" in the code) are expected to not have a Solr suffix in the #data hashes

# Bad
'custom-field_tesim'

# Good
'custom-field'
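A minimal sketch of stripping the suffix, assuming a fixed list of known suffixes. Both `SOLR_SUFFIXES` and `strip_solr_suffix` are hypothetical names for this example, and the list would need to match the dynamic fields in your own Solr schema. Matching against an explicit list, rather than a broad pattern, avoids eating part of a field name that merely contains an underscore.

```ruby
# Assumed list of suffixes; extend it to match the dynamic fields in your schema.
SOLR_SUFFIXES = %w[_tesim _ssim _tesi _ssi _sim].freeze

# Returns the field name without a trailing Solr suffix, or unchanged if none matches.
def strip_solr_suffix(field_name)
  suffix = SOLR_SUFFIXES.find { |candidate| field_name.end_with?(candidate) }
  suffix ? field_name.delete_suffix(suffix) : field_name
end

stripped  = strip_solr_suffix('custom-field_tesim')
untouched = strip_solr_suffix('custom_field')
```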

Comment on lines 88 to +92
sidecar = resource.solr_document_sidecars.first
sidecar.data['configured_fields'].merge!(resource.data)
sidecar.data = parsed_oai_item.organize_item_sidecar_data
sidecar.save!
# Get the updated sidecar data into our local variable
resource.solr_document_sidecars.map(&:reload)
# Get the updated sidecar into our local variable to ensure proper indexing
resource.reload.reindex_later

@bkiahstroud bkiahstroud Sep 15, 2022


Refer to #1 under the "Long version" section of this PR's description for an explanation

# }
#
# @return [Hash] Sidecar data organized in the format that Spotlight expects
def organize_item_sidecar_data

@bkiahstroud bkiahstroud Sep 15, 2022


Refer to #2 under the "Long version" section of this PR's description for an explanation

# a Solr suffix (e.g. _tesim, _ssim, etc.). This method assumes all non-configured fields
# are custom and thus removes their Solr suffix when adding them to the @item_sidecar hash.
# Configured fields are added as-is (Solr suffix included).
def assign_item_sidecar_data(field_name, value)

@bkiahstroud bkiahstroud Sep 15, 2022


Refer to #3 under the "Long version" section of this PR's description for an explanation

Adjusting REGEXP to be less "greedy" in consuming characters.

@phil-plencner-hl phil-plencner-hl left a comment


@phil-plencner-hl phil-plencner-hl merged commit a7ae43c into main Sep 16, 2022
@phil-plencner-hl phil-plencner-hl deleted the i170-exhibit-specific-fields-work-with-harvesting branch September 16, 2022 14:42

3 participants