Consider caching the results of "related items" SOLR searches #2832

Open
eddierubeiz opened this issue Dec 23, 2024 · 3 comments

Comments

@eddierubeiz
Contributor

Background
Since 2022, we've been using Solr's "more-like-this" feature to fetch up to 3 works from the index that look similar, based on metadata. This is what allows us to show two other letters to Gabor Levy under "Related Works" on the work page for this letter to Gabor Levy.

Right now, every time you load that letter in a browser, our website contacts Solr (at least in theory) to retrieve those other two letters, even though the likelihood of Solr changing its answer between any two consecutive calls is vanishingly small.

Let's consider caching the results of that call to Solr (the one that says "Tell me 3 items that are similar to this one").

Recipe:
For a given work,

  • if we have current more-like-this info for that work, show that.
  • if we have no such info, or the info is stale, ask Solr for it.
  • if Solr answers, store the new info, discard any stale info, and record the current date. The info we just retrieved will go stale in (e.g.) one week.
  • if Solr doesn't answer in time (happens a lot), fall back on any stored info we might have, stale or not.
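
The recipe above could be sketched roughly as follows. This is a minimal plain-Ruby illustration, not the project's code: the class name `RelatedWorksLookup`, the Hash-like `store`, the one-second `wait`, and the block standing in for the Solr more-like-this query are all assumptions made for the example.

```ruby
require "timeout"

# Hypothetical helper; every name here is illustrative.
class RelatedWorksLookup
  # What we remember per work: the similar ids and when we fetched them.
  Entry = Struct.new(:ids, :fetched_at)

  # store:   anything Hash-like (assumption; a real app would use a shared cache)
  # max_age: seconds before an entry counts as stale (one week, as in the recipe)
  # wait:    seconds to wait for Solr before giving up (assumed value)
  # clock:   injectable time source, which makes staleness easy to test
  def initialize(store: {}, max_age: 7 * 24 * 60 * 60, wait: 1.0, clock: -> { Time.now })
    @store = store
    @max_age = max_age
    @wait = wait
    @clock = clock
  end

  # The block stands in for the real Solr query and should return a list of ids.
  def ids_for(work_id)
    entry = @store[work_id]
    return entry.ids if entry && fresh?(entry)

    begin
      ids = Timeout.timeout(@wait) { yield work_id }
      # Solr answered: replace any stale entry and record the fetch time.
      @store[work_id] = Entry.new(ids, @clock.call)
      ids
    rescue StandardError
      # Solr was slow or unreachable (Timeout::Error is a StandardError):
      # fall back on whatever we have stored, stale or not.
      entry ? entry.ids : []
    end
  end

  private

  def fresh?(entry)
    @clock.call - entry.fetched_at < @max_age
  end
end
```

A caller would pass the Solr query as the block, e.g. `lookup.ids_for(work.id) { |id| query_solr_for_similar(id) }`, where `query_solr_for_similar` is a placeholder for whatever method currently performs the more-like-this search.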

Pro:

  • This will reduce our dependence on SearchStax (the externally hosted service that provides us with our search results).
  • The page will load faster in all cases; in some cases it will load much faster (up to roughly a second faster).
  • In the vast majority of cases, we believe the results will be the same with and without the cache.
  • We hardly ever delete works, so we don't have to worry about a broken more-like-this link.

Con:
Especially in collections that are undergoing active editing, the Related Works section will fail to include newly-added or recently-edited more-like-this matches.

@jrochkind
Contributor

We should use the standard Rails cache mechanism to do this, probably caching work primary keys as a list. (We can still fetch the works by id from the db, not Solr, on page display.)

https://guides.rubyonrails.org/caching_with_rails.html#low-level-caching-using-rails-cache
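
A rough sketch of what that low-level caching might look like, caching only a list of primary keys. The method name `related_work_ids`, the cache key format, and the one-week expiry are assumptions for illustration; the only cache call used is `fetch(key, expires_in:)` with a block, which is part of the Rails cache store API.

```ruby
ONE_WEEK = 7 * 24 * 60 * 60 # seconds; expires_in accepts a plain number of seconds

# Hypothetical helper. The block stands in for the Solr more-like-this query
# and should return an array of work primary keys.
# `cache` defaults to Rails.cache; the default is only evaluated when the
# caller does not pass one, so another store can be injected in tests.
def related_work_ids(work_id, cache: Rails.cache, expires_in: ONE_WEEK)
  cache.fetch("related_works/#{work_id}", expires_in: expires_in) do
    # Runs only on a cache miss; the result is written to the cache.
    yield work_id
  end
end
```

On page display the ids could then be loaded from the database with something like `Work.where(id: ids)`. One caveat: an entry that has passed its `expires_in` is treated as a miss, so this simple form gives no stale fallback when Solr times out. Keeping the fallback would mean storing a timestamp inside the cached value and using a longer expiry.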

We should consider our choices of how to configure Rails cache. Right now we don't actually have Rails cache configured -- by default it is a per-machine in-memory cache (I think), which works if we only have ONE web dyno. If we had more than one, they might each have their own copy of a cache. And even with one, the cache will be reset every night when the dyno restarts.

That might be fine for this use case, or we could consider configuring a more persistent shared cache. We already have a memcached instance for the rack-attack cache; perhaps we could use that for both purposes (or replace it with a redis used for both purposes, either way). We could also consider using the new solid_cache db-based cache, possibly for both purposes, although it might be too slow for rack-attack, which runs on every single request.
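
To make the options concrete, here is a small sketch of choosing a store from the environment. The helper name `cache_store_for` and the environment variable names `MEMCACHED_URL` and `REDIS_URL` are assumptions; `:mem_cache_store`, `:redis_cache_store`, and `:memory_store` are the standard Rails store symbols.

```ruby
# Hypothetical helper: returns the value one might assign to config.cache_store.
# Env var names are illustrative, not taken from the app's configuration.
def cache_store_for(env)
  if env["MEMCACHED_URL"]
    # Shared across dynos; could be the same memcached rack-attack already uses.
    [:mem_cache_store, env["MEMCACHED_URL"]]
  elsif env["REDIS_URL"]
    [:redis_cache_store, { url: env["REDIS_URL"] }]
  else
    # Per-process in-memory store: not shared, and lost on dyno restart.
    [:memory_store, { size: 32 * 1024 * 1024 }]
  end
end

# In config/environments/production.rb this might be used as:
#   config.cache_store = cache_store_for(ENV)
```

With solid_cache installed, the equivalent setting would be `config.cache_store = :solid_cache_store`.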

@eddierubeiz
Contributor Author

Good ideas. Another option might be to store the info in an attr_json attribute on Work. Something lightweight like:

more_like_this_cache: {
	timestamp_of_last_check: 'some_timestamp',
	similar_work_friendlier_ids: ['first_id', 'second_id', 'third_id']
}

@jrochkind
Contributor

I think it's best kept separate from the model; the Rails.cache is made for this sort of thing!
