
Cache search results in blob service to use in requests for next page #3591

Open
punktilious opened this issue Apr 22, 2022 · 2 comments
Labels
enhancement New feature or request P2 Priority 2 - Should Have

Comments

@punktilious
Collaborator

Is your feature request related to a problem? Please describe.
For search result pagination, currently the server will execute the same search query with different OFFSET/LIMIT arguments. This can be expensive, and may in some circumstances require the database to access/join the entire contents of the result set before filtering according to the OFFSET/LIMIT values.

Describe the solution you'd like
With offloading now supported, we can leverage something like Azure Blob storage to store the meta-information for each result page on first access. Subsequent accesses need only read the associated blob and fetch the resources, which means nothing needs to touch the RDBMS: the entire next-page request can be serviced from blob storage.
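To make the cache-aside flow concrete, here is a minimal sketch. All names are hypothetical (not from the actual codebase), and a plain in-memory dict stands in for the real blob service:

```python
class BlobStore:
    """Minimal in-memory stand-in for a blob container (e.g. Azure Blob)."""

    def __init__(self):
        self._blobs = {}

    def get(self, key):
        return self._blobs.get(key)

    def put(self, key, value):
        self._blobs[key] = value


def run_search_query(query, offset, limit):
    # Stand-in for the expensive RDBMS search; returns the page's
    # meta-information (here, just a list of resource ids).
    return [f"res-{i}" for i in range(offset, offset + limit)]


def get_page(blobs, query, page, page_size):
    """Serve a result page from blob storage, falling back to the RDBMS."""
    key = f"{hash(query)}/page-{page}"
    cached = blobs.get(key)
    if cached is not None:
        return cached                # served entirely from blob storage
    ids = run_search_query(query, page * page_size, page_size)
    blobs.put(key, ids)              # populate the cache on first access
    return ids


blobs = BlobStore()
first = get_page(blobs, "Patient?name=smith", 1, 10)   # executes the query
second = get_page(blobs, "Patient?name=smith", 1, 10)  # no RDBMS access
assert first == second
```

The second call for the same page never reaches `run_search_query`, which is the point of the proposal.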

Describe alternatives you've considered
Store the result meta-information in the RDBMS.

Acceptance Criteria

  1. GIVEN [a precondition]
    AND [another precondition]
    WHEN [test step]
    AND [test step]
    THEN [verification step]
    AND [verification step]

Additional context
Result sets from non-specific searches could be extremely large and we don't necessarily want to cache the entire set of matching rows. It might be sufficient to cache the first N pages. If the search result page is not present in blob storage, we simply need to rerun the RDBMS query and repopulate the cache. This would also be the case if using time-to-live (TTL) properties on the blob objects - something which would greatly simplify cleanup.
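The "cache only the first N pages, with TTL-based expiry" idea above could look roughly like this sketch. The limits and names are hypothetical configuration, and the clock is injectable so expiry is testable:

```python
import time

MAX_CACHED_PAGES = 5   # hypothetical: only cache the first N pages
TTL_SECONDS = 120.0    # hypothetical expiry, e.g. the transaction timeout


class TtlBlobStore:
    """In-memory stand-in for a blob store with per-object time-to-live."""

    def __init__(self, now=time.monotonic):
        self._blobs = {}
        self._now = now

    def put(self, key, value, ttl=TTL_SECONDS):
        self._blobs[key] = (value, self._now() + ttl)

    def get(self, key):
        entry = self._blobs.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._now() > expires_at:
            del self._blobs[key]   # expired: caller reruns the RDBMS query
            return None
        return value


def should_cache(page):
    # Pages beyond the cap always go back to the RDBMS.
    return page < MAX_CACHED_PAGES
```

Real blob services can enforce the TTL server-side (lifecycle rules), which is what makes cleanup cheap; the expiry check here just models the client-side view of that behavior.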

@punktilious punktilious added the enhancement New feature or request label Apr 22, 2022
@punktilious
Collaborator Author

See also #3590 related to stable search pagination. This could also help with that.

@lmsurpre
Member

The idea is to save just the internal resource ids associated with each page in the object store, up to some configurable limit on both size and time. For example, we could re-use the transaction timeout value as the expiry for the objects we write here.

To make this work well, we should consider expanding on the existing code for payload offloading such that each search turns into a query that returns the set of ids (up to the configurable limit); then, as a second step, we convert those ids into resources. Some of our queries already do that, but not all.
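The two-step shape described above can be sketched as follows. Function names and the limit are hypothetical, and trivial stand-ins replace the real RDBMS and payload-offload reads:

```python
def search_ids(query, limit):
    # Step 1: id-only query against the RDBMS, bounded by the
    # configurable limit. This id list is what gets cached per page.
    return list(range(1, limit + 1))


def fetch_resources(ids):
    # Step 2: resolve ids to full resources (from the RDBMS or the
    # offloaded payload store).
    return [{"id": i, "resourceType": "Patient"} for i in ids]


def search_page(query, page, page_size, limit=1000):
    ids = search_ids(query, limit)                      # cacheable
    page_ids = ids[page * page_size:(page + 1) * page_size]
    return fetch_resources(page_ids)                    # per-page fetch


resources = search_page("Patient?name=smith", page=1, page_size=3)
assert [r["id"] for r in resources] == [4, 5, 6]
```

Because step 1 produces only ids, the cached object stays small even when the matching resources are large, and step 2 runs only for the one page actually requested.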

@lmsurpre lmsurpre added the P2 Priority 2 - Should Have label Sep 20, 2022