
Cache search results in blob service to use in requests for next page #3591

Open
punktilious opened this issue Apr 22, 2022 · 2 comments
Labels
enhancement New feature or request P2 Priority 2 - Should Have

Comments

@punktilious
Collaborator

Is your feature request related to a problem? Please describe.
For search result pagination, currently the server will execute the same search query with different OFFSET/LIMIT arguments. This can be expensive, and may in some circumstances require the database to access/join the entire contents of the result set before filtering according to the OFFSET/LIMIT values.

Describe the solution you'd like
With offloading now supported, we can leverage something like Azure Blob storage to store the meta-information for each result page on first access. Subsequent accesses need only read the associated blob and fetch the resources, which means nothing needs to touch the RDBMS: the entire next-page request can be serviced from blob storage.
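To make the cache-aside flow concrete, here is a minimal sketch. All names are hypothetical (not from the actual codebase), and a plain in-memory dict stands in for the real blob service:

```python
class BlobStore:
    """Minimal in-memory stand-in for a blob container (e.g. Azure Blob)."""

    def __init__(self):
        self._blobs = {}

    def get(self, key):
        return self._blobs.get(key)

    def put(self, key, value):
        self._blobs[key] = value


def run_search_query(query, offset, limit):
    # Stand-in for the expensive RDBMS search; returns the page's
    # meta-information (here, just a list of resource ids).
    return [f"res-{i}" for i in range(offset, offset + limit)]


def get_page(blobs, query, page, page_size):
    """Serve a result page from blob storage, falling back to the RDBMS."""
    key = f"{hash(query)}/page-{page}"
    cached = blobs.get(key)
    if cached is not None:
        return cached                # served entirely from blob storage
    ids = run_search_query(query, page * page_size, page_size)
    blobs.put(key, ids)              # populate the cache on first access
    return ids


blobs = BlobStore()
first = get_page(blobs, "Patient?name=smith", 1, 10)   # executes the query
second = get_page(blobs, "Patient?name=smith", 1, 10)  # no RDBMS access
assert first == second
```

The second call for the same page never reaches `run_search_query`, which is the point of the proposal.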

Describe alternatives you've considered
Store the result meta-information in the RDBMS.

Acceptance Criteria

  1. GIVEN [a precondition]
    AND [another precondition]
    WHEN [test step]
    AND [test step]
    THEN [verification step]
    AND [verification step]

Additional context
Result sets from non-specific searches could be extremely large and we don't necessarily want to cache the entire set of matching rows. It might be sufficient to cache the first N pages. If the search result page is not present in blob storage, we simply need to rerun the RDBMS query and repopulate the cache. This would also be the case if using time-to-live (TTL) properties on the blob objects - something which would greatly simplify cleanup.
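The "cache only the first N pages, with TTL-based expiry" idea above could look roughly like this sketch. The limits and names are hypothetical configuration, and the clock is injectable so expiry is testable:

```python
import time

MAX_CACHED_PAGES = 5   # hypothetical: only cache the first N pages
TTL_SECONDS = 120.0    # hypothetical expiry, e.g. the transaction timeout


class TtlBlobStore:
    """In-memory stand-in for a blob store with per-object time-to-live."""

    def __init__(self, now=time.monotonic):
        self._blobs = {}
        self._now = now

    def put(self, key, value, ttl=TTL_SECONDS):
        self._blobs[key] = (value, self._now() + ttl)

    def get(self, key):
        entry = self._blobs.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._now() > expires_at:
            del self._blobs[key]   # expired: caller reruns the RDBMS query
            return None
        return value


def should_cache(page):
    # Pages beyond the cap always go back to the RDBMS.
    return page < MAX_CACHED_PAGES
```

Real blob services can enforce the TTL server-side (lifecycle rules), which is what makes cleanup cheap; the expiry check here just models the client-side view of that behavior.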

@punktilious punktilious added the enhancement New feature or request label Apr 22, 2022
@punktilious
Collaborator Author

See also #3590 related to stable search pagination. This could also help with that.

@lmsurpre
Member

The idea is to save just the internal resource ids associated with each page in the object store, up to some configurable limit on both size and time. For example, we could re-use the transaction timeout value as the expiry for the objects we write here.

To make this work well, we should consider expanding on the existing code for payload offloading such that each search turns into a query that returns the set of ids (up to the configurable limit); then, as a second step, we convert those ids into resources. Some of our queries already do that, but not all.
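The two-step shape described above can be sketched as follows. Function names and the limit are hypothetical, and trivial stand-ins replace the real RDBMS and payload-offload reads:

```python
def search_ids(query, limit):
    # Step 1: id-only query against the RDBMS, bounded by the
    # configurable limit. This id list is what gets cached per page.
    return list(range(1, limit + 1))


def fetch_resources(ids):
    # Step 2: resolve ids to full resources (from the RDBMS or the
    # offloaded payload store).
    return [{"id": i, "resourceType": "Patient"} for i in ids]


def search_page(query, page, page_size, limit=1000):
    ids = search_ids(query, limit)                      # cacheable
    page_ids = ids[page * page_size:(page + 1) * page_size]
    return fetch_resources(page_ids)                    # per-page fetch


resources = search_page("Patient?name=smith", page=1, page_size=3)
assert [r["id"] for r in resources] == [4, 5, 6]
```

Because step 1 produces only ids, the cached object stays small even when the matching resources are large, and step 2 runs only for the one page actually requested.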

@lmsurpre lmsurpre added the P2 Priority 2 - Should Have label Sep 20, 2022