pageserver: batch get page requests and serve them with one vectored get #9377

VladLazar · 2024-10-14T10:46:42Z

The pageserver doesn't take advantage of the queue depth generated by the
compute. We can process get page requests more efficiently
by batching them.

Hold get page requests for up to a configurable maximum debounce timeout so that
they can be merged. Then process the entire batch via one get_vectored timeline operation.
By default, no merging takes place.
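
A minimal sketch of the debounce-and-batch idea described above, not the actual pageserver code. `GetPageRequest`, `collect_batch`, the `max_batch` cap, and the string-returning `get_vectored` stand-in are all hypothetical names introduced for illustration; a plain `std::sync::mpsc` channel stands in for the connection to the compute.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a get page request: just a page number.
#[derive(Debug, Clone, PartialEq)]
struct GetPageRequest {
    page_no: u32,
}

/// Blocks for the first request, then keeps collecting until the debounce
/// window (measured from the first arrival) expires, `max_batch` is reached,
/// or the channel closes. `debounce == None` models the default: no merging.
/// Returns `None` once the channel is closed and drained.
fn collect_batch(
    rx: &Receiver<GetPageRequest>,
    debounce: Option<Duration>,
    max_batch: usize, // assumption: the issue does not specify a size cap
) -> Option<Vec<GetPageRequest>> {
    let first = rx.recv().ok()?;
    let mut batch = vec![first];
    let Some(debounce) = debounce else {
        return Some(batch); // batching disabled: one request per batch
    };
    let deadline = Instant::now() + debounce;
    while batch.len() < max_batch {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(req) => batch.push(req),
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    Some(batch)
}

/// Hypothetical stand-in for a single vectored read serving the whole batch.
fn get_vectored(batch: &[GetPageRequest]) -> Vec<String> {
    batch.iter().map(|r| format!("page-{}", r.page_no)).collect()
}

fn main() {
    let (tx, rx) = mpsc::channel();
    for page_no in 0..4 {
        tx.send(GetPageRequest { page_no }).unwrap();
    }
    drop(tx); // close the channel so the loop below terminates

    while let Some(batch) = collect_batch(&rx, Some(Duration::from_millis(10)), 3) {
        let pages = get_vectored(&batch);
        println!("served {} page(s) in one call: {:?}", pages.len(), pages);
    }
}
```

The trade-off in this sketch is the one the sub-tasks point at: a request that arrives alone still waits out the debounce window before it is served, which is why the timeout is off by default here.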

Sub-Tasks / Punted

https://github.com/neondatabase/cloud/issues/20620
small Retroactive RFC, reuse write-ups in PRs, they're quite good
page_service: figure out correctness of holding the TimelineHandle in the pending batch #9850
page_service: batching needless waits for unbatchable requests #9835
page_service: unit-test batching logic #9834
page_service: measure tail latency impact in batchable workload #9837

Implementation

## Problem

We don't take advantage of queue depth generated by the compute on the pageserver. We can process getpage requests more efficiently by batching them.

## Summary of changes

Batch up incoming getpage requests that arrive within a configurable time window (`server_side_batch_timeout`). Then process the entire batch via one `get_vectored` timeline operation. By default, no merging takes place.

## Testing

* **Functional**: #9792
* **Performance**: will be done in staging/pre-prod

# Refs

* #9377
* #9376

Co-authored-by: Christian Schwarz <[email protected]>

VladLazar added a/performance Area: relates to performance of the system c/storage Component: storage c/storage/pageserver Component: storage: pageserver labels Oct 14, 2024

VladLazar mentioned this issue Oct 14, 2024

Epic: get page throughput improvements #9376

Open

VladLazar self-assigned this Oct 14, 2024

problame linked a pull request Oct 23, 2024 that will close this issue

feat(page_service): timeout-based batching of requests #9321

Merged

problame assigned problame and unassigned VladLazar Nov 17, 2024

problame linked a pull request Nov 18, 2024 that will close this issue

page_service: getpage batching: refactor & minor fixes #9792

Open

problame closed this as completed in #9321 Nov 18, 2024

problame reopened this Nov 18, 2024

problame closed this as completed Nov 18, 2024

problame linked a pull request Nov 20, 2024 that will close this issue

page_service: getpage batching: refactor & minor fixes #9792

Open

problame reopened this Nov 20, 2024

problame mentioned this issue Nov 22, 2024

page_service: rewrite batching to work without a timeout, pipeline in protocol handler instead #9851

Draft

VladLazar commented Oct 14, 2024 •

edited by problame
