
page_service: batching observability & include throttled time in smgr metrics #9870

Merged: 105 commits, merged on Dec 3, 2024
105 commits
0689965
WIP: page_service: add basic testcase for merging
problame Nov 18, 2024
15e21c7
got it working and turn it more into a benchmark
problame Nov 18, 2024
61ff84a
compiles
problame Nov 19, 2024
911946a
fixes
problame Nov 19, 2024
5cc0059
parametrize more test
problame Nov 20, 2024
b974616
switch back to tokio::time::sleep, to get the numbers
problame Nov 20, 2024
f2de5b5
make it a proper benchmark
problame Nov 20, 2024
e80ce97
collect CPU utilization
problame Nov 20, 2024
75041cb
bench fixups
problame Nov 20, 2024
b695907
page_service: add benchmark for batching
problame Nov 18, 2024
aa695b2
Revert "switch back to tokio::time::sleep, to get the numbers"
problame Nov 20, 2024
88d52b3
Merge branch 'problame/batching-benchmark' into problame/merge-getpag…
problame Nov 20, 2024
b299eb1
fixup whitespace stuff
problame Nov 20, 2024
af95320
Revert "Revert "switch back to tokio::time::sleep, to get the numbers""
problame Nov 20, 2024
1639b26
async-timer based approach
problame Nov 20, 2024
f3ed569
Revert "async-timer based approach"
problame Nov 20, 2024
81d9970
tokio::time::Interval based approach
problame Nov 20, 2024
1d85bec
Revert "tokio::time::Interval based approach"
problame Nov 20, 2024
12124b2
tokio_timerfd::Interval
problame Nov 20, 2024
f9bf038
Revert "tokio_timerfd::Interval"
problame Nov 20, 2024
689788c
async-timer based approach (again, with data)
problame Nov 20, 2024
7be13bc
undo local modifications to benchmark
problame Nov 20, 2024
c73e9e4
try async-timer 1.0.0-beta15 (still signal-based timers)
problame Nov 20, 2024
68550f0
async-timer 1.0.0-beta15 with features=tokio1
problame Nov 20, 2024
721643b
try interval-based impl to cross-chec
problame Nov 20, 2024
5f3e6f3
Revert "try interval-based impl to cross-chec"
problame Nov 20, 2024
cbb5817
Revert "async-timer 1.0.0-beta15 with features=tokio1"
problame Nov 20, 2024
21866fa
Revert "try async-timer 1.0.0-beta15 (still signal-based timers)"
problame Nov 20, 2024
469ce81
Revert "async-timer based approach (again, with data)"
problame Nov 20, 2024
fcda7a7
tokio_timerfd::Delay based impl
problame Nov 20, 2024
f22ad86
Revert "tokio_timerfd::Delay based impl"
problame Nov 20, 2024
517dda8
vanilla tokio based timer impl based on tokio::time::Sleep
problame Nov 20, 2024
c68661d
Revert "undo local modifications to benchmark"
problame Nov 20, 2024
89b6cb8
Revert "vanilla tokio based timer impl based on tokio::time::Sleep"
problame Nov 20, 2024
fa7ce2c
the final choice: async-timer 1.0beta15 with features=["tokio1"]
problame Nov 21, 2024
e82deb2
high-resolution CPU usage
problame Nov 21, 2024
3375f28
pytest.approx; https://github.com/neondatabase/neon/pull/9820#discuss…
problame Nov 21, 2024
ff0aa15
Merge remote-tracking branch 'origin/main' into problame/batching-ben…
problame Nov 21, 2024
058b35f
Merge branch 'problame/batching-benchmark' into problame/merge-getpag…
problame Nov 21, 2024
09e7485
Merge branch 'problame/merge-getpage-test' into problame/batching-timer
problame Nov 21, 2024
a1bb2e7
WIP: pipelined batching
problame Nov 21, 2024
aa1032a
no need for cancel & ctx in pagestream_do_batch
problame Nov 21, 2024
345f8b6
fix ready_for_next_batch order
problame Nov 21, 2024
408bc8f
cleanups
problame Nov 21, 2024
73046fd
span fixes
problame Nov 21, 2024
56de071
fruitless debugging
problame Nov 21, 2024
7680aa1
draft
problame Nov 21, 2024
240e48d
improvements
problame Nov 21, 2024
db9093f
revert back to 'span fixes' commit
problame Nov 21, 2024
88fd8ae
watch-based approach
problame Nov 21, 2024
89d9d16
cherry-pick from problame/batching-benchmark while it's waiting for m…
problame Nov 22, 2024
a3d1cf6
config changes to express pipelining config (not respected yet)
problame Nov 22, 2024
c1040bc
task-based mode
problame Nov 22, 2024
0fa8ae3
WIP refactor to allow truly serial mode
problame Nov 22, 2024
093674b
impmlement the serial mode
problame Nov 22, 2024
c1e8347
make configurable whether pipelining should use concurrent futures or…
problame Nov 22, 2024
39e45f9
improve tests
problame Nov 22, 2024
ef502f8
remove async-timer heritage
problame Nov 22, 2024
a28c54d
cosmetics
problame Nov 22, 2024
d6e5a46
eliminate the word `batch` and stale doc comments
problame Nov 22, 2024
11dc713
rename test file to test_page_service_batching
problame Nov 22, 2024
5796f3b
fix test
problame Nov 22, 2024
cbe1839
Benchmark results (metal node, AMD Ryzen 9 7950X3D 16-Core Processor)
problame Nov 22, 2024
990e44d
longer target runtime
problame Nov 22, 2024
bd31f42
run benchmarks
problame Nov 22, 2024
6ec5ac1
DO NOT MERGE: enable pipelining (32,concurrent-futures) by default so…
problame Nov 25, 2024
c4f92a2
WIP: batching observability improvements
problame Nov 25, 2024
0bb0372
logging to debug test_pageserver_restarts_under_worload
problame Nov 25, 2024
b9477aa
fix: batcher wouldn't shut down after executor exits
problame Nov 25, 2024
99b664c
expand fix to tasks mode; add some comments
problame Nov 25, 2024
9bf2618
implement spsc_fold
problame Nov 26, 2024
a23abb2
adopt spsc_fold
problame Nov 26, 2024
41ddc67
benchmark
problame Nov 26, 2024
18ffaba
fix pipeline cancellation
problame Nov 26, 2024
e0123c8
explain the pipeline cancellation story
problame Nov 27, 2024
7fb3d95
review & identified a cast that isn't handled, document that
problame Nov 27, 2024
82e1fa3
WIP
problame Nov 27, 2024
07358de
converge on approach that pushes read Result through pipeline
problame Nov 28, 2024
6bd39f9
rn benchmark on hetzner runner
problame Nov 28, 2024
a2a3613
reintroduce task-based execution
problame Nov 28, 2024
f44bfcc
benchmark on hetzner runner
problame Nov 28, 2024
9a5611a
merge reader&batcher stages, update docs
problame Nov 29, 2024
27c72e4
benchmark on hetzner box
problame Nov 29, 2024
dfcbb13
the `None` configuration in the benchmark would use the default instead
problame Nov 29, 2024
90ef03c
benchmarks on hetzner box
problame Nov 29, 2024
bab3dd0
Merge remote-tracking branch 'origin/main' into problame/batching-sid…
problame Nov 29, 2024
199a4bd
Merge remote-tracking branch 'origin/main' into problame/batching-sid…
problame Nov 29, 2024
0d28084
merge brought back test_pageserver_getpage_merge.py
problame Nov 29, 2024
53e18b2
less repetitive match arms; https://github.com/neondatabase/neon/pull…
problame Nov 29, 2024
9b65b26
stop Box'ing stuff & clean up the passing-through of errors (remove e…
problame Nov 29, 2024
2cab051
fix escaping of lfc path (exposed by the benchmark)
problame Nov 29, 2024
6d36c07
Merge remote-tracking branch 'origin/problame/batching-sidecar-task' …
problame Nov 29, 2024
7a39ad4
correct start & end times for smgr query observations (fixes #9925) a…
problame Nov 29, 2024
92b3619
abbreviated benchmark run on hetzner box
problame Nov 29, 2024
e10e247
log pipelining config during startup
problame Nov 29, 2024
9338a3a
remove ex_throttled everywhere else as well, so it's the same behavio…
problame Nov 29, 2024
9a90eaa
adjust throttling test accordingly
problame Nov 29, 2024
2432c73
global and per-timeline metric for batching
problame Nov 29, 2024
e8c5d9c
python test: rely solely on the new metrics
problame Nov 29, 2024
42daa4f
benchmark results on my hetzner box
problame Nov 29, 2024
69b878f
Merge remote-tracking branch 'origin/main' into problame/batching-met…
problame Nov 30, 2024
e6c14f6
fix test
problame Dec 2, 2024
04c358a
obsolete comment; https://github.com/neondatabase/neon/pull/9870#disc…
problame Dec 2, 2024
be2d64b
restore the behavior that get_vectored and scan metrics are ex thrott…
problame Dec 2, 2024
5d778ee
comment on timer lifetime; https://github.com/neondatabase/neon/pull/…
problame Dec 2, 2024
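The commits "implement spsc_fold" and "adopt spsc_fold" name a channel primitive whose code is not shown on this page. Going only by the name, it appears to be a single-producer/single-consumer channel that folds a newly sent value into the pending one when the receiver has not picked it up yet, which is one way to batch getpage requests. The following is a hypothetical, blocking, std-only sketch of that idea; the names `FoldSlot`, `send`, `recv` and the `Result`-returning fold callback are illustrative assumptions, not the PR's API (which is presumably async).

```rust
use std::sync::{Condvar, Mutex};

/// Hypothetical single-slot, single-producer/single-consumer channel that
/// folds a new value into the pending one if the consumer hasn't taken it yet.
pub struct FoldSlot<T> {
    slot: Mutex<Option<T>>,
    changed: Condvar,
}

impl<T> FoldSlot<T> {
    pub fn new() -> Self {
        FoldSlot { slot: Mutex::new(None), changed: Condvar::new() }
    }

    /// Deliver `value`. If a value is already pending, try `fold(&mut pending, value)`.
    /// If `fold` hands the value back (`Err`), e.g. because the batch is full,
    /// block until the consumer drains the slot, then store it.
    pub fn send<F>(&self, mut value: T, mut fold: F)
    where
        F: FnMut(&mut T, T) -> Result<(), T>,
    {
        let mut guard = self.slot.lock().unwrap();
        loop {
            match guard.take() {
                None => {
                    *guard = Some(value);
                    self.changed.notify_all(); // wake a consumer waiting on an empty slot
                    return;
                }
                Some(mut pending) => {
                    let outcome = fold(&mut pending, value);
                    *guard = Some(pending); // the pending value goes back either way
                    match outcome {
                        Ok(()) => return,
                        Err(rejected) => {
                            value = rejected;
                            // wait until the consumer has taken the pending value
                            guard = self.changed.wait_while(guard, |s| s.is_some()).unwrap();
                        }
                    }
                }
            }
        }
    }

    /// Block until a (possibly folded) value is pending, then take it.
    pub fn recv(&self) -> T {
        let guard = self.slot.lock().unwrap();
        let mut guard = self.changed.wait_while(guard, |s| s.is_none()).unwrap();
        let value = guard.take().expect("slot is non-empty after wait_while");
        self.changed.notify_all(); // wake a producer waiting for the slot to drain
        value
    }
}
```

Because the producer only blocks when `fold` refuses, a slow consumer naturally receives larger batches while a fast one receives values one at a time, and arrival order is preserved.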
3 changes: 2 additions & 1 deletion pageserver/src/bin/pageserver.rs
@@ -127,6 +127,7 @@ fn main() -> anyhow::Result<()> {
info!(?conf.virtual_file_io_engine, "starting with virtual_file IO engine");
info!(?conf.virtual_file_io_mode, "starting with virtual_file IO mode");
info!(?conf.wal_receiver_protocol, "starting with WAL receiver protocol");
+info!(?conf.page_service_pipelining, "starting with page service pipelining config");

// The tenants directory contains all the pageserver local disk state.
// Create if not exists and make sure all the contents are durable before proceeding.
@@ -302,7 +303,7 @@ fn start_pageserver(
pageserver::metrics::tokio_epoll_uring::Collector::new(),
))
.unwrap();
-pageserver::preinitialize_metrics();
+pageserver::preinitialize_metrics(conf);

// If any failpoints were set from FAILPOINTS environment variable,
// print them to the log for debugging purposes
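The first hunk logs `conf.page_service_pipelining` at startup; in `tracing`, the `?` sigil records a field through the value's `Debug` implementation. The config type itself is not part of this diff, so the following is a hypothetical std-only sketch: the enum shapes are inferred only from commit messages mentioning a serial mode, a batch size of 32, and "concurrent-futures" versus task-based execution, and `debug_field` merely approximates how such a field renders.

```rust
/// Hypothetical execution strategy for pipelined request handling.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PipeliningExecution {
    ConcurrentFutures,
    Tasks,
}

/// Hypothetical shape of a page_service pipelining setting (not the pageserver's type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PageServicePipeliningConfig {
    Serial,
    Pipelined { max_batch_size: usize, execution: PipeliningExecution },
}

/// Render the config as `key=<Debug output>`, roughly the way a `?field`
/// passed to `tracing::info!` is recorded.
pub fn debug_field(cfg: &PageServicePipeliningConfig) -> String {
    format!("conf.page_service_pipelining={:?}", cfg)
}
```

Deriving `Debug` on the config is what makes the one-line startup log possible without a hand-written formatter.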
5 changes: 0 additions & 5 deletions pageserver/src/context.rs
@@ -91,16 +91,13 @@

use crate::task_mgr::TaskKind;

-pub(crate) mod optional_counter;

// The main structure of this module, see module-level comment.
#[derive(Debug)]
pub struct RequestContext {
task_kind: TaskKind,
download_behavior: DownloadBehavior,
access_stats_behavior: AccessStatsBehavior,
page_content_kind: PageContentKind,
-pub micros_spent_throttled: optional_counter::MicroSecondsCounterU32,
}

/// The kind of access to the page cache.
@@ -158,7 +155,6 @@ impl RequestContextBuilder {
download_behavior: DownloadBehavior::Download,
access_stats_behavior: AccessStatsBehavior::Update,
page_content_kind: PageContentKind::Unknown,
-micros_spent_throttled: Default::default(),
},
}
}
@@ -172,7 +168,6 @@ impl RequestContextBuilder {
download_behavior: original.download_behavior,
access_stats_behavior: original.access_stats_behavior,
page_content_kind: original.page_content_kind,
-micros_spent_throttled: Default::default(),
},
}
}
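This hunk removes `micros_spent_throttled` from `RequestContext`, and the PR title says smgr metrics now include throttled time. Presumably the counter existed so a latency observation could subtract the time a request spent throttled (commit 9338a3a mentions removing `ex_throttled`, while be2d64b keeps the get_vectored and scan metrics ex throttled). The sketch below contrasts the two measurements; `throttle`, `read_page`, `Observation` and `handle_request` are hypothetical names, and `std::thread::sleep` stands in for both the throttle and the page read. It is not the pageserver's code.

```rust
use std::time::{Duration, Instant};

/// Stand-in for a rate limiter that may delay a request.
fn throttle(delay: Duration) {
    std::thread::sleep(delay);
}

/// Stand-in for the actual page read.
fn read_page(cost: Duration) {
    std::thread::sleep(cost);
}

/// Latencies recorded for one request under the two policies.
pub struct Observation {
    /// Timer started before throttling, so throttled time is included.
    pub incl_throttled: Duration,
    /// Throttled time subtracted afterwards ("ex throttled").
    pub ex_throttled: Duration,
}

pub fn handle_request(throttle_for: Duration, work_for: Duration) -> Observation {
    let started_at = Instant::now(); // start when the request is received

    let throttle_started = Instant::now();
    throttle(throttle_for);
    // what a per-request "time spent throttled" counter would accumulate
    let throttled = throttle_started.elapsed();

    read_page(work_for);

    let incl_throttled = started_at.elapsed();
    Observation {
        incl_throttled,
        ex_throttled: incl_throttled.saturating_sub(throttled),
    }
}
```

Measuring from request arrival needs no per-request bookkeeping at all, which is consistent with the counter and its module being deleted; the ex-throttled variant requires carrying the throttled duration alongside the request.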
101 changes: 0 additions & 101 deletions pageserver/src/context/optional_counter.rs

This file was deleted.
