
adapter: record result_size in bytes in the activity log #30547

Merged
merged 11 commits into MaterializeInc:main from result-size-activity-log on Nov 27, 2024

Conversation

@bosconi (Member) commented Nov 18, 2024

We should record result_size in bytes along with rows_returned in the activity log.

Motivation

  • This PR adds a known-desirable feature.

https://github.com/MaterializeInc/database-issues/issues/8064

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@bosconi bosconi self-assigned this Nov 18, 2024
@ParkMyCar ParkMyCar force-pushed the result-size-activity-log branch from 7973460 to 5a1e3ea Compare November 25, 2024 17:54
@bosconi bosconi marked this pull request as ready for review November 26, 2024 01:09
@bosconi bosconi requested review from a team as code owners November 26, 2024 01:09
@bosconi bosconi requested review from ParkMyCar and jkosh44 November 26, 2024 01:09
@teskje (Contributor) commented Nov 26, 2024

Just checking: Are we able to handle changes like this through persist schema migrations nowadays, or some other mechanism? Or are we going to drop all existing statement log data on upgrade? And if it's the former, could we write an upgrade test?

@jkosh44 (Contributor) commented Nov 26, 2024

Just checking: Are we able to handle changes like this through persist schema migrations nowadays, or some other mechanism? Or are we going to drop all existing statement log data on upgrade?

For now we still drop all existing data on an upgrade. It's possible that the pieces are in place to use persist schema migrations, but that hasn't been hooked up to the builtin items yet.

Comment on lines 495 to 499
let result_size = rows
.clone()
.into_row_iter()
.map(|r| u64::cast_from(r.byte_len()))
.sum();
Contributor:

I'm not super familiar with SortedRowCollectionIter, is cloning and iterating through it expensive? If so, it feels a little bad to do that just to get the result size.

Member:

This is for FastPath peeks, so I don't anticipate the number of rows here being that large. But! Thanks for pushing back because this made me realize we already iterate through all of the rows and get the total response size to check against for the max_query_result_size session var.

Just pushed up a new commit that reuses the calculation we already do for that.
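
For context, here is a minimal, self-contained sketch of that idea: measure the total encoded size of the result in the same pass that enforces the max_query_result_size limit, so nothing extra is cloned or iterated just to log result_size. The names used here (EncodedRow, check_and_measure, ResultSizeError) are hypothetical stand-ins, not Materialize's actual API.

```rust
// Sketch only (not the actual Materialize code): accumulate the encoded byte
// length of each row once, then reuse that single total both for the
// max_query_result_size limit check and for the result_size activity-log field.

struct EncodedRow {
    bytes: Vec<u8>,
}

impl EncodedRow {
    fn byte_len(&self) -> usize {
        self.bytes.len()
    }
}

#[derive(Debug)]
struct ResultSizeError {
    limit: u64,
    actual: u64,
}

/// Walks the result rows once, returning the total encoded size in bytes if
/// it fits under the session limit.
fn check_and_measure(
    rows: &[EncodedRow],
    max_query_result_size: u64,
) -> Result<u64, ResultSizeError> {
    let mut total: u64 = 0;
    for row in rows {
        total += row.byte_len() as u64;
        if total > max_query_result_size {
            return Err(ResultSizeError {
                limit: max_query_result_size,
                actual: total,
            });
        }
    }
    // The same total can now be logged as `result_size` alongside
    // `rows_returned`, without a second pass over the rows.
    Ok(total)
}

fn main() {
    let rows = vec![
        EncodedRow { bytes: vec![0; 16] },
        EncodedRow { bytes: vec![0; 32] },
    ];
    match check_and_measure(&rows, 1024) {
        Ok(result_size) => println!("result_size = {result_size} bytes"),
        Err(e) => println!("result too large: {} > {}", e.actual, e.limit),
    }
}
```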

@@ -34,7 +34,7 @@ pub struct RowCollection {
     /// Contiguous blob of encoded Rows.
     encoded: Bytes,
     /// Metadata about an individual Row in the blob.
-    metadata: Vec<EncodedRowMetadata>,
+    metadata: Arc<[EncodedRowMetadata]>,
Contributor:

What's going on here, why'd we change this?

Member:

This makes it cheaper to clone a RowCollection. We don't really need the mutability of a Vec, so an Arc<[_]> works just as well, and cloning becomes just a ref-count increment.
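
As a standalone illustration of that trade-off (Meta is a hypothetical stand-in for EncodedRowMetadata, not the real type): cloning a Vec copies every element, while cloning an Arc<[_]> only bumps an atomic reference count.

```rust
use std::sync::Arc;

// Illustration only, not the actual RowCollection code.

#[derive(Clone, Debug)]
struct Meta {
    offset: usize,
    len: usize,
}

fn main() {
    let as_vec: Vec<Meta> = (0..4usize).map(|i| Meta { offset: i * 8, len: 8 }).collect();

    // Deep clone: allocates a new buffer and copies all elements.
    let vec_clone = as_vec.clone();

    // Converting once into Arc<[Meta]> gives up mutability...
    let as_arc: Arc<[Meta]> = as_vec.into();

    // ...but every subsequent clone is just an atomic ref-count increment;
    // the underlying metadata is shared rather than duplicated.
    let arc_clone = Arc::clone(&as_arc);

    assert_eq!(vec_clone.len(), arc_clone.len());
    println!("both views see {} metadata entries", arc_clone.len());
}
```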

@jkosh44 (Contributor) left a comment:

LGTM!

bosconi and others added 11 commits November 27, 2024 14:23
* make RowIterator clone-able so we can use it to calculate total result size
* Use Arc<[_]> in a few places so it's cheaper to clone
* bin/fmt
In the switch from CatalogItemId to GlobalId I accidentally broke the legacy builtin item
update path, and this is the first time since then we've needed it. For old storage collections
we switched to passing CatalogItemIds and then eventually looking up their GlobalIds from the
Catalog, but these old items never get inserted into the Catalog so the lookup fails. This
commit changes the legacy builtin item migrations to track old collections via GlobalIds.

FWIW, the original design for CatalogItemIds called out that GlobalIds would somewhat become
CollectionIds, and here we're tracking storage_collections_to_drop, so the switch back to
GlobalId in this codepath makes sense IMO.
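
A toy sketch of the failure mode and the fix described in that commit message, using hypothetical stand-in types (the Catalog, CatalogItemId, and GlobalId here are simplified placeholders, not Materialize's real definitions):

```rust
use std::collections::HashMap;

// If a migration records CatalogItemIds and later resolves them to GlobalIds
// through the catalog, legacy builtin items that were never inserted into the
// catalog make the lookup fail. Recording the GlobalIds directly avoids the
// lookup entirely.

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct CatalogItemId(u64);

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct GlobalId(u64);

struct Catalog {
    items: HashMap<CatalogItemId, GlobalId>,
}

impl Catalog {
    /// Returns None for legacy builtin items that were never inserted.
    fn global_id(&self, id: CatalogItemId) -> Option<GlobalId> {
        self.items.get(&id).copied()
    }
}

fn main() {
    let catalog = Catalog { items: HashMap::new() };
    let legacy_item = CatalogItemId(42);

    // Old approach: track the CatalogItemId and resolve it later -- the
    // lookup fails because the legacy item is not in the catalog.
    assert!(catalog.global_id(legacy_item).is_none());

    // New approach: the migration tracks the storage collection to drop by
    // its GlobalId from the start, so no catalog lookup is needed.
    let storage_collections_to_drop: Vec<GlobalId> = vec![GlobalId(42)];
    println!("dropping {} legacy collection(s)", storage_collections_to_drop.len());
}
```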
@ParkMyCar ParkMyCar force-pushed the result-size-activity-log branch from 7d68b6f to 0c04312 Compare November 27, 2024 19:31
@ParkMyCar ParkMyCar merged commit 4619e12 into MaterializeInc:main Nov 27, 2024
83 checks passed