adapter: record result_size in bytes in the activity log #30547
Conversation
Force-pushed from `7973460` to `5a1e3ea`
Just checking: Are we able to handle changes like this through persist schema migrations nowadays, or some other mechanism? Or are we going to drop all existing statement log data on upgrade? And if it's the former, could we write an upgrade test?

For now we still drop all existing data on an upgrade. It's possible that the pieces are in place to use persist schema migrations, but that hasn't been hooked up to the builtin items.
src/adapter/src/coord/peek.rs (Outdated)
```rust
let result_size = rows
    .clone()
    .into_row_iter()
    .map(|r| u64::cast_from(r.byte_len()))
    .sum();
```
I'm not super familiar with `SortedRowCollectionIter`; is cloning and iterating through it expensive? If so, it feels a little bad to do that just to get the result size.
This is for FastPath peeks, so I don't anticipate the number of rows here being that large. But! Thanks for pushing back, because this made me realize we already iterate through all of the rows and compute the total response size to check against the `max_query_result_size` session var.

Just pushed up a new commit that re-uses the calculation we already do for that.
```diff
@@ -34,7 +34,7 @@ pub struct RowCollection {
     /// Contiguous blob of encoded Rows.
     encoded: Bytes,
     /// Metadata about an individual Row in the blob.
-    metadata: Vec<EncodedRowMetadata>,
+    metadata: Arc<[EncodedRowMetadata]>,
```
What's going on here, why'd we change this?
This makes it cheaper to clone a `RowCollection`. We don't really need the mutability of a `Vec`, so an `Arc<[_]>` works just as well, and cloning becomes just a ref-count increment.
LGTM!
* make RowIterator clone-able so we can use it to calculate total result size
* Use Arc<[_]> in a few places so it's cheaper to clone
* bin/fmt
…ling the number of bytes sent
In the switch from CatalogItemId to GlobalId I accidentally broke the legacy builtin item update path, and this is the first time since then we've needed it. For old storage collections we switched to passing CatalogItemIds and then eventually looking up their GlobalIds from the Catalog, but these old items never get inserted into the Catalog, so the lookup fails. This commit changes the legacy builtin item migrations to track old collections via GlobalIds. FWIW, the original design for CatalogItemIds called out that GlobalIds somewhat become CollectionIds, and since here we're tracking storage_collections_to_drop, the switch back to GlobalId in this codepath makes sense IMO.
Force-pushed from `7d68b6f` to `0c04312`
We should record `result_size` in bytes along with `rows_returned` in the activity log.

Motivation
https://github.com/MaterializeInc/database-issues/issues/8064
Tips for reviewer
Checklist
If this PR evolves an existing `$T ⇔ Proto$T` mapping (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label.