Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Dataflow Streaming] Avoid calling WorkItem::getSerializedSize #33581

Merged
merged 1 commit into from
Jan 16, 2025

Conversation

arunpandianp
Copy link
Contributor

@arunpandianp arunpandianp commented Jan 14, 2025

Prior to this change WorkItem::getSerializedSize is called on the GetWorkStream thread for every work item, it shows up like in cpu profiles. Replacing it with the byteSize computed from the response bytes makes the work submit path a bit more efficient

Screenshot 2025-01-14 at 5 29 17 PM

#33578

@arunpandianp
Copy link
Contributor Author

Run Java Precommit

@arunpandianp arunpandianp marked this pull request as ready for review January 14, 2025 07:13
@arunpandianp
Copy link
Contributor Author

R: @scwhittle

Copy link
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment assign set of reviewers

@arunpandianp arunpandianp changed the title Avoid calling WorkItem::getSerializedSize [Dataflow Streaming] Avoid calling WorkItem::getSerializedSize Jan 14, 2025
@scwhittle
Copy link
Contributor

I was thinking that since java protos are immutable that this would be cheap. But it may not cache the size internally so seems ok to change.

Do you have any #s from testing/profiles to add to description as motivation?

@arunpandianp
Copy link
Contributor Author

Run Java Precommit

@arunpandianp
Copy link
Contributor Author

I was thinking that since java protos are immutable that this would be cheap. But it may not cache the size internally so seems ok to change.

Do you have any #s from testing/profiles to add to description as motivation?

added a profile screenshot. The results are cached internally https://stackoverflow.com/a/1903792.
But it was getting called in the Getwork loop for the first time and calculating size for all workitems in a single thread seems costly.

@arunpandianp
Copy link
Contributor Author

Run Java Unit Tests

@scwhittle scwhittle merged commit 607fd43 into apache:master Jan 16, 2025
14 of 15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants