Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up toXContent Collection Serialization in some Spots #78742

Conversation

original-brownbear
Copy link
Member

Found this when benchmarking large cluster states. When serializing collections we'd mostly
not take any advantage of what we know about the collection contents (like we do in StreamOutput).
This PR adds a couple of helpers to the x-content-builder similar to what we have on StreamOutput
to allow for faster serializing by avoiding the writer lookup and some self-reference checks and uses them
in obvious spots I could quickly identify in profiling or via the IDE.

relates #77466

Found this when benchmarking large cluster states. When serializing collections we'd mostly
not take any advantage of what we know about the collection contents (like we do in `StreamOutput`).
This PR adds a couple of helpers to the x-content-builder similar to what we have on `StreamOutput`
to allow for faster serializing by avoiding the writer lookup and some self-reference checks.
@original-brownbear original-brownbear added :Core/Infra/Core Core issues without another label :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. v8.0.0 v7.16.0 labels Oct 6, 2021
@elasticmachine elasticmachine added Team:Core/Infra Meta label for core/infra team Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. labels Oct 6, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

How much does this save us out of interest?

Copy link
Member

@tlrx tlrx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


public XContentBuilder xContentList(String name, ToXContent... values) throws IOException {
startArray(name);
for (ToXContent value : values) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe check for non null array here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh right TIL, I I always figured I'd get [null] here if passed a null but turns out you actually get the null :)

@@ -921,6 +923,42 @@ private XContentBuilder value(ToXContent value, ToXContent.Params params) throws
// Maps & Iterable
//////////////////////////////////

public XContentBuilder stringListField(String name, Collection<String> values) throws IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: maybe just array(String name, Collection<String> values) ? Same remark for the other new methods.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't do that unfortunately because we have:

    public XContentBuilder field(String name, Iterable<?> values) throws IOException {
        return field(name).value(values);
    }

which collides with it. Same with the other cases, these super generic ? or Object type methods collide with everything.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, I did not see this one. Let's keep like it is then.

@original-brownbear
Copy link
Member Author

Thanks David & Tanguy!

@DaveCTurner

How much does this save us out of interest?

It's not entirely trivial to isolate the effect because this changes the profile quite a bit and we seemingly also see inlining in more spots now, but I'd say we're at roughly O(5%) savings + I have a follow-up based on this in the pipeline that should have a bigger impact :)

@original-brownbear original-brownbear merged commit 3164885 into elastic:master Oct 6, 2021
@original-brownbear original-brownbear deleted the faster-serialize-collections branch October 6, 2021 12:15
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Oct 6, 2021
…8742)

Found this when benchmarking large cluster states. When serializing collections we'd mostly
not take any advantage of what we know about the collection contents (like we do in `StreamOutput`).
This PR adds a couple of helpers to the x-content-builder similar to what we have on `StreamOutput`
to allow for faster serializing by avoiding the writer lookup and some self-reference checks.
original-brownbear added a commit that referenced this pull request Oct 6, 2021
…78755)

Found this when benchmarking large cluster states. When serializing collections we'd mostly
not take any advantage of what we know about the collection contents (like we do in `StreamOutput`).
This PR adds a couple of helpers to the x-content-builder similar to what we have on `StreamOutput`
to allow for faster serializing by avoiding the writer lookup and some self-reference checks.
@dliappis
Copy link
Contributor

dliappis commented Oct 7, 2021

@original-brownbear is this related to #26907 as well?

@original-brownbear
Copy link
Member Author

@dliappis yea to some degree for sure. A good chunk of the speedup here is from simply not having to do the self-reference check now in the changed scenarios.

@original-brownbear original-brownbear restored the faster-serialize-collections branch April 18, 2023 21:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Core Core issues without another label :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >non-issue Team:Core/Infra Meta label for core/infra team Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.16.0 v8.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants