Deduplicate fields in FieldCaps intra-cluster messages #100022

Merged

@original-brownbear original-brownbear commented Sep 28, 2023

We have a stable equality check for individual field capabilities. Even in the dynamic mapping case where mappings vary across indices, we tend to have a lot of overlap across mappings. By deduplicating the individual field capabilities instances when serializing, we can massively reduce the size of the field caps messages in many cases.

A repeated field takes up only an extra ~4 bytes in the transport message, which is negligible compared to the tens of MB these messages reach in the general case. Even if deduplication never applied, this change should still reduce response sizes: it adds only ~4 bytes of overhead for a field that is never deduplicated, but it saves the redundant double writing of field names we used to do (we wrote them both in the map key and in the value), and it seems safe to assume that almost all field names are longer than 4 bytes.
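To make the size argument concrete, here is a minimal sketch of the general idea, not the code in this PR. It assumes a hypothetical `FieldCaps` record and plain `java.io` streams instead of the Elasticsearch transport classes. `writeNaive` mimics the old layout (field name as map key plus the full value), while `writeDeduplicated` writes each distinct value once and then refers to it by a 4-byte ordinal.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldCapsDedupSketch {

    // Hypothetical stand-in for one field's capabilities. A record supplies the
    // stable equals/hashCode that deduplication depends on.
    record FieldCaps(String name, String type, boolean searchable, boolean aggregatable) {}

    private static void writeCaps(DataOutputStream out, FieldCaps fc) throws IOException {
        out.writeUTF(fc.name());
        out.writeUTF(fc.type());
        out.writeBoolean(fc.searchable());
        out.writeBoolean(fc.aggregatable());
    }

    // Old-style layout: field name as map key, then the full value (name included again).
    static byte[] writeNaive(List<List<FieldCaps>> perIndex) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(perIndex.size());
            for (List<FieldCaps> fields : perIndex) {
                out.writeInt(fields.size());
                for (FieldCaps fc : fields) {
                    out.writeUTF(fc.name());
                    writeCaps(out, fc);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Deduplicated layout: each distinct value once, then one 4-byte ordinal per field.
    static byte[] writeDeduplicated(List<List<FieldCaps>> perIndex) {
        Map<FieldCaps, Integer> ordinals = new LinkedHashMap<>();
        for (List<FieldCaps> fields : perIndex) {
            for (FieldCaps fc : fields) {
                ordinals.putIfAbsent(fc, ordinals.size());
            }
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(ordinals.size());
            for (FieldCaps fc : ordinals.keySet()) {
                writeCaps(out, fc);
            }
            out.writeInt(perIndex.size());
            for (List<FieldCaps> fields : perIndex) {
                out.writeInt(fields.size());
                for (FieldCaps fc : fields) {
                    out.writeInt(ordinals.get(fc));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the deduplicated layout back, resolving ordinals against the table.
    static List<List<FieldCaps>> readDeduplicated(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int distinct = in.readInt();
            List<FieldCaps> table = new ArrayList<>(distinct);
            for (int i = 0; i < distinct; i++) {
                table.add(new FieldCaps(in.readUTF(), in.readUTF(), in.readBoolean(), in.readBoolean()));
            }
            int indices = in.readInt();
            List<List<FieldCaps>> perIndex = new ArrayList<>(indices);
            for (int i = 0; i < indices; i++) {
                int count = in.readInt();
                List<FieldCaps> fields = new ArrayList<>(count);
                for (int j = 0; j < count; j++) {
                    fields.add(table.get(in.readInt()));
                }
                perIndex.add(fields);
            }
            return perIndex;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<FieldCaps> mapping = List.of(
            new FieldCaps("host.name", "keyword", true, true),
            new FieldCaps("@timestamp", "date", true, true));
        List<List<FieldCaps>> perIndex = List.of(mapping, mapping, mapping);
        System.out.println("naive bytes: " + writeNaive(perIndex).length);
        System.out.println("deduplicated bytes: " + writeDeduplicated(perIndex).length);
    }
}
```

In this sketch every repeated field costs one `writeInt`, which is where a figure like ~4 bytes per reference would come from; the actual wire format used by the PR may differ.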

We can get even more of a speedup here in another follow-up by moving the deduplication logic to the FieldCapabilitiesFetcher so that we don't have to keep around endless duplicate field caps instances until serialisation time.
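One way such a follow-up could look is to intern instances as they are created, so equal values share a single object well before serialization. The sketch below is an assumption for illustration only: `FieldCapsInternerSketch` and its `FieldCaps` record are hypothetical names and do not reflect how `FieldCapabilitiesFetcher` is actually structured.

```java
import java.util.HashMap;
import java.util.Map;

public class FieldCapsInternerSketch {

    // Hypothetical stand-in for one field's capabilities; record equality is value-based.
    record FieldCaps(String name, String type, boolean searchable, boolean aggregatable) {}

    // Maps each distinct value to the first instance seen with that value.
    private final Map<FieldCaps, FieldCaps> canonical = new HashMap<>();

    // Returns the canonical instance equal to the candidate, registering it if new.
    FieldCaps intern(FieldCaps candidate) {
        FieldCaps existing = canonical.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }

    // Number of distinct values retained so far.
    int distinctCount() {
        return canonical.size();
    }
}
```

With an interner like this, a duplicate built for a second index becomes garbage immediately instead of being held until the response is written.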

Follow-up to #100010

@original-brownbear original-brownbear added >non-issue :Search/Search Search-related issues that do not fall into other categories labels Sep 28, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine elasticsearchmachine added Team:Search Meta label for search team v8.11.0 labels Sep 28, 2023
@original-brownbear
Member Author

Jenkins run elasticsearch-ci/bwc

Contributor

@romseygeek romseygeek left a comment

LGTM

@original-brownbear
Member Author

Thanks Alan!

@original-brownbear original-brownbear merged commit 517a542 into elastic:main Sep 28, 2023
@original-brownbear original-brownbear deleted the dedup-fc-across-fields branch September 28, 2023 15:55
piergm pushed a commit to piergm/elasticsearch that referenced this pull request Oct 2, 2023
@original-brownbear original-brownbear restored the dedup-fc-across-fields branch November 30, 2024 10:08
Labels
>non-issue :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team v8.11.0