Deduplicate fields in FieldCaps intra-cluster messages #100022

original-brownbear · 2023-09-28T14:20:08Z

We have a stable equality check for individual field capabilities. Even in the dynamic mapping case where mappings vary across indices, we tend to have a lot of overlap across mappings. By deduplicating the individual field capabilities instances when serializing, we can massively reduce the size of the field caps messages in many cases.

A repeated field takes up only an extra ~4 bytes in the transport message, which is negligible compared to the tens of MB these messages take up in the general case. Even if deduplication never applied, this change should still reduce response sizes: it adds only ~4 bytes of overhead for a never-deduplicated field but saves the redundant double writing of field names we used to do (we wrote them both in the map key and in the value), and it seems safe to assume that almost all field names are longer than 4 bytes.
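To illustrate the idea described above, here is a minimal sketch of deduplicating on write. It is not the code from this PR: `FieldCapsDedupSketch`, the `FieldCaps` record, and the `-1` marker are all hypothetical, and plain `java.io` streams stand in for the transport layer. The sketch assumes Java 16+ for records, which provide the stable `equals`/`hashCode` the lookup relies on.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from the Elasticsearch code base.
final class FieldCapsDedupSketch {

    // Stand-in for one field's capabilities; a record gives stable equals/hashCode.
    record FieldCaps(String name, String type, boolean searchable, boolean aggregatable) {}

    // Writes per-index field lists, emitting the full payload only the first time
    // an equal FieldCaps is seen and a 4-byte ordinal for every repeat.
    static byte[] write(List<List<FieldCaps>> perIndex) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            Map<FieldCaps, Integer> seen = new HashMap<>();
            out.writeInt(perIndex.size());
            for (List<FieldCaps> fields : perIndex) {
                out.writeInt(fields.size());
                for (FieldCaps fc : fields) {
                    Integer ordinal = seen.get(fc);
                    if (ordinal != null) {
                        out.writeInt(ordinal); // repeated field: back-reference only
                    } else {
                        seen.put(fc, seen.size()); // next free ordinal
                        out.writeInt(-1); // assumed marker: full payload follows
                        out.writeUTF(fc.name());
                        out.writeUTF(fc.type());
                        out.writeBoolean(fc.searchable());
                        out.writeBoolean(fc.aggregatable());
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Mirrors write(): rebuilds the ordinal table in the order entries were first written.
    static List<List<FieldCaps>> read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            List<FieldCaps> table = new ArrayList<>();
            int indexCount = in.readInt();
            List<List<FieldCaps>> result = new ArrayList<>(indexCount);
            for (int i = 0; i < indexCount; i++) {
                int fieldCount = in.readInt();
                List<FieldCaps> fields = new ArrayList<>(fieldCount);
                for (int j = 0; j < fieldCount; j++) {
                    int ref = in.readInt();
                    if (ref >= 0) {
                        fields.add(table.get(ref)); // resolve back-reference
                    } else {
                        FieldCaps fc = new FieldCaps(in.readUTF(), in.readUTF(), in.readBoolean(), in.readBoolean());
                        table.add(fc);
                        fields.add(fc);
                    }
                }
                result.add(fields);
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch, a second index that repeats an already-written field costs one 4-byte ordinal (plus its own list-length prefix) rather than the full payload, and a field that is never repeated pays only the 4-byte marker. The real wire format may well differ, for example by using variable-length integers.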

We can get even more of a speedup here in another follow-up by moving the deduplication logic to the FieldCapabilitiesFetcher so that we don't have to keep around endless duplicate field caps instances until serialisation time.

Follow-up to #100010

elasticsearchmachine · 2023-09-28T14:20:32Z

Pinging @elastic/es-search (Team:Search)

original-brownbear · 2023-09-28T15:00:33Z

Jenkins run elasticsearch-ci/bwc

romseygeek

LGTM

original-brownbear · 2023-09-28T15:54:55Z

Thanks Alan!

original-brownbear added the >non-issue and :Search/Search labels Sep 28, 2023

elasticsearchmachine added the Team:Search and v8.11.0 labels Sep 28, 2023

romseygeek approved these changes Sep 28, 2023

original-brownbear merged commit 517a542 into elastic:main Sep 28, 2023

original-brownbear deleted the dedup-fc-across-fields branch September 28, 2023 15:55

original-brownbear restored the dedup-fc-across-fields branch November 30, 2024 10:08
