Deduplicate fields in FieldCaps intra-cluster messages #100022
Merged
We have a stable equality check for individual field capabilities. Even in the dynamic mapping case, where mappings vary across indices, we tend to have a lot of overlap across mappings. By deduplicating the individual field capabilities instances when serializing, we can massively reduce the size of the field caps messages in many cases.
A repeated field takes up only about 4 extra bytes in the transport message, which is negligible compared to message sizes of tens of MB in the general case. Even if deduplication never applied, this change should still reduce response sizes: it adds only about 4 bytes of overhead for a field that is never deduplicated, but it saves the redundant double writing of field names we used to do (we wrote them both in the map key and in the value), and it seems safe to assume that almost all field names are longer than 4 bytes.
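To illustrate the idea, here is a minimal, self-contained sketch of ordinal-based deduplication at serialization time. It is not the code in this PR and does not reproduce the actual transport wire format: `FieldCapsDedupSketch`, the `FieldCaps` record, and both `serialize*` methods are hypothetical names, and `DataOutputStream` stands in for the real stream classes. The only assumption carried over from the description is that a field's capabilities have a stable `equals`/`hashCode`, so equal instances can share one table entry and be referenced by a 4-byte ordinal.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FieldCapsDedupSketch {

    // Hypothetical stand-in for one field's capabilities.
    // A record provides the stable equals/hashCode that deduplication relies on.
    record FieldCaps(String name, String type, boolean searchable, boolean aggregatable) {}

    static void writeCaps(DataOutputStream out, FieldCaps caps) throws IOException {
        out.writeUTF(caps.name());
        out.writeUTF(caps.type());
        out.writeBoolean(caps.searchable());
        out.writeBoolean(caps.aggregatable());
    }

    // Baseline: every index writes every one of its fields in full.
    static byte[] serializeNaive(Map<String, List<FieldCaps>> perIndex) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(perIndex.size());
            for (Map.Entry<String, List<FieldCaps>> e : perIndex.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeInt(e.getValue().size());
                for (FieldCaps caps : e.getValue()) {
                    writeCaps(out, caps);
                }
            }
        }
        return bytes.toByteArray();
    }

    // Deduplicated: each distinct FieldCaps is written once into a table,
    // and every index then refers to its fields by a 4-byte ordinal.
    static byte[] serializeDeduplicated(Map<String, List<FieldCaps>> perIndex) throws IOException {
        // Assign ordinals in first-seen order; LinkedHashMap keeps that order for writing.
        Map<FieldCaps, Integer> ordinals = new LinkedHashMap<>();
        for (List<FieldCaps> fields : perIndex.values()) {
            for (FieldCaps caps : fields) {
                ordinals.putIfAbsent(caps, ordinals.size());
            }
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(ordinals.size());
            for (FieldCaps caps : ordinals.keySet()) {
                writeCaps(out, caps);
            }
            out.writeInt(perIndex.size());
            for (Map.Entry<String, List<FieldCaps>> e : perIndex.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeInt(e.getValue().size());
                for (FieldCaps caps : e.getValue()) {
                    out.writeInt(ordinals.get(caps)); // 4 bytes per reference
                }
            }
        }
        return bytes.toByteArray();
    }

    // Three indices that share identical mappings, as is common with time-based indices.
    static Map<String, List<FieldCaps>> sampleData() {
        List<FieldCaps> shared = List.of(
            new FieldCaps("user.name", "keyword", true, true),
            new FieldCaps("@timestamp", "date", true, true));
        Map<String, List<FieldCaps>> perIndex = new LinkedHashMap<>();
        perIndex.put("logs-01", shared);
        perIndex.put("logs-02", shared);
        perIndex.put("logs-03", shared);
        return perIndex;
    }

    public static void main(String[] args) throws IOException {
        Map<String, List<FieldCaps>> data = sampleData();
        System.out.println("naive bytes:        " + serializeNaive(data).length);
        System.out.println("deduplicated bytes: " + serializeDeduplicated(data).length);
    }
}
```

In this sketch a field that never repeats costs one extra ordinal on top of its table entry. The change described here offsets that overhead by no longer writing the field name twice, which the sketch does not model.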
We can get even more of a speedup in another follow-up by moving the deduplication logic to the `FieldCapabilitiesFetcher` so that we don't have to keep around endless duplicate field caps instances until serialization time.

Follow-up to #100010