More Compact Serialization of Metadata #82608
Serialize the map of hashes to mappings once, then look each mapping up from that map instead of serializing it over and over for each index. This makes full cluster state transport messages much smaller in the common case of many duplicate mappings.
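The idea can be illustrated with a minimal, self-contained sketch. It is not the Elasticsearch implementation: the class `DedupMappingsSketch`, its methods, and the use of plain strings with `DataOutputStream` instead of `StreamOutput` and `MappingMetadata` are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: every name here is illustrative, not Elasticsearch API.
public final class DedupMappingsSketch {
    private DedupMappingsSketch() {}

    /** Writes each distinct mapping once, then only a hash reference per index. */
    static byte[] write(Map<String, String> mappingsByHash, Map<String, String> hashByIndex) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            // Section 1: hash -> mapping source, written exactly once per distinct mapping.
            // Note: writeUTF is limited to 65535 encoded bytes, which is fine for a sketch.
            out.writeInt(mappingsByHash.size());
            for (Map.Entry<String, String> e : mappingsByHash.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
            // Section 2: per index, only the name and the hash of its mapping.
            out.writeInt(hashByIndex.size());
            for (Map.Entry<String, String> e : hashByIndex.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the payload back and resolves each index to its mapping via the hash map. */
    static Map<String, String> read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int mappingCount = in.readInt();
            Map<String, String> mappingsByHash = new HashMap<>();
            for (int i = 0; i < mappingCount; i++) {
                String hash = in.readUTF();
                String source = in.readUTF();
                mappingsByHash.put(hash, source);
            }
            int indexCount = in.readInt();
            Map<String, String> mappingByIndex = new LinkedHashMap<>();
            for (int i = 0; i < indexCount; i++) {
                String indexName = in.readUTF();
                String hash = in.readUTF();
                String source = mappingsByHash.get(hash);
                if (source == null) {
                    throw new IOException("unknown mapping hash [" + hash + "] for index [" + indexName + "]");
                }
                // Indices with the same hash share one deserialized mapping instance.
                mappingByIndex.put(indexName, source);
            }
            return mappingByIndex;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With many indices sharing a few mappings, the payload grows with the number of distinct mappings plus one short hash per index, rather than with one full mapping per index.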
Pinging @elastic/es-distributed (Team:Distributed)
if (in.getVersion().onOrAfter(MAPPINGS_AS_HASH_VERSION)) {
    final int mappings = in.readVInt();
    if (mappings > 0) {
        final Map<String, MappingMetadata> mappingMetadataMap = new HashMap<>(mappings);
The HashMap constructor accepts the initial capacity, not the expected number of elements. It needs to be sized a bit higher than mappings, otherwise it will need to be resized/rehashed.
True, though I guess it might be worthwhile to have a general fix for this. We seem to always pre-size with capacity == element count in deserialization. Technically, we could probably move to accounting for the load factor, but I wouldn't expect too much from it (especially when the key's hash code is essentially free).
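For reference, the capacity versus element count point can be sketched as a small helper. This is a hypothetical illustration, not code from this PR: the class `MapSizing` and its methods are made-up names, and it assumes the default `HashMap` load factor of 0.75.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: names are illustrative, not part of Elasticsearch.
public final class MapSizing {
    private MapSizing() {}

    /**
     * Initial capacity that lets a HashMap hold expectedSize entries without
     * resizing, assuming the default load factor of 0.75.
     */
    static int capacityFor(int expectedSize) {
        if (expectedSize < 0) {
            throw new IllegalArgumentException("expectedSize must be >= 0, got " + expectedSize);
        }
        // HashMap resizes once size exceeds capacity * loadFactor, so the
        // capacity must be at least expectedSize / 0.75, rounded up.
        long capacity = (long) Math.ceil(expectedSize / 0.75d);
        return (int) Math.min(Integer.MAX_VALUE, capacity);
    }

    /** Creates a HashMap pre-sized for expectedSize entries. */
    static <K, V> Map<K, V> newMapWithExpectedSize(int expectedSize) {
        return new HashMap<>(capacityFor(expectedSize));
    }
}
```

On JDK 19 and later, `HashMap.newHashMap(int numMappings)` performs the same calculation in the standard library.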
Thanks Ievgen!
Serialize the map of hashes to mappings and then look up from the map instead
of serializing them over and over for each index to make full cluster state
transport messages much smaller in the common case of many duplicate mappings.
This should make requests for the full cluster state (or at least the state including mappings) quite a bit cheaper for the master node in terms of memory, CPU, and network. It also saves a lot of buffers on the coordinating/sending node, as well as CPU spent deduplicating mappings.
relates #77466