-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Lucene segment-level fields stats #111123
Conversation
8126d91
to
6d02fc3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking forward to having these stats available! I left a couple of comments.
builder.field(Fields.TOTAL_SEGMENT_FIELDS, totalSegments); | ||
builder.field(Fields.AVERAGE_FIELDS_PER_SEGMENT, totalSegments == 0 ? 0 : totalSegmentFields / totalSegments); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having the average per node is nice but could we also include the raw counts in the output so we can aggregate this across nodes more easily?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks David. Yes, we send these raw stats between nodes and return only the average in the rest response, as including the number of segments in the mappings section doesn't make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
including the number of segments in the mappings section doesn't make sense
Although I see what you mean, I disagree that this is a good reason to omit it. The number of segments might not be available elsewhere (hence why we've added it to NodeMappingStats
) or at least there may be some variance between the number used in this computation and the one exposed elsewhere. Let's have the raw figures here as well as the average please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I pushed e42a69f
server/src/main/java/org/elasticsearch/index/mapper/NodeMappingStats.java
Outdated
Show resolved
Hide resolved
…gStats.java Co-authored-by: David Turner <[email protected]>
Hi @dnhatn, I've created a changelog YAML for you. |
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall change looks ok but I'd rather see the raw values in the output as well as the computed average (see this comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this Nhat. Very nice to have these stats available !
Should we also update the docs (node-stats.asciidoc
) ?
LGTM, pending David's suggestion to expose the raw values as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@DaveCTurner @andreidan Thanks for reviewing :). |
@andreidan ++ I updated it in 9eb89c9. |
The `search_shards` API is not available in serverless. This PR replaces its usage in the newly added test with the `search` API with profiling. Relates #111123
Previously, we returned the number of segments and the total number of fields in those segments in NodeMappingStats (see #111123). However, the total number of fields returned in that PR might be very inaccurate for indices having large mappings but only a small number of actual fields. This change returns a more accurate total number of fields using the Lucene FieldInfos from those segments. Since we need to acquire a searcher to compute this stats, we opt to compute it after a shard is refreshed and cache the result. Relates #111123
Previously, we returned the number of segments and the total number of fields in those segments in NodeMappingStats (see elastic#111123). However, the total number of fields returned in that PR might be very inaccurate for indices having large mappings but only a small number of actual fields. This change returns a more accurate total number of fields using the Lucene FieldInfos from those segments. Since we need to acquire a searcher to compute this stats, we opt to compute it after a shard is refreshed and cache the result. Relates elastic#111123
This change returns the total number of fields at the segment level, allowing for a more accurate estimate of the memory used by Lucene. The new estimate is expected to be closer to the actual memory usage than the current estimate using the index-level field count, due to the non-trivial overhead incurred by each Lucene segment. Two new fields are introduced: total_segment_fields, which represents the total number of fields at the segment level, and average_fields_per_segment. The overhead per field in segments with fewer fields is larger than in segments with many fields.