-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Report transport message size per action #94543
Report transport message size per action #94543
Conversation
Adds to the transport stats a histogram of transport message sizes for each transport action.
Hi @DaveCTurner, I've created a changelog YAML for you. |
Pinging @elastic/es-distributed (Team:Distributed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few minor comments. I wonder if we should add a brief .yml test to validate the structure of the response. And also, I think we can add documentation for the new stats in this PR.
requestId, | ||
version, | ||
header.getCompressionScheme(), | ||
ResponseStatsConsumer.NONE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add the response stats instead here, except if short circuited?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect we should track the response stats if short-circuited too? 905e5ad does that, but I can rework this some more if you'd prefer to drop the response stats for short-circuited ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I see, we short-circuit if the action is not known, as well as on other up-front exceptions. afaa718 captures the response stats for known actions even if short-circuited.
* n: < 2^(n+3) | ||
* ... | ||
* 25: < 2^28 == 256MiB | ||
* 26: >= 2^28 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we missing 2 buckets? >= 512MiB and >= 1GiB?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured by the time we got to 256MiB the messages were already unreasonably large, but you're right it's only 2 buckets away from the max size anyway. Added in 80eeac4.
@@ -41,7 +43,8 @@ public TransportStats( | |||
long txCount, | |||
long txSize, | |||
long[] inboundHandlingTimeBucketFrequencies, | |||
long[] outboundHandlingTimeBucketFrequencies | |||
long[] outboundHandlingTimeBucketFrequencies, | |||
List<TransportActionStats> transportActionStats |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be more intuitive to have the stats be a Map - also to avoid that TransportActionStats
is a fragment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, done in f2e42f1
// Stats came from before v8.7 | ||
assert Version.CURRENT.major == Version.V_8_0_0.major; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not exactly sure what the purpose here is? I am guessing it is to ensure that we cleanup something here when major switches to 9?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah except it should have mentioned 7.0 not 8.0. Fixed now.
return builder.endObject(); | ||
} | ||
|
||
static void histogramToXContent(XContentBuilder builder, long[] sizeHistogram, String fieldName) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fieldName seems to be passed "histogram" always except in tests, could it not be hardcoded instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I think I used the shorter one to avoid some line breaking mess but it's not necessary any more. Done in e24b97e.
final int[] bucketBounds = TransportActionStatsTracker.getBucketUpperBounds(); | ||
assert sizeHistogram.length == bucketBounds.length + 1; | ||
builder.startArray(fieldName); | ||
|
||
int firstBucket = 0; | ||
long remainingCount = 0L; | ||
for (int i = 0; i < sizeHistogram.length; i++) { | ||
if (remainingCount == 0) { | ||
firstBucket = i; | ||
} | ||
remainingCount += sizeHistogram[i]; | ||
} | ||
|
||
for (int i = firstBucket; i < sizeHistogram.length && 0 < remainingCount; i++) { | ||
builder.startObject(); | ||
if (i > 0) { | ||
builder.humanReadableField("ge_bytes", "ge", ByteSizeValue.ofBytes(bucketBounds[i - 1])); | ||
} | ||
if (i < bucketBounds.length) { | ||
builder.humanReadableField("lt_bytes", "lt", ByteSizeValue.ofBytes(bucketBounds[i])); | ||
} | ||
builder.field("count", sizeHistogram[i]); | ||
builder.endObject(); | ||
remainingCount -= sizeHistogram[i]; | ||
} | ||
builder.endArray(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we consider org.HdrHistogram.Histogram
to simplify rendering json and omitting empty buckets and also for determining the bucket by the size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand how this could make things simpler, but maybe I'm missing something.
.sorted(Comparator.comparing(RequestHandlerRegistry::getAction)) | ||
.map(RequestHandlerRegistry::getStats) | ||
.toList(); | ||
.collect(Collectors.toMap(RequestHandlerRegistry::getAction, RequestHandlerRegistry::getStats, (a, b) -> a, TreeMap::new)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re (a, b) -> a
, do we expect a duplicate action somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, but we've got to say something here (there's no overload that accepts a mapFactory
without a mergeFunction
) and this is the shortest thing we can say
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, okay. NIT: we have org.elasticsearch.common.util.Maps#toUnmodifiableSortedMap
that only accept key/value functions and could be used here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL thanks, that's what I wanted 😄
Hi @DaveCTurner, I've updated the changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
showing the total amount of traffic and a histogram of message sizes for | ||
incoming requests and outgoing responses. | ||
+ | ||
.Properties of `actions.*.requests` and `actions.*.responses` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The below seems to describe the histograms, can we make that explicit here? And also include documentation of the outer level count
and total_size
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh oops good catch. Fixed in 63d5d73.
header.isHandshake(), | ||
message.takeBreakerReleaseControl() | ||
); | ||
} catch (Exception e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I see a potential for exceptions in the code above. Fine to keep this, but if there is no exception expected, perhaps we can assert false here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I couldn't either but I'm paranoid about leaking a response here (or in future). Assertion added in 20ee73d.
import java.io.IOException; | ||
|
||
public record TransportActionStats( | ||
String action, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This field is no longer used and I think we should remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, done in c1247b0.
assert Version.CURRENT.major == Version.V_8_0_0.major; | ||
assert Version.CURRENT.major == Version.V_7_0_0.major + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand the difference, can you explain the motivation for changing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When Version.CURRENT
becomes 9.0 we remove all the v7.x version constants, so by mentioning one here we are forced to remove this dead code to fix the compile error.
Adds to the transport stats a histogram of transport message sizes for each transport action.
Closes #88151