[Monitoring] Using primary average shard size #96177

Merged
merged 11 commits into from
Apr 13, 2021
4 changes: 2 additions & 2 deletions docs/user/monitoring/kibana-alerts.asciidoc
@@ -81,8 +81,8 @@ by running checks on a schedule time of 1 minute with a re-notify interval of 6
[[kibana-alerts-large-shard-size]]
== Large shard size

-This alert is triggered if a large (primary) shard size is found on any of the
-specified index patterns. The trigger condition is met if an index's shard size is
+This alert is triggered if a large average shard size (across associated primaries) is found on any of the
+specified index patterns. The trigger condition is met if an index's (primary average) shard size is
55gb or higher in the last 5 minutes. The alert is grouped across all indices that match
the default patter of `*` by running checks on a schedule time of 1 minute with a re-notify
interval of 12 hours.
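
As a rough illustration of the reworded trigger condition, the sketch below computes the average primary shard size for an index and compares it with the 55 GB default. The helper names and the bytes-per-GB constant are assumptions made for this example; they are not taken from the Kibana source.

```typescript
// Assumption: binary gigabytes. The real multiplier used by the plugin may differ.
const BYTES_PER_GB = 1024 * 1024 * 1024;

// Hypothetical helper: total primary store size divided by the number of primaries,
// expressed in GB and rounded to two decimals.
function averagePrimaryShardSizeGb(primaryStoreBytes: number, primaryShards: number): number {
  return +(primaryStoreBytes / primaryShards / BYTES_PER_GB).toFixed(2);
}

// Hypothetical helper: "55gb or higher" is treated as an inclusive comparison.
function isOverThreshold(avgGb: number, thresholdGb: number = 55): boolean {
  return avgGb >= thresholdGb;
}

// 4 primaries holding 240 GB in total average 60 GB each.
const avg = averagePrimaryShardSizeGb(240 * BYTES_PER_GB, 4);
console.log(avg, isOverThreshold(avg)); // 60 true
```

Under the same assumptions, an index with 10 primaries and 200 GB of primary data averages 20 GB per shard and would not trigger, even though its total primary store size is well above 55 GB.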
4 changes: 2 additions & 2 deletions x-pack/plugins/monitoring/common/constants.ts
@@ -477,7 +477,7 @@ export const ALERT_DETAILS = {
paramDetails: {
threshold: {
label: i18n.translate('xpack.monitoring.alerts.shardSize.paramDetails.threshold.label', {
-defaultMessage: `Notify when a shard exceeds this size`,
+defaultMessage: `Notify when primary average shard size exceeds this value`,
}),
type: AlertParamType.Number,
append: 'GB',
@@ -494,7 +494,7 @@ export const ALERT_DETAILS = {
defaultMessage: 'Shard size',
}),
description: i18n.translate('xpack.monitoring.alerts.shardSize.description', {
-defaultMessage: 'Alert if an index (primary) shard is oversize.',
+defaultMessage: 'Alert if an index (primary) shard average is oversize.',
}),
},
};
3 changes: 3 additions & 0 deletions x-pack/plugins/monitoring/common/types/es.ts
@@ -100,6 +100,9 @@ export interface ElasticsearchNodeStats {

export interface ElasticsearchIndexStats {
index?: string;
+shards: {
+primaries: number;
+};
primaries?: {
docs?: {
count?: number;
@@ -49,7 +49,8 @@ export class LargeShardSizeAlert extends BaseAlert {
description: i18n.translate(
'xpack.monitoring.alerts.shardSize.actionVariables.shardIndex',
{
-defaultMessage: 'List of indices which are experiencing large shard size.',
+defaultMessage:
+'List of indices which are experiencing large (primary average) shard size.',
}
),
},
@@ -100,7 +101,7 @@ export class LargeShardSizeAlert extends BaseAlert {
const { shardIndex, shardSize } = item.meta as IndexShardSizeUIMeta;
return {
text: i18n.translate('xpack.monitoring.alerts.shardSize.ui.firingMessage', {
-defaultMessage: `The following index: #start_link{shardIndex}#end_link has a large shard size of: {shardSize}GB at #absolute`,
+defaultMessage: `The following index: #start_link{shardIndex}#end_link has a large (primary average) shard size of: {shardSize}GB at #absolute`,
values: {
shardIndex,
shardSize,
@@ -69,13 +69,6 @@ export async function fetchIndexShardSize(
},
aggs: {
over_threshold: {
-filter: {
-range: {
-'index_stats.primaries.store.size_in_bytes': {
-gt: threshold * gbMultiplier,
-},
-},
-},
aggs: {
index: {
terms: {
@@ -96,6 +89,7 @@ export async function fetchIndexShardSize(
_source: {
includes: [
'_index',
+'index_stats.shards.primaries',
'index_stats.primaries.store.size_in_bytes',
'source_node.name',
'source_node.uuid',
@@ -123,7 +117,7 @@ export async function fetchIndexShardSize(
if (!clusterBuckets.length) {
return stats;
}

+const thresholdGB = threshold * gbMultiplier;
for (const clusterBucket of clusterBuckets) {
const indexBuckets = clusterBucket.over_threshold.index.buckets;
const clusterUuid = clusterBucket.key;
@@ -143,9 +137,21 @@ export async function fetchIndexShardSize(
_source: { source_node: sourceNode, index_stats: indexStats },
} = topHit;

-const { size_in_bytes: shardSizeBytes } = indexStats?.primaries?.store!;
+if (!indexStats || !indexStats.primaries) {
+continue;
+}
+
+const { primaries: totalPrimaryShards } = indexStats.shards;
+const { size_in_bytes: primaryShardSizeBytes = 0 } = indexStats.primaries.store!;
+if (!primaryShardSizeBytes) {
+continue;
+}
const { name: nodeName, uuid: nodeId } = sourceNode;
-const shardSize = +(shardSizeBytes! / gbMultiplier).toFixed(2);
+const avgShardSize = primaryShardSizeBytes ? primaryShardSizeBytes / totalPrimaryShards : 0;
Member

We're already defaulting this value to 0 on line 145, right? 0 / totalPrimaryShards will be 0 so do we need this ternary?

Also, do we ever expect totalPrimaryShards to be 0?

Contributor Author

Right, the ternary checks that primaryShardSizeBytes is not 0 (which can come either from the response or from our default), since I don't want to divide 0 (like you mentioned). Maybe parentheses would make it more readable? E.g.:

avgShardSize = primaryShardSizeBytes ? (primaryShardSizeBytes / totalPrimaryShards) : 0

Re: "...do we ever expect totalPrimaryShards to be 0":

Yes, it's possible to have zero primaries (mainly during the allocation of shards), in which case the cluster status goes red (though in most cases it's temporary since not all shards are yet assigned). This article explains it very well

Contributor Author

I guess I don't even need this check, since I'm already doing this right before:

if (!primaryShardSizeBytes) {
  continue;
}

Member

Exactly, that's what I was thinking. I asked about totalPrimaryShards because we're currently not checking that value to make sure it's not 0, so I think we can still end up in a divide by 0 scenario here, right?

Contributor Author

No, because the condition if (primaryShardSizeBytes) would not pass, since 0 is falsy in JavaScript/TypeScript. So we don't need to explicitly write if (primaryShardSizeBytes === 0)

Member

Heh, yeah I know about 0 being falsy ;) I'm asking about the denominator, totalPrimaryShards.

Contributor Author

Eh, I see your point now, sorry for the derp moment 🙃

Added that to the check as well, so should be good now
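
For readers following the thread, here is a minimal sketch of a guard that covers both the numerator and the denominator. The interface and function names are hypothetical and only mirror the fields discussed above; this is not the code that was merged.

```typescript
// Hypothetical shape mirroring the fields discussed in this thread.
interface IndexStatsLike {
  shards?: { primaries?: number };
  primaries?: { store?: { size_in_bytes?: number } };
}

// Returns the average primary shard size in bytes,
// or undefined when it cannot be computed safely.
function avgPrimaryShardBytes(stats?: IndexStatsLike): number | undefined {
  const totalPrimaryShards = stats?.shards?.primaries ?? 0;
  const primaryStoreBytes = stats?.primaries?.store?.size_in_bytes ?? 0;
  // Both operands must be non-zero: 0 bytes means there is nothing to report,
  // and 0 primaries would make the division yield Infinity rather than throw.
  if (!primaryStoreBytes || !totalPrimaryShards) {
    return undefined;
  }
  return primaryStoreBytes / totalPrimaryShards;
}

const unassigned: IndexStatsLike = {
  shards: { primaries: 0 },
  primaries: { store: { size_in_bytes: 1024 } },
};
console.log(avgPrimaryShardBytes(unassigned)); // undefined
console.log(1024 / 0); // Infinity
```

The second log line shows why the denominator matters: in JavaScript a non-zero number divided by 0 evaluates to Infinity, which would never compare as below a finite threshold.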

+const shardSize = +(avgShardSize / gbMultiplier).toFixed(2);
+if (shardSize < thresholdGB) {
+continue;
+}
stats.push({
shardIndex,
shardSize,