Reduce severity of log messages when hitting disk watermarks #48038

DaveCTurner · 2019-10-15T08:08:17Z

Today if an Elasticsearch node reaches a disk watermark then it will repeatedly emit logging about it, which implies that some action needs to be taken by the administrator. This is misleading. Elasticsearch strives to keep nodes under the high watermark, but it is normal to have a few nodes occasionally exceed this level. Nodes may be over the low watermark for an extended period without any ill effects.

On 2019-08-28 the distributed team discussed this and decided to reduce the severity of these messages. Messages about the low watermark should not be warnings at all, and ideally should not be emitted repeatedly since we expect nodes to be over this watermark for an extended period. Messages about the high watermark could reasonably be warnings iff the cluster cannot relocate shards to bring nodes under the watermark again; if the cluster is actively relocating shards that will balance the disk usage then there is no need for a warning.
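The decision above can be sketched roughly as follows. This is an illustrative Python sketch only, not the actual `DiskThresholdMonitor` code (which is Java); the class `WatermarkLogger`, its method `level_for`, and the `relocation_will_fix` flag are hypothetical names. Using `INFO` for the low watermark is also an assumption, since the discussion only says those messages should not be warnings.

```python
from typing import Dict, Optional


class WatermarkLogger:
    """Hypothetical helper that picks a log level for one disk check."""

    def __init__(self) -> None:
        # Last non-warning state reported per node, used to avoid repeats.
        self._last_reported: Dict[str, str] = {}

    def level_for(self, node: str, over_low: bool, over_high: bool,
                  relocation_will_fix: bool) -> Optional[str]:
        """Return "WARN", "INFO", or None (stay quiet)."""
        if not over_low and not over_high:
            # Back under both watermarks: forget the node so that a later
            # breach is reported again.
            self._last_reported.pop(node, None)
            return None
        if over_high and not relocation_will_fix:
            # The cluster cannot fix this by relocating shards, so the
            # administrator may need to act. Warnings are repeated.
            self._last_reported.pop(node, None)
            return "WARN"
        # Over the low watermark, or over the high watermark while shards
        # are relocating away: expected states, reported once each.
        state = "high-relocating" if over_high else "low"
        if self._last_reported.get(node) == state:
            return None
        self._last_reported[node] = state
        return "INFO"
```

With this sketch, a node that sits over the low watermark for many checks produces a single `INFO` line, while a node stuck over the high watermark with no useful relocation keeps producing `WARN`.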


elasticmachine · 2019-10-15T08:08:19Z

Pinging @elastic/es-distributed (:Distributed/Allocation)

Today if an Elasticsearch node reaches a disk watermark then it will repeatedly emit logging about it, which implies that some action needs to be taken by the administrator. This is misleading. Elasticsearch strives to keep nodes under the high watermark, but it is normal to have a few nodes occasionally exceed this level. Nodes may be over the low watermark for an extended period without any ill effects. This commit enhances the logging emitted by the `DiskThresholdMonitor` to be less misleading. The expected case of hitting the high watermark and immediately relocating one or more shards to bring the node back under the watermark again is reduced in severity to `INFO`. Additionally, `INFO` messages are not emitted repeatedly. Fixes elastic#48038


DaveCTurner added the help wanted, adoptme, and :Distributed Coordination/Allocation labels Oct 15, 2019

DaveCTurner added the >enhancement label and removed the help wanted and adoptme labels Oct 15, 2019

DaveCTurner self-assigned this Oct 15, 2019

DaveCTurner mentioned this issue Oct 16, 2019

Quieter logging from the DiskThresholdMonitor #48115

Merged

DaveCTurner closed this as completed in #48115 Oct 18, 2019

This was referenced Feb 3, 2020

[meta] 7.6 release elastic/elasticsearch-net#4340

Closed

[meta] 7.6 release elastic/elasticsearch-net#4341

Closed
