Reduce severity of log messages when hitting disk watermarks #48038

Closed
DaveCTurner opened this issue Oct 15, 2019 · 1 comment · Fixed by #48115
Labels: :Distributed Coordination/Allocation (all issues relating to the decision making around placing a shard, both master logic and on the nodes), >enhancement

Comments

@DaveCTurner
Contributor

DaveCTurner commented Oct 15, 2019

Today if an Elasticsearch node reaches a disk watermark then it will repeatedly emit logging about it, which implies that some action needs to be taken by the administrator. This is misleading. Elasticsearch strives to keep nodes under the high watermark, but it is normal to have a few nodes occasionally exceed this level. Nodes may be over the low watermark for an extended period without any ill effects.

On 2019-08-28 the distributed team discussed this and decided to reduce the severity of these messages. Messages about the low watermark should not be warnings at all, and ideally should not be emitted repeatedly since we expect nodes to be over this watermark for an extended period. Messages about the high watermark could reasonably be warnings iff the cluster cannot relocate shards to bring nodes under the watermark again; if the cluster is actively relocating shards that will balance the disk usage then there is no need for a warning.
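The rule described above can be sketched as a small decision function. This is only an illustration of the proposal, not the actual `DiskThresholdMonitor` code: the class, enum, and parameter names are hypothetical, and the choice of `INFO` as the lower severity is an assumption.

```java
// Hypothetical sketch of the proposed severity rule; none of these names
// come from the Elasticsearch codebase.
final class WatermarkLogLevelSketch {

    enum DiskState { BELOW_LOW, OVER_LOW, OVER_HIGH }

    enum Severity { NONE, INFO, WARN }

    // canRelocateBelowHigh: assumed flag meaning the cluster is relocating
    // shards that should bring the node back under the high watermark.
    static Severity severityFor(DiskState state, boolean canRelocateBelowHigh) {
        switch (state) {
            case OVER_LOW:
                // Expected for extended periods, so never a warning.
                return Severity.INFO;
            case OVER_HIGH:
                // A warning only if relocation cannot fix the situation.
                return canRelocateBelowHigh ? Severity.INFO : Severity.WARN;
            default:
                // Below the low watermark there is nothing to report.
                return Severity.NONE;
        }
    }

    public static void main(String[] args) {
        System.out.println(severityFor(DiskState.OVER_LOW, false));
        System.out.println(severityFor(DiskState.OVER_HIGH, true));
        System.out.println(severityFor(DiskState.OVER_HIGH, false));
    }
}
```

Keeping the rule as a pure function of the disk state and the relocation flag makes it easy to test separately from whatever decides how often a message is emitted.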

DaveCTurner added the help wanted, adoptme, and :Distributed Coordination/Allocation labels Oct 15, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Allocation)

DaveCTurner added the >enhancement label and removed the help wanted and adoptme labels Oct 15, 2019
DaveCTurner self-assigned this Oct 15, 2019
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Oct 16, 2019
Today if an Elasticsearch node reaches a disk watermark then it will repeatedly
emit logging about it, which implies that some action needs to be taken by the
administrator. This is misleading. Elasticsearch strives to keep nodes under
the high watermark, but it is normal to have a few nodes occasionally exceed
this level. Nodes may be over the low watermark for an extended period without
any ill effects.

This commit enhances the logging emitted by the `DiskThresholdMonitor` to be
less misleading. The expected case of hitting the high watermark and
immediately relocating one or more shards to bring the node back under the
watermark again is reduced in severity to `INFO`. Additionally, `INFO` messages
are not emitted repeatedly.

Fixes elastic#48038
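One way to avoid emitting the same `INFO` message repeatedly is to remember which nodes have already been reported. The sketch below shows that idea only; it is not taken from the commit. The class and method names are hypothetical, and resetting a node once it drops back under the watermark is an assumption.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: report each node at most once per excursion over a
// watermark. Not the actual DiskThresholdMonitor implementation.
final class OncePerNodeInfoSketch {

    private final Set<String> nodesAlreadyReported = new HashSet<>();

    // Returns true only the first time a node is seen over the watermark.
    boolean shouldLogInfo(String nodeId, boolean overWatermark) {
        if (!overWatermark) {
            // Assumed behaviour: forget the node so that a later excursion
            // over the watermark is reported again.
            nodesAlreadyReported.remove(nodeId);
            return false;
        }
        // Set.add returns false if the node was already recorded.
        return nodesAlreadyReported.add(nodeId);
    }

    public static void main(String[] args) {
        OncePerNodeInfoSketch tracker = new OncePerNodeInfoSketch();
        System.out.println(tracker.shouldLogInfo("node-1", true));
        System.out.println(tracker.shouldLogInfo("node-1", true));
    }
}
```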
DaveCTurner added a commit that referenced this issue Oct 18, 2019
DaveCTurner added a commit that referenced this issue Oct 18, 2019