Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make PeerFinder log messages happier #83222

Merged

Conversation

DaveCTurner
Copy link
Contributor

Since #73128 a sufficiently old PeerFinder will report all exceptions
encountered during discovery to help diagnose cluster formation
problems. We throw exceptions on genuine connection failures, but we
also throw exceptions if the discovered node is the local node or is
master-ineligible because these nodes are no use in discovery. We report
all such exceptions as failures:

[instance-0000000001]
    address [10.0.0.1:12345], node [null], requesting [false]
    connection failed:
        [instance-0000000002][10.0.0.1:12345]
        non-master-eligible node found

Experience shows that users often have master-ineligible nodes in their
discovery config so will see these messages frequently if the cluster
cannot form, and may interpret the connection failed as the source of
the problems even though they're benign.

This commit adjusts the language in these messages to be more balanced,
replacing connection failed with discovery result, including the
phrase successfully discovered in the exception messsage, and giving
advice on how to suppress the message.

Since elastic#73128 a sufficiently old `PeerFinder` will report all exceptions
encountered during discovery to help diagnose cluster formation
problems. We throw exceptions on genuine connection failures, but we
also throw exceptions if the discovered node is the local node or is
master-ineligible because these nodes are no use in discovery. We report
all such exceptions as failures:

    [instance-0000000001]
        address [10.0.0.1:12345], node [null], requesting [false]
        connection failed:
            [instance-0000000002][10.0.0.1:12345]
            non-master-eligible node found

Experience shows that users often have master-ineligible nodes in their
discovery config so will see these messages frequently if the cluster
cannot form, and may interpret the `connection failed` as the source of
the problems even though they're benign.

This commit adjusts the language in these messages to be more balanced,
replacing `connection failed` with `discovery result`, including the
phrase `successfully discovered` in the exception messsage, and giving
advice on how to suppress the message.
@DaveCTurner DaveCTurner added >enhancement :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. v8.1.0 labels Jan 27, 2022
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Jan 27, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine
Copy link
Collaborator

Hi @DaveCTurner, I've created a changelog YAML for you.

@DaveCTurner DaveCTurner added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Jan 28, 2022
@DaveCTurner DaveCTurner merged commit f380ada into elastic:master Jan 28, 2022
@DaveCTurner DaveCTurner deleted the 2022-01-27-happier-PeerFinder-messages branch January 28, 2022 11:59
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Jan 31, 2022
* upstream/master: (100 commits)
  Avoid duplicate _type fields in v7 compat layer (elastic#83239)
  Bump bundled JDK to 17.0.2+8 (elastic#83243)
  [DOCS] Correct header syntax (elastic#83275)
  Add unit tests for indices.recovery.max_bytes_per_sec default values (elastic#83261)
  [DOCS] Add note that write indices are not replicated (elastic#82997)
  Add notes on indexing to kNN search guide (elastic#83188)
  Fix get-snapshot-api :docs:integTest (elastic#83273)
  FilterPathBasedFilter support match fieldname with dot (elastic#83178)
  Fix compilation issues in example-plugins (elastic#83258)
  fix ClusterStateListener javadoc (elastic#83246)
  Speed up Building Indices Lookup in Metadata (elastic#83241)
  Mute whole suite for elastic#82502 (elastic#83252)
  Make PeerFinder log messages happier (elastic#83222)
  [Docs] Add supported _terms_enum field types (elastic#83244)
  Add an aggregator for IPv4 and IPv6 subnets (elastic#82410)
  [CI] Fix 70_time_series/default sort yaml test failures (elastic#83217)
  Update test-failure Issue Template to include "needs:triage" label elastic#83226
  Add an index->step cache to the PolicyStepsRegistry (elastic#82316)
  Improve support for joda datetime to java datetime transition in Painless (elastic#83099)
  Fix joda migration for week based methods in Painless (elastic#83232)
  ...

# Conflicts:
#	x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/v2/TransportRollupAction.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >enhancement Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v8.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants