Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bootstrap check failures are easy to miss in logs #93074

Closed
pquentin opened this issue Jan 19, 2023 · 4 comments · Fixed by #93178
Closed

Bootstrap check failures are easy to miss in logs #93074

pquentin opened this issue Jan 19, 2023 · 4 comments · Fixed by #93178
Assignees
Labels
>bug :Core/Infra/Logging Log management and logging utilities Team:Core/Infra Meta label for core/infra team

Comments

@pquentin
Copy link
Member

pquentin commented Jan 19, 2023

Elasticsearch Version

8.7.0-SNAPSHOT (50c1a78)

Installed Plugins

No response

Java Version

bundled

OS Version

Linux fedora 5.15.89-200.fc37.x86_64 #1 SMP Wed Jan 18 14:59:21 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Problem Description

When bootstrap checks fail, we only receive INFO logs and it's easy to miss the problem. The console output is as follows:

[2023-01-19T13:49:33,194][INFO ][o.e.t.TransportService   ] [fedora] publish_address {192.168.1.33:9300}, bound_addresses {192.168.1.33:9300}
[2023-01-19T13:49:33,414][INFO ][o.e.b.BootstrapChecks    ] [fedora] bound or publishing to a non-loopback address, enforcing bootstrap checks
bootstrap check failure [1] of [1]: the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured
ERROR: Elasticsearch did not exit normally - check the logs at /home/q/Downloads/elasticsearch-8.7.0-SNAPSHOT/logs/elasticsearch.log
[2023-01-19T13:49:33,420][INFO ][o.e.n.Node               ] [fedora] stopping ...
[2023-01-19T13:49:33,434][INFO ][o.e.n.Node               ] [fedora] stopped
[2023-01-19T13:49:33,434][INFO ][o.e.n.Node               ] [fedora] closing ...
[2023-01-19T13:49:33,441][INFO ][o.e.n.Node               ] [fedora] closed
[2023-01-19T13:49:33,443][INFO ][o.e.x.m.p.NativeController] [fedora] Native controller process has stopped - no new native processes can be started

ERROR: [1] bootstrap checks failed. You must address the points described in the following [1] lines before starting Elasticsearch.

This is surprising as there is nothing else in the output. But what is worse is that elasticsearch.log (which is what the log asks to look and which was anyway the only file I was looking at the time) ends with the following:

[2023-01-19T13:40:25,913][INFO ][o.e.t.TransportService   ] [fedora] publish_address {192.168.1.33:9300}, bound_addresses {192.168.1.33:9300}
[2023-01-19T13:40:26,003][INFO ][o.e.b.BootstrapChecks    ] [fedora] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2023-01-19T13:40:26,008][INFO ][o.e.n.Node               ] [fedora] stopping ...
[2023-01-19T13:40:26,022][INFO ][o.e.n.Node               ] [fedora] stopped
[2023-01-19T13:40:26,022][INFO ][o.e.n.Node               ] [fedora] closing ...
[2023-01-19T13:40:26,029][INFO ][o.e.n.Node               ] [fedora] closed
[2023-01-19T13:40:26,030][INFO ][o.e.x.m.p.NativeController] [fedora] Native controller process has stopped - no new native processes can be started

This is very confusing as I was looking for an ERROR log and did not understand at all why Elasticsearch decided to stop. Nothing mentions in that log if something failed.

I looked a bit at minor Elasticsearch versions (7.17, 8.0, 8.2, 8.3, 8.4, 8.6 and main). It turns out that Elasticsearch 8.2.0 does everything correctly:

  • the console output makes sense:

    [2023-01-19T13:58:51,800][INFO ][o.e.b.BootstrapChecks    ] [fedora] bound or publishing to a non-loopback address, enforcing bootstrap checks
    
    ERROR: [1] bootstrap checks failed. You must address the points described in the following [1] lines before starting Elasticsearch.
    bootstrap check failure [1] of [1]: the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured
    ERROR: Elasticsearch did not exit normally - check the logs at /home/q/Downloads/elasticsearch-8.2.0/logs/elasticsearch.log
    
  • the logs do include an ERROR:

    [2023-01-19T13:58:51,658][INFO ][o.e.t.TransportService   ] [fedora] publish_address {192.168.1.33:9300}, bound_addresses {192.168.1.33:9300}
    [2023-01-19T13:58:51,800][INFO ][o.e.b.BootstrapChecks    ] [fedora] bound or publishing to a non-loopback address, enforcing bootstrap checks
    [2023-01-19T13:58:51,805][ERROR][o.e.b.Bootstrap          ] [fedora] node validation exception
    [1] bootstrap checks failed. You must address the points described in the following [1] lines before starting Elasticsearch.
    bootstrap check failure [1] of [1]: the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured
    

However:

  • Starting with Elasticsearch 8.3.0 the console output is wrong like the current main output but elasticsearch.log is still correct.
  • Starting with Elasticsearch 8.4.0 both the console output and elasticsearch.log are wrong like the current main output

Steps to Reproduce

  • Build or download an Elasticsearch distribution (linux-tar in my case)
  • Make two changes in config/elasticsearch.yml:
    • change network.host to your IP address in the local network (192.168.1.33 in my case) without setting discovery settings (to make a bootstrap check fail)
    • add xpack.security.enabled: false to make sure the security by default does not add cluster.initial_master_nodes that will make the bootstrap check pass
  • Start bin/elasticsearch and look at both the console output and elasticsearch.log

Logs (if relevant)

Here are the full logs on main:

[2023-01-19T13:40:17,332][INFO ][o.e.n.Node               ] [fedora] version[8.7.0-SNAPSHOT], pid[39012], build[tar/50c1a781581355b87a840f4a7f9ee97a2f0ba9d6/2023-01-19T09:37:49.307292342Z], OS[Linux/5.15.89-200.fc37.x86_64/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/19.0.1/19.0.1+10-21]
[2023-01-19T13:40:17,337][INFO ][o.e.n.Node               ] [fedora] JVM home [/home/q/Downloads/elasticsearch-8.7.0-SNAPSHOT/jdk], using bundled JDK [true]
[2023-01-19T13:40:17,338][INFO ][o.e.n.Node               ] [fedora] JVM arguments [-Des.networkaddress.cache.ttl=60, -Des.networkaddress.cache.negative.ttl=10, -Djava.security.manager=allow, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j2.formatMsgNoLookups=true, -Djava.locale.providers=SPI,COMPAT, --add-opens=java.base/java.io=ALL-UNNAMED, -XX:+UseG1GC, -Djava.io.tmpdir=/tmp/elasticsearch-2031972451056469335, -XX:+HeapDumpOnOutOfMemoryError, -XX:+ExitOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:ErrorFile=logs/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,level,pid,tags:filecount=32,filesize=64m, -Xms15876m, -Xmx15876m, -XX:MaxDirectMemorySize=8325693440, -XX:InitiatingHeapOccupancyPercent=30, -XX:G1ReservePercent=25, -Des.distribution.type=tar, --module-path=/home/q/Downloads/elasticsearch-8.7.0-SNAPSHOT/lib, --add-modules=jdk.net, -Djdk.module.main=org.elasticsearch.server]
[2023-01-19T13:40:17,338][WARN ][o.e.n.Node               ] [fedora] version [8.7.0-SNAPSHOT] is a pre-release version of Elasticsearch and is not suitable for production
[2023-01-19T13:40:19,006][INFO ][c.a.c.i.j.JacksonVersion ] [fedora] Package versions: jackson-core=2.13.4, jackson-databind=2.13.4-2, jackson-dataformat-xml=2.13.4, jackson-datatype-jsr310=2.13.4, azure-core=1.34.0, Troubleshooting version conflicts: https://aka.ms/azsdk/java/dependency/troubleshoot
[2023-01-19T13:40:20,221][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [aggregations]
[2023-01-19T13:40:20,221][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [analysis-common]
[2023-01-19T13:40:20,221][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [apm]
[2023-01-19T13:40:20,222][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [blob-cache]
[2023-01-19T13:40:20,222][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [constant-keyword]
[2023-01-19T13:40:20,222][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [data-streams]
[2023-01-19T13:40:20,222][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [frozen-indices]
[2023-01-19T13:40:20,222][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [ingest-attachment]
[2023-01-19T13:40:20,222][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [ingest-common]
[2023-01-19T13:40:20,223][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [ingest-geoip]
[2023-01-19T13:40:20,223][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [ingest-user-agent]
[2023-01-19T13:40:20,223][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [kibana]
[2023-01-19T13:40:20,223][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [lang-expression]
[2023-01-19T13:40:20,223][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [lang-mustache]
[2023-01-19T13:40:20,223][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [lang-painless]
[2023-01-19T13:40:20,224][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [legacy-geo]
[2023-01-19T13:40:20,224][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [mapper-extras]
[2023-01-19T13:40:20,224][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [mapper-version]
[2023-01-19T13:40:20,224][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [old-lucene-versions]
[2023-01-19T13:40:20,224][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [parent-join]
[2023-01-19T13:40:20,224][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [percolator]
[2023-01-19T13:40:20,225][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [rank-eval]
[2023-01-19T13:40:20,225][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [reindex]
[2023-01-19T13:40:20,225][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [repositories-metering-api]
[2023-01-19T13:40:20,225][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [repository-azure]
[2023-01-19T13:40:20,225][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [repository-gcs]
[2023-01-19T13:40:20,225][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [repository-s3]
[2023-01-19T13:40:20,225][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [repository-url]
[2023-01-19T13:40:20,226][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [runtime-fields-common]
[2023-01-19T13:40:20,226][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [search-business-rules]
[2023-01-19T13:40:20,226][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [searchable-snapshots]
[2023-01-19T13:40:20,226][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [snapshot-based-recoveries]
[2023-01-19T13:40:20,226][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [snapshot-repo-test-kit]
[2023-01-19T13:40:20,226][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [spatial]
[2023-01-19T13:40:20,226][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [test-delayed-aggs]
[2023-01-19T13:40:20,227][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [test-die-with-dignity]
[2023-01-19T13:40:20,227][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [test-error-query]
[2023-01-19T13:40:20,227][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [transform]
[2023-01-19T13:40:20,227][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [transport-netty4]
[2023-01-19T13:40:20,227][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [unsigned-long]
[2023-01-19T13:40:20,227][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [vector-tile]
[2023-01-19T13:40:20,227][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [wildcard]
[2023-01-19T13:40:20,227][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [x-pack-aggregate-metric]
[2023-01-19T13:40:20,228][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [x-pack-analytics]
[2023-01-19T13:40:20,228][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [x-pack-async]
[2023-01-19T13:40:20,228][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [x-pack-async-search]
[2023-01-19T13:40:20,228][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [x-pack-autoscaling]
[2023-01-19T13:40:20,228][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [x-pack-ccr]
[2023-01-19T13:40:20,228][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [x-pack-core]
[2023-01-19T13:40:20,228][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [x-pack-deprecation]
[2023-01-19T13:40:20,229][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [x-pack-enrich]
[2023-01-19T13:40:20,229][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [x-pack-eql]
[2023-01-19T13:40:20,229][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [x-pack-fleet]
[2023-01-19T13:40:20,229][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [x-pack-graph]
[2023-01-19T13:40:20,229][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [x-pack-identity-provider]
[2023-01-19T13:40:20,229][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [x-pack-ilm]
[2023-01-19T13:40:20,229][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [x-pack-logstash]
[2023-01-19T13:40:20,229][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [x-pack-ml]
[2023-01-19T13:40:20,230][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [x-pack-monitoring]
[2023-01-19T13:40:20,230][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [x-pack-profiling]
[2023-01-19T13:40:20,230][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [x-pack-ql]
[2023-01-19T13:40:20,230][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [x-pack-rollup]
[2023-01-19T13:40:20,230][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [x-pack-security]
[2023-01-19T13:40:20,230][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [x-pack-shutdown]
[2023-01-19T13:40:20,230][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [x-pack-sql]
[2023-01-19T13:40:20,230][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [x-pack-stack]
[2023-01-19T13:40:20,231][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [x-pack-text-structure]
[2023-01-19T13:40:20,231][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [x-pack-voting-only-node]
[2023-01-19T13:40:20,231][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [x-pack-watcher]
[2023-01-19T13:40:20,231][INFO ][o.e.p.PluginsService     ] [fedora] loaded module [x-pack-write-load-forecaster]
[2023-01-19T13:40:20,231][INFO ][o.e.p.PluginsService     ] [fedora] no plugins loaded
[2023-01-19T13:40:22,285][INFO ][o.e.e.NodeEnvironment    ] [fedora] using [1] data paths, mounts [[/home (/dev/mapper/luks-f3108a6f-20c6-4104-b9d7-41f904584a08)]], net usable_space [614.9gb], net total_space [952.2gb], types [btrfs]
[2023-01-19T13:40:22,285][INFO ][o.e.e.NodeEnvironment    ] [fedora] heap size [15.5gb], compressed ordinary object pointers [true]
[2023-01-19T13:40:22,292][INFO ][o.e.n.Node               ] [fedora] node name [fedora], node ID [JTrfld3xREqKCR3nzBuVXg], cluster name [elasticsearch], roles [ingest, data_frozen, ml, data_hot, transform, data_content, data_warm, master, remote_cluster_client, data, data_cold]
[2023-01-19T13:40:23,948][INFO ][o.e.x.p.ProfilingPlugin  ] [fedora] Profiling is disabled
[2023-01-19T13:40:24,519][INFO ][o.e.x.s.Security         ] [fedora] Security is disabled
[2023-01-19T13:40:24,564][INFO ][o.e.x.m.p.l.CppLogMessageHandler] [fedora] [controller/39055] [Main.cc@123] controller (64 bit): Version 8.7.0-SNAPSHOT (Build 976a2a293ec815) Copyright (c) 2023 Elasticsearch BV
[2023-01-19T13:40:24,933][INFO ][o.e.t.n.NettyAllocator   ] [fedora] creating NettyAllocator with the following configs: [name=elasticsearch_configured, chunk_size=1mb, suggested_max_allocation_size=1mb, factors={es.unsafe.use_netty_default_chunk_and_page_size=false, g1gc_enabled=true, g1gc_region_size=8mb}]
[2023-01-19T13:40:24,952][INFO ][o.e.i.r.RecoverySettings ] [fedora] using rate limit [40mb] with [default=40mb, read=0b, write=0b, max=0b]
[2023-01-19T13:40:24,985][INFO ][o.e.d.DiscoveryModule    ] [fedora] using discovery type [multi-node] and seed hosts providers [settings]
[2023-01-19T13:40:25,821][INFO ][o.e.n.Node               ] [fedora] initialized
[2023-01-19T13:40:25,822][INFO ][o.e.n.Node               ] [fedora] starting ...
[2023-01-19T13:40:25,834][INFO ][o.e.x.s.c.f.PersistentCache] [fedora] persistent cache index loaded
[2023-01-19T13:40:25,835][INFO ][o.e.x.d.l.DeprecationIndexingComponent] [fedora] deprecation component started
[2023-01-19T13:40:25,913][INFO ][o.e.t.TransportService   ] [fedora] publish_address {192.168.1.33:9300}, bound_addresses {192.168.1.33:9300}
[2023-01-19T13:40:26,003][INFO ][o.e.b.BootstrapChecks    ] [fedora] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2023-01-19T13:40:26,008][INFO ][o.e.n.Node               ] [fedora] stopping ...
[2023-01-19T13:40:26,022][INFO ][o.e.n.Node               ] [fedora] stopped
[2023-01-19T13:40:26,022][INFO ][o.e.n.Node               ] [fedora] closing ...
[2023-01-19T13:40:26,029][INFO ][o.e.n.Node               ] [fedora] closed
[2023-01-19T13:40:26,030][INFO ][o.e.x.m.p.NativeController] [fedora] Native controller process has stopped - no new native processes can be started
@pquentin pquentin added >bug needs:triage Requires assignment of a team area label :Core/Infra/Logging Log management and logging utilities and removed needs:triage Requires assignment of a team area label labels Jan 19, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra Meta label for core/infra team label Jan 19, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@thecoop
Copy link
Member

thecoop commented Jan 19, 2023

8.4 corresponds with the Great Startup Refactor. That's the most likely cause of this change

@thecoop thecoop self-assigned this Jan 19, 2023
@thecoop
Copy link
Member

thecoop commented Jan 19, 2023

The code that did the log file logging was explicitly removed by #87809, leaving just the system err logging in Elasticsearch.java

@rjernst can you provide some more context on whether this logging behaviour change for validation errors was deliberate?

The best thing to do is probably to define the precise behaviour we want around validation exception specifically, then re-implement it in 8.7/8.6

@rjernst
Copy link
Member

rjernst commented Jan 19, 2023

This change to behavior was an unintentional side effect of the startup refactoring. I will work on a fix.

@rjernst rjernst assigned rjernst and unassigned thecoop Jan 19, 2023
rjernst added a commit to rjernst/elasticsearch that referenced this issue Jan 23, 2023
Bootstrap check failures were unintentionally moved to the info level
in the log output. This commit restores logging them as errors.

closes elastic#93074
rjernst added a commit that referenced this issue Jan 24, 2023
Bootstrap check failures were unintentionally moved to the info level
in the log output. This commit restores logging them as errors.

closes #93074
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Core/Infra/Logging Log management and logging utilities Team:Core/Infra Meta label for core/infra team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants