Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Wazuh indexer is failed in the AMI deployment #307

Closed
davidcr01 opened this issue Jul 12, 2024 · 11 comments · Fixed by #310
Closed

[BUG] Wazuh indexer is failed in the AMI deployment #307

davidcr01 opened this issue Jul 12, 2024 · 11 comments · Fixed by #310
Assignees
Labels
level/task Task issue type/bug Bug issue

Comments

@davidcr01
Copy link

Describe the bug

In https://github.com/wazuh/internal-devel-requests/issues/1262, we have detected that the Wazuh dashboard can not load because the Wazuh indexer service is failed after the AMI is built:

-- Logs begin at vie 2024-07-12 14:20:09 UTC, end at vie 2024-07-12 14:41:35 UTC. --
jul 12 14:20:47 wazuh-server systemd[1]: Starting wazuh-indexer...
-- Subject: Unit wazuh-indexer.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit wazuh-indexer.service has begun starting up.
jul 12 14:21:12 wazuh-server systemd-entrypoint[2270]: WARNING: A terminally deprecated method in java.lang.System has been called
jul 12 14:21:12 wazuh-server systemd-entrypoint[2270]: WARNING: System::setSecurityManager has been called by org.opensearch.bootstrap.OpenSearch (file:/usr/share/wazuh-indexer/lib/opensearch-2.13.0.jar)
jul 12 14:21:12 wazuh-server systemd-entrypoint[2270]: WARNING: Please consider reporting this to the maintainers of org.opensearch.bootstrap.OpenSearch
jul 12 14:21:12 wazuh-server systemd-entrypoint[2270]: WARNING: System::setSecurityManager will be removed in a future release
jul 12 14:21:15 wazuh-server systemd-entrypoint[2270]: Jul 12, 2024 2:21:15 PM sun.util.locale.provider.LocaleProviderAdapter <clinit>
jul 12 14:21:15 wazuh-server systemd-entrypoint[2270]: WARNING: COMPAT locale provider will be removed in a future release
jul 12 14:21:19 wazuh-server systemd-entrypoint[2270]: WARNING: A terminally deprecated method in java.lang.System has been called
jul 12 14:21:19 wazuh-server systemd-entrypoint[2270]: WARNING: System::setSecurityManager has been called by org.opensearch.bootstrap.Security (file:/usr/share/wazuh-indexer/lib/opensearch-2.13.0.jar)
jul 12 14:21:19 wazuh-server systemd-entrypoint[2270]: WARNING: Please consider reporting this to the maintainers of org.opensearch.bootstrap.Security
jul 12 14:21:19 wazuh-server systemd-entrypoint[2270]: WARNING: System::setSecurityManager will be removed in a future release
jul 12 14:22:03 wazuh-server systemd[1]: wazuh-indexer.service start operation timed out. Terminating.
jul 12 14:22:03 wazuh-server systemd[1]: Failed to start wazuh-indexer.
-- Subject: Unit wazuh-indexer.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit wazuh-indexer.service has failed.
-- 
-- The result is failed.

To Reproduce
Steps to reproduce the behavior:

  1. Access to the provided instance (credentials shared with private message)
  2. Check the Wazuh indexer status (should be failed)

Expected behavior
The Wazuh indexer should be running.

Host/Environment (please complete the following information):

  • OS: AL2
  • Version Wazuh 4.9.0

Additional context
If the indexer service is restarted, everything works as expected.

@davidcr01 davidcr01 added level/task Task issue type/bug Bug issue labels Jul 12, 2024
@davidcr01
Copy link
Author

Access of the VM was shared to @AlexRuiz7 and @f-galland.

@wazuhci wazuhci moved this to Backlog in Release 4.9.0 Jul 12, 2024
@wazuhci wazuhci moved this from Backlog to In progress in Release 4.9.0 Jul 12, 2024
@f-galland
Copy link
Member

I logged in to the machine with the provided credentials and was able to asses that indeed, the service is not starting up because of a timeout.

[root@wazuh-server ~]# systemctl status wazuh-indexer
● wazuh-indexer.service - wazuh-indexer
   Loaded: loaded (/usr/lib/systemd/system/wazuh-indexer.service; enabled; vendor preset: disabled)
   Active: failed (Result: timeout) since Fri 2024-07-12 15:16:59 UTC; 3h 17min ago
     Docs: https://documentation.wazuh.com
  Process: 2404 ExecStart=/usr/share/wazuh-indexer/bin/systemd-entrypoint -p ${PID_DIR}/wazuh-indexer.pid --quiet (code=exited, status=143)
 Main PID: 2404 (code=exited, status=143)

Jul 12 15:16:12 wazuh-server systemd-entrypoint[2404]: Jul 12, 2024 3:16:11 PM sun.util.locale.provider.LocaleProviderAdapter <clinit>
Jul 12 15:16:12 wazuh-server systemd-entrypoint[2404]: WARNING: COMPAT locale provider will be removed in a future release
Jul 12 15:16:15 wazuh-server systemd-entrypoint[2404]: WARNING: A terminally deprecated method in java.lang.System has been called
Jul 12 15:16:15 wazuh-server systemd-entrypoint[2404]: WARNING: System::setSecurityManager has been called by org.opensearch.bootstrap.Security (file:/usr/share/wazuh-in....13.0.jar)
Jul 12 15:16:15 wazuh-server systemd-entrypoint[2404]: WARNING: Please consider reporting this to the maintainers of org.opensearch.bootstrap.Security
Jul 12 15:16:15 wazuh-server systemd-entrypoint[2404]: WARNING: System::setSecurityManager will be removed in a future release
Jul 12 15:16:59 wazuh-server systemd[1]: wazuh-indexer.service start operation timed out. Terminating.
Jul 12 15:16:59 wazuh-server systemd[1]: Failed to start wazuh-indexer.
Jul 12 15:16:59 wazuh-server systemd[1]: Unit wazuh-indexer.service entered failed state.
Jul 12 15:16:59 wazuh-server systemd[1]: wazuh-indexer.service failed.
Hint: Some lines were ellipsized, use -l to show in full.

However I would need to review the AMI build and deploy logs to determine the cause for this issue, since issuing a:

systemctl restart wazuh-indexer

within the instance solves the problem.

@wazuhci wazuhci moved this from In progress to On hold in Release 4.9.0 Jul 12, 2024
@davidcr01
Copy link
Author

I have observed the following errors in the wazuh-cluster.log file.

[2024-07-15T00:00:00,323][WARN ][o.o.p.c.u.JsonConverter  ] [node-1] Json Mapping Error: Cannot invoke "java.lang.Long.longValue()" because "this.cacheMaxSize" is null (through reference chain: org.opensearch.performanceanalyzer.collectors.CacheConfigMetricsCollector$CacheMaxSizeStatus["Cache_MaxSize"])
[2024-07-15T00:00:00,341][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [node-1] Detected cluster change event for destination migration
[2024-07-15T00:00:00,349][INFO ][o.o.p.PluginsService     ] [node-1] PluginService:onIndexModule index:[wazuh-statistics-2024.29w/u_MtaqwGSy-I4a8QIEn0Ew]
[2024-07-15T00:00:00,355][INFO ][o.o.c.m.MetadataMappingService] [node-1] [wazuh-statistics-2024.29w/u_MtaqwGSy-I4a8QIEn0Ew] update_mapping [_doc]
[2024-07-15T00:00:00,375][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [node-1] Detected cluster change event for destination migration
[2024-07-15T00:00:00,376][INFO ][o.o.c.r.a.AllocationService] [node-1] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[wazuh-monitoring-2024.29w][0]]]).
[2024-07-15T00:00:00,390][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [node-1] Detected cluster change event for destination migration
[2024-07-15T00:00:00,394][INFO ][o.o.c.m.MetadataUpdateSettingsService] [node-1] updating number_of_replicas to [0] for indices [wazuh-monitoring-2024.29w]
[2024-07-15T00:00:05,323][WARN ][o.o.p.c.u.JsonConverter  ] [node-1] Json Mapping Error: Cannot invoke "java.lang.Long.longValue()" because "this.cacheMaxSize" is null (through reference chain: org.opensearch.performanceanalyzer.collectors.CacheConfigMetricsCollector$CacheMaxSizeStatus["Cache_MaxSize"])
[2024-07-15T00:00:10,323][WARN ][o.o.p.c.u.JsonConverter  ] [node-1] Json Mapping Error: Cannot invoke "java.lang.Long.longValue()" because "this.cacheMaxSize" is null (through reference chain: org.opensearch.performanceanalyzer.collectors.CacheConfigMetricsCollector$CacheMaxSizeStatus["Cache_MaxSize"])

@f-galland
Copy link
Member

@davidcr01

That log is dated today, so I'm not sure it's connected to the issue.

The full wazuh-cluster.log from the first start attempt follows:

wazuh-cluster.log
[2024-07-12T15:16:15,508][INFO ][o.o.n.Node               ] [node-1] version[2.13.0], pid[2404], build[rpm/f327a4177e4bfee79ce9096dd18a6769d6cf77bc/2024-06-20T18:27:36.506219Z], OS[Linux/4.14.348-265.565.amzn2.x86_64/amd64], JVM[Eclipse Adoptium/OpenJDK 64-Bit Server VM/21.0.2/21.0.2+13-LTS]
[2024-07-12T15:16:15,510][INFO ][o.o.n.Node               ] [node-1] JVM home [/usr/share/wazuh-indexer/jdk], using bundled JDK/JRE [true]
[2024-07-12T15:16:15,510][INFO ][o.o.n.Node               ] [node-1] JVM arguments [-Xshare:auto, -Dopensearch.networkaddress.cache.ttl=60, -Dopensearch.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -XX:+ShowCodeDetailsInExceptionMessages, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dio.netty.allocator.numDirectArenas=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.security.manager=allow, -Djava.locale.providers=SPI,COMPAT, -Xms3930m, -Xmx3930m, -XX:+UseG1GC, -XX:G1ReservePercent=25, -XX:InitiatingHeapOccupancyPercent=30, -Djava.io.tmpdir=/var/log/wazuh-indexer/tmp, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=/var/lib/wazuh-indexer, -XX:ErrorFile=/var/log/wazuh-indexer/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=/var/log/wazuh-indexer/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -Djava.security.manager=allow, -Djava.util.concurrent.ForkJoinPool.common.threadFactory=org.opensearch.secure_sm.SecuredForkJoinWorkerThreadFactory, -Dclk.tck=100, -Djdk.attach.allowAttachSelf=true, -Djava.security.policy=file:///etc/wazuh-indexer/opensearch-performance-analyzer/opensearch_security.policy, --add-opens=jdk.attach/sun.tools.attach=ALL-UNNAMED, -XX:MaxDirectMemorySize=2060451840, -Dopensearch.path.home=/usr/share/wazuh-indexer, -Dopensearch.path.conf=/etc/wazuh-indexer, -Dopensearch.distribution.type=rpm, -Dopensearch.bundled_jdk=true]
[2024-07-12T15:16:27,306][INFO ][o.o.s.s.t.SSLConfig      ] [node-1] SSL dual mode is disabled
[2024-07-12T15:16:27,306][INFO ][o.o.s.OpenSearchSecurityPlugin] [node-1] OpenSearch Config path is /etc/wazuh-indexer
[2024-07-12T15:16:28,786][INFO ][o.o.s.s.DefaultSecurityKeyStore] [node-1] JVM supports TLSv1.3
[2024-07-12T15:16:28,789][INFO ][o.o.s.s.DefaultSecurityKeyStore] [node-1] Config directory is /etc/wazuh-indexer/, from there the key- and truststore files are resolved relatively
[2024-07-12T15:16:30,147][INFO ][o.o.s.s.DefaultSecurityKeyStore] [node-1] TLS Transport Client Provider : JDK
[2024-07-12T15:16:30,148][INFO ][o.o.s.s.DefaultSecurityKeyStore] [node-1] TLS Transport Server Provider : JDK
[2024-07-12T15:16:30,149][INFO ][o.o.s.s.DefaultSecurityKeyStore] [node-1] TLS HTTP Provider             : JDK
[2024-07-12T15:16:30,149][INFO ][o.o.s.s.DefaultSecurityKeyStore] [node-1] Enabled TLS protocols for transport layer : [TLSv1.3, TLSv1.2]
[2024-07-12T15:16:30,149][INFO ][o.o.s.s.DefaultSecurityKeyStore] [node-1] Enabled TLS protocols for HTTP layer      : [TLSv1.2]
[2024-07-12T15:16:30,541][INFO ][o.o.s.OpenSearchSecurityPlugin] [node-1] Clustername: wazuh-cluster
[2024-07-12T15:16:46,229][INFO ][o.o.p.c.c.PluginSettings ] [node-1] Trying to create directory /dev/shm/performanceanalyzer/.
[2024-07-12T15:16:46,230][INFO ][o.o.p.c.c.PluginSettings ] [node-1] Config: metricsLocation: /dev/shm/performanceanalyzer/, metricsDeletionInterval: 1, httpsEnabled: false, cleanup-metrics-db-files: true, batch-metrics-retention-period-minutes: 7, rpc-port: 9650, webservice-port 9600
[2024-07-12T15:16:52,028][ERROR][o.o.p.c.j.GCMetrics      ] [node-1] MX bean missing: G1 Concurrent GC
[2024-07-12T15:16:56,605][INFO ][o.o.i.r.ReindexPlugin    ] [node-1] ReindexPlugin reloadSPI called
[2024-07-12T15:16:56,606][INFO ][o.o.i.r.ReindexPlugin    ] [node-1] Unable to find any implementation for RemoteReindexExtension
[2024-07-12T15:16:57,175][INFO ][o.o.j.JobSchedulerPlugin ] [node-1] Loaded scheduler extension: opendistro_anomaly_detector, index: .opendistro-anomaly-detector-jobs
[2024-07-12T15:16:57,582][INFO ][o.o.j.JobSchedulerPlugin ] [node-1] Loaded scheduler extension: reports-scheduler, index: .opendistro-reports-definitions
[2024-07-12T15:16:57,583][INFO ][o.o.j.JobSchedulerPlugin ] [node-1] Loaded scheduler extension: opendistro-index-management, index: .opendistro-ism-config
[2024-07-12T15:16:57,584][INFO ][o.o.j.JobSchedulerPlugin ] [node-1] Loaded scheduler extension: scheduler_geospatial_ip2geo_datasource, index: .scheduler-geospatial-ip2geo-datasource
[2024-07-12T15:16:57,586][INFO ][o.o.j.JobSchedulerPlugin ] [node-1] Loaded scheduler extension: opensearch_sap_job, index: .opensearch-sap--job
[2024-07-12T15:16:57,723][INFO ][o.o.p.PluginsService     ] [node-1] loaded module [aggs-matrix-stats]
[2024-07-12T15:16:57,724][INFO ][o.o.p.PluginsService     ] [node-1] loaded module [analysis-common]
[2024-07-12T15:16:57,724][INFO ][o.o.p.PluginsService     ] [node-1] loaded module [cache-common]
[2024-07-12T15:16:57,724][INFO ][o.o.p.PluginsService     ] [node-1] loaded module [geo]
[2024-07-12T15:16:57,724][INFO ][o.o.p.PluginsService     ] [node-1] loaded module [ingest-common]
[2024-07-12T15:16:57,725][INFO ][o.o.p.PluginsService     ] [node-1] loaded module [ingest-geoip]
[2024-07-12T15:16:57,725][INFO ][o.o.p.PluginsService     ] [node-1] loaded module [ingest-user-agent]
[2024-07-12T15:16:57,725][INFO ][o.o.p.PluginsService     ] [node-1] loaded module [lang-expression]
[2024-07-12T15:16:57,725][INFO ][o.o.p.PluginsService     ] [node-1] loaded module [lang-mustache]
[2024-07-12T15:16:57,726][INFO ][o.o.p.PluginsService     ] [node-1] loaded module [lang-painless]
[2024-07-12T15:16:57,726][INFO ][o.o.p.PluginsService     ] [node-1] loaded module [mapper-extras]
[2024-07-12T15:16:57,726][INFO ][o.o.p.PluginsService     ] [node-1] loaded module [opensearch-dashboards]
[2024-07-12T15:16:57,726][INFO ][o.o.p.PluginsService     ] [node-1] loaded module [parent-join]
[2024-07-12T15:16:57,727][INFO ][o.o.p.PluginsService     ] [node-1] loaded module [percolator]
[2024-07-12T15:16:57,727][INFO ][o.o.p.PluginsService     ] [node-1] loaded module [rank-eval]
[2024-07-12T15:16:57,727][INFO ][o.o.p.PluginsService     ] [node-1] loaded module [reindex]
[2024-07-12T15:16:57,727][INFO ][o.o.p.PluginsService     ] [node-1] loaded module [repository-url]
[2024-07-12T15:16:57,727][INFO ][o.o.p.PluginsService     ] [node-1] loaded module [search-pipeline-common]
[2024-07-12T15:16:57,728][INFO ][o.o.p.PluginsService     ] [node-1] loaded module [systemd]
[2024-07-12T15:16:57,728][INFO ][o.o.p.PluginsService     ] [node-1] loaded module [transport-netty4]
[2024-07-12T15:16:57,728][INFO ][o.o.p.PluginsService     ] [node-1] loaded plugin [opensearch-alerting]
[2024-07-12T15:16:57,729][INFO ][o.o.p.PluginsService     ] [node-1] loaded plugin [opensearch-anomaly-detection]
[2024-07-12T15:16:57,729][INFO ][o.o.p.PluginsService     ] [node-1] loaded plugin [opensearch-asynchronous-search]
[2024-07-12T15:16:57,729][INFO ][o.o.p.PluginsService     ] [node-1] loaded plugin [opensearch-cross-cluster-replication]
[2024-07-12T15:16:57,729][INFO ][o.o.p.PluginsService     ] [node-1] loaded plugin [opensearch-geospatial]
[2024-07-12T15:16:57,729][INFO ][o.o.p.PluginsService     ] [node-1] loaded plugin [opensearch-index-management]
[2024-07-12T15:16:57,730][INFO ][o.o.p.PluginsService     ] [node-1] loaded plugin [opensearch-job-scheduler]
[2024-07-12T15:16:57,730][INFO ][o.o.p.PluginsService     ] [node-1] loaded plugin [opensearch-knn]
[2024-07-12T15:16:57,730][INFO ][o.o.p.PluginsService     ] [node-1] loaded plugin [opensearch-ml]
[2024-07-12T15:16:57,730][INFO ][o.o.p.PluginsService     ] [node-1] loaded plugin [opensearch-neural-search]
[2024-07-12T15:16:57,730][INFO ][o.o.p.PluginsService     ] [node-1] loaded plugin [opensearch-notifications]
[2024-07-12T15:16:57,731][INFO ][o.o.p.PluginsService     ] [node-1] loaded plugin [opensearch-notifications-core]
[2024-07-12T15:16:57,731][INFO ][o.o.p.PluginsService     ] [node-1] loaded plugin [opensearch-observability]
[2024-07-12T15:16:57,731][INFO ][o.o.p.PluginsService     ] [node-1] loaded plugin [opensearch-performance-analyzer]
[2024-07-12T15:16:57,731][INFO ][o.o.p.PluginsService     ] [node-1] loaded plugin [opensearch-reports-scheduler]
[2024-07-12T15:16:57,731][INFO ][o.o.p.PluginsService     ] [node-1] loaded plugin [opensearch-security]
[2024-07-12T15:16:57,732][INFO ][o.o.p.PluginsService     ] [node-1] loaded plugin [opensearch-security-analytics]
[2024-07-12T15:16:57,732][INFO ][o.o.p.PluginsService     ] [node-1] loaded plugin [opensearch-sql]
[2024-07-12T15:16:57,966][INFO ][o.o.s.OpenSearchSecurityPlugin] [node-1] Disabled https compression by default to mitigate BREACH attacks. You can enable it by setting 'http.compression: true' in opensearch.yml
[2024-07-12T15:16:58,077][INFO ][o.o.e.ExtensionsManager  ] [node-1] ExtensionsManager initialized
[2024-07-12T15:16:58,102][INFO ][o.a.l.s.MemorySegmentIndexInputProvider] [node-1] Using MemorySegmentIndexInput with Java 21 or later; to disable start with -Dorg.apache.lucene.store.MMapDirectory.enableMemorySegments=false
[2024-07-12T15:16:58,110][INFO ][o.o.e.NodeEnvironment    ] [node-1] using [1] data paths, mounts [[/ (/dev/nvme0n1p1)]], net usable_space [92gb], net total_space [99.9gb], types [xfs]
[2024-07-12T15:16:58,111][INFO ][o.o.e.NodeEnvironment    ] [node-1] heap size [3.8gb], compressed ordinary object pointers [true]
[2024-07-12T15:16:58,375][INFO ][o.o.n.Node               ] [node-1] node name [node-1], node ID [EY65EkDXTa2tOvJAsqanKw], cluster name [wazuh-cluster], roles [ingest, remote_cluster_client, data, cluster_manager]

We can notice an actual ERROR level log there.

[2024-07-12T15:16:52,028][ERROR][o.o.p.c.j.GCMetrics      ] [node-1] MX bean missing: G1 Concurrent GC

which I'm currently investigating.

@wazuhci wazuhci moved this from On hold to In progress in Release 4.9.0 Jul 15, 2024
@f-galland
Copy link
Member

We reviewed the AMI build process with @davidcr01 and determined this could be caused by the JVM's memory allocation pool as controlled by the Xmx and Xms variables within jvm.options.
A script is currently setting these values based on the total RAM amount, but under some circumstances, this may be too high (elastic recommends using half the available memory).

#!/bin/sh

# Configure JVM options for Wazuh indexer
ram_mb=$(free -m | awk '/^Mem:/{print $2}')
ram="$(( ram_mb / 2 ))"

if [ "${ram}" -eq "0" ]; then
    ram=1024;
fi

regex="^\-Xmx\K[0-9]+"
file="/etc/wazuh-indexer/jvm.options"
value=$(grep -oP ${regex} ${file})

if [[ "${value}" != "${ram}" ]]; then
    eval "sed -i "s/^-Xms.*$/-Xms${ram}m/" ${file} ${debug}"
    eval "sed -i "s/^-Xmx.*$/-Xmx${ram}m/" ${file} ${debug}"
fi

systemctl stop updateIndexerHeap.service

@wazuhci wazuhci moved this from In progress to On hold in Release 4.9.0 Jul 15, 2024
@f-galland
Copy link
Member

Hard coding the Xmx and Xms values to 1024m got rid of the G1 GC error in the logs, but the indexer is still exiting after a timeout.

@wazuhci wazuhci moved this from On hold to In progress in Release 4.9.0 Jul 15, 2024
@f-galland
Copy link
Member

f-galland commented Jul 15, 2024

It looks like wazuh-indexer is getting started before the network interfaces are set up:

[root@wazuh-server wazuh-indexer]# journalctl --since 13:43:04 --until 13:44:21 | grep systemd
Jul 15 13:43:04 wazuh-server systemd[1]: Started Postfix Mail Transport Agent.
Jul 15 13:43:04 wazuh-server systemd[1]: Started Initial cloud-init job (metadata service crawler).
Jul 15 13:43:04 wazuh-server systemd[1]: Reached target Cloud-config availability.
Jul 15 13:43:04 wazuh-server systemd[1]: Starting Permit User Sessions...
Jul 15 13:43:04 wazuh-server systemd[1]: Starting OpenSSH Server Key Generation...
Jul 15 13:43:04 wazuh-server systemd[1]: Reached target Network is Online.
Jul 15 13:43:04 wazuh-server systemd[1]: Starting Wazuh manager...
Jul 15 13:43:04 wazuh-server systemd[1]: Starting Notify NFS peers of a restart...
Jul 15 13:43:04 wazuh-server systemd[1]: Starting Apply the settings specified in cloud-config...
Jul 15 13:43:04 wazuh-server systemd[1]: Starting wazuh-indexer...
Jul 15 13:43:04 wazuh-server systemd[1]: Starting Finds and configures elastic network interfaces...
Jul 15 13:43:04 wazuh-server systemd[1]: Starting System Logging Service...
Jul 15 13:43:04 wazuh-server systemd[1]: Started Filebeat sends log files to Logstash or directly to Elasticsearch..
Jul 15 13:43:04 wazuh-server systemd[1]: Starting Dynamically Generate Message Of The Day...
Jul 15 13:43:04 wazuh-server systemd[1]: Started amazon-ssm-agent.
Jul 15 13:43:04 wazuh-server systemd[1]: Started Permit User Sessions.
Jul 15 13:43:04 wazuh-server systemd[1]: Started Notify NFS peers of a restart.
Jul 15 13:43:04 wazuh-server systemd[1]: Started Finds and configures elastic network interfaces.
Jul 15 13:43:04 wazuh-server systemd[1]: Starting Terminate Plymouth Boot Screen...
Jul 15 13:43:04 wazuh-server systemd[1]: Started Job spooling tools.
Jul 15 13:43:04 wazuh-server systemd[1]: Starting Wait for Plymouth Boot Screen to Quit...
Jul 15 13:43:05 wazuh-server systemd[1]: Started Command Scheduler.
Jul 15 13:43:05 wazuh-server systemd[1]: Received SIGRTMIN+21 from PID 1623 (plymouthd).
Jul 15 13:43:05 wazuh-server systemd[1]: Started Dynamically Generate Message Of The Day.
Jul 15 13:43:05 wazuh-server systemd[1]: Started Terminate Plymouth Boot Screen.
Jul 15 13:43:05 wazuh-server systemd[1]: Started Wait for Plymouth Boot Screen to Quit.
Jul 15 13:43:05 wazuh-server systemd[1]: Started Getty on tty1.
Jul 15 13:43:05 wazuh-server systemd[1]: Started Serial Getty on ttyS0.
Jul 15 13:43:05 wazuh-server systemd[1]: Reached target Login Prompts.
Jul 15 13:43:05 wazuh-server systemd[1]: Started System Logging Service.
Jul 15 13:43:05 wazuh-server systemd[1]: Started OpenSSH Server Key Generation.
Jul 15 13:43:05 wazuh-server systemd[1]: Starting OpenSSH server daemon...
Jul 15 13:43:05 wazuh-server systemd[1]: Started OpenSSH server daemon.
Jul 15 13:43:05 wazuh-server systemd[1]: Created slice User Slice of root.
Jul 15 13:43:05 wazuh-server systemd[1]: Started Session 1 of user root.
Jul 15 13:43:07 wazuh-server systemd[1]: Time has been changed
Jul 15 13:43:27 wazuh-server systemd-entrypoint[2324]: WARNING: A terminally deprecated method in java.lang.System has been called
Jul 15 13:43:27 wazuh-server systemd-entrypoint[2324]: WARNING: System::setSecurityManager has been called by org.opensearch.bootstrap.OpenSearch (file:/usr/share/wazuh-indexer/lib/opensearch-2.13.0.jar)
Jul 15 13:43:27 wazuh-server systemd-entrypoint[2324]: WARNING: Please consider reporting this to the maintainers of org.opensearch.bootstrap.OpenSearch
Jul 15 13:43:27 wazuh-server systemd-entrypoint[2324]: WARNING: System::setSecurityManager will be removed in a future release
Jul 15 13:43:30 wazuh-server systemd-entrypoint[2324]: Jul 15, 2024 1:43:30 PM sun.util.locale.provider.LocaleProviderAdapter <clinit>
Jul 15 13:43:30 wazuh-server systemd-entrypoint[2324]: WARNING: COMPAT locale provider will be removed in a future release
Jul 15 13:43:31 wazuh-server systemd[1]: Started Apply the settings specified in cloud-config.
Jul 15 13:43:31 wazuh-server systemd[1]: Starting Initial hibernation setup job...
Jul 15 13:43:31 wazuh-server systemd[1]: Starting Execute cloud user/final scripts...
Jul 15 13:43:31 wazuh-server systemd[1]: Started Initial hibernation setup job.
Jul 15 13:43:32 wazuh-server systemd[1]: Started Session c1 of user root.
Jul 15 13:43:32 wazuh-server systemd[1]: Started Session c2 of user root.
Jul 15 13:43:32 wazuh-server systemd[1]: Stopping OpenSSH server daemon...
Jul 15 13:43:32 wazuh-server systemd[1]: Stopped OpenSSH server daemon.
Jul 15 13:43:32 wazuh-server systemd[1]: Stopped OpenSSH Server Key Generation.
Jul 15 13:43:32 wazuh-server systemd[1]: Stopping OpenSSH Server Key Generation...
Jul 15 13:43:32 wazuh-server systemd[1]: Starting OpenSSH server daemon...
Jul 15 13:43:32 wazuh-server systemd[1]: Started OpenSSH server daemon.
Jul 15 13:43:32 wazuh-server systemd[1]: Started Execute cloud user/final scripts.
Jul 15 13:43:34 wazuh-server systemd-entrypoint[2324]: WARNING: A terminally deprecated method in java.lang.System has been called
Jul 15 13:43:34 wazuh-server systemd-entrypoint[2324]: WARNING: System::setSecurityManager has been called by org.opensearch.bootstrap.Security (file:/usr/share/wazuh-indexer/lib/opensearch-2.13.0.jar)
Jul 15 13:43:34 wazuh-server systemd-entrypoint[2324]: WARNING: Please consider reporting this to the maintainers of org.opensearch.bootstrap.Security
Jul 15 13:43:34 wazuh-server systemd-entrypoint[2324]: WARNING: System::setSecurityManager will be removed in a future release
Jul 15 13:43:44 wazuh-server systemd-logind[1850]: New session 2 of user wazuh-user.
Jul 15 13:43:44 wazuh-server systemd[1]: Created slice User Slice of wazuh-user.
Jul 15 13:43:44 wazuh-server systemd[1]: Started Session 2 of user wazuh-user.
Jul 15 13:44:20 wazuh-server systemd[1]: Started Wazuh manager.
Jul 15 13:44:20 wazuh-server systemd[1]: wazuh-indexer.service start operation timed out. Terminating.

@f-galland
Copy link
Member

Closer inspection on /var/log/messages indicates that the interfaces are indeed ready by the time the indexer starts:

Jul 15 13:43:02 wazuh-server cloud-init: Cloud-init v. 19.3-46.amzn2.0.2 running 'init' at Mon, 15 Jul 2024 13:43:02 +0000. Up 31.59 seconds.
Jul 15 13:43:02 wazuh-server cloud-init: ci-info: ++++++++++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++++++++++
Jul 15 13:43:02 wazuh-server cloud-init: ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
Jul 15 13:43:02 wazuh-server cloud-init: ci-info: | Device |  Up  |           Address           |      Mask     | Scope  |     Hw-Address    |
Jul 15 13:43:02 wazuh-server cloud-init: ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
Jul 15 13:43:02 wazuh-server cloud-init: ci-info: |  eth0  | True |        172.31.36.215        | 255.255.240.0 | global | 0e:e7:8f:f7:d3:35 |
Jul 15 13:43:02 wazuh-server cloud-init: ci-info: |  eth0  | True | fe80::ce7:8fff:fef7:d335/64 |       .       |  link  | 0e:e7:8f:f7:d3:35 |
Jul 15 13:43:02 wazuh-server cloud-init: ci-info: |   lo   | True |          127.0.0.1          |   255.0.0.0   |  host  |         .         |
Jul 15 13:43:02 wazuh-server cloud-init: ci-info: |   lo   | True |           ::1/128           |       .       |  host  |         .         |
Jul 15 13:43:02 wazuh-server cloud-init: ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
Jul 15 13:43:02 wazuh-server cloud-init: ci-info: ++++++++++++++++++++++++++++++++Route IPv4 info++++++++++++++++++++++++++++++++
Jul 15 13:43:02 wazuh-server cloud-init: ci-info: +-------+-----------------+-------------+-----------------+-----------+-------+
Jul 15 13:43:02 wazuh-server cloud-init: ci-info: | Route |   Destination   |   Gateway   |     Genmask     | Interface | Flags |
Jul 15 13:43:02 wazuh-server cloud-init: ci-info: +-------+-----------------+-------------+-----------------+-----------+-------+
Jul 15 13:43:02 wazuh-server cloud-init: ci-info: |   0   |     0.0.0.0     | 172.31.32.1 |     0.0.0.0     |    eth0   |   UG  |
Jul 15 13:43:02 wazuh-server cloud-init: ci-info: |   1   | 169.254.169.254 |   0.0.0.0   | 255.255.255.255 |    eth0   |   UH  |
Jul 15 13:43:02 wazuh-server cloud-init: ci-info: |   2   |   172.31.32.0   |   0.0.0.0   |  255.255.240.0  |    eth0   |   U   |
Jul 15 13:43:02 wazuh-server cloud-init: ci-info: +-------+-----------------+-------------+-----------------+-----------+-------+
Jul 15 13:43:02 wazuh-server cloud-init: ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
Jul 15 13:43:02 wazuh-server cloud-init: ci-info: +-------+-------------+---------+-----------+-------+
Jul 15 13:43:02 wazuh-server cloud-init: ci-info: | Route | Destination | Gateway | Interface | Flags |
Jul 15 13:43:02 wazuh-server cloud-init: ci-info: +-------+-------------+---------+-----------+-------+
Jul 15 13:43:02 wazuh-server cloud-init: ci-info: |   9   |  fe80::/64  |    ::   |    eth0   |   U   |
Jul 15 13:43:02 wazuh-server cloud-init: ci-info: |   11  |    local    |    ::   |    eth0   |   U   |
Jul 15 13:43:02 wazuh-server cloud-init: ci-info: |   12  |   ff00::/8  |    ::   |    eth0   |   U   |
Jul 15 13:43:02 wazuh-server cloud-init: ci-info: +-------+-------------+---------+-----------+-------+
Jul 15 13:43:03 wazuh-server dhclient[2124]: XMT: Solicit on eth0, interval 2130ms.
Jul 15 13:43:04 wazuh-server systemd: Started Postfix Mail Transport Agent.
Jul 15 13:43:04 wazuh-server systemd: Started Initial cloud-init job (metadata service crawler).
Jul 15 13:43:04 wazuh-server systemd: Reached target Cloud-config availability.
Jul 15 13:43:04 wazuh-server systemd: Starting Permit User Sessions...
Jul 15 13:43:04 wazuh-server systemd: Starting OpenSSH Server Key Generation...
Jul 15 13:43:04 wazuh-server systemd: Reached target Network is Online.
Jul 15 13:43:04 wazuh-server systemd: Starting Wazuh manager...
Jul 15 13:43:04 wazuh-server systemd: Starting Notify NFS peers of a restart...
Jul 15 13:43:04 wazuh-server systemd: Starting Apply the settings specified in cloud-config...
Jul 15 13:43:04 wazuh-server systemd: Starting wazuh-indexer...

@f-galland
Copy link
Member

I also tried starving the system's RAM to check if that could trigger a timeout, but it exits with a code instead:

docker run --rm colinianking/stress-ng --vm-bytes $(awk '/MemAvailable/{printf "%d\n", $2 * 0.7;}' < /proc/meminfo)k --vm-keep -m 1

[root@wazuh-server ~]# systemctl start wazuh-indexer
Job for wazuh-indexer.service failed because the control process exited with error code. See "systemctl status wazuh-indexer.service" and "journalctl -xe" for details.

[root@wazuh-server ~]# systemctl status wazuh-indexer
● wazuh-indexer.service - wazuh-indexer
   Loaded: loaded (/usr/lib/systemd/system/wazuh-indexer.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2024-07-15 17:57:56 UTC; 14s ago
     Docs: https://documentation.wazuh.com
  Process: 5698 ExecStart=/usr/share/wazuh-indexer/bin/systemd-entrypoint -p ${PID_DIR}/wazuh-indexer.pid --quiet (code=exited, status=1/FAILURE)
 Main PID: 5698 (code=exited, status=1/FAILURE)

Jul 15 17:57:56 wazuh-server systemd-entrypoint[5698]: output:
Jul 15 17:57:56 wazuh-server systemd-entrypoint[5698]: #
Jul 15 17:57:56 wazuh-server systemd-entrypoint[5698]: # There is insufficient memory for the Java Runtime Environment to continue.
Jul 15 17:57:56 wazuh-server systemd-entrypoint[5698]: # Native memory allocation (mmap) failed to map 4120903680 bytes for committing reserved memory.
Jul 15 17:57:56 wazuh-server systemd-entrypoint[5698]: # An error report file with more information is saved as:
Jul 15 17:57:56 wazuh-server systemd-entrypoint[5698]: # /var/log/wazuh-indexer/hs_err_pid5847.log
Jul 15 17:57:56 wazuh-server systemd[1]: wazuh-indexer.service: main process exited, code=exited, status=1/FAILURE
Jul 15 17:57:56 wazuh-server systemd[1]: Failed to start wazuh-indexer.
Jul 15 17:57:56 wazuh-server systemd[1]: Unit wazuh-indexer.service entered failed state.
Jul 15 17:57:56 wazuh-server systemd[1]: wazuh-indexer.service failed.

@f-galland
Copy link
Member

I brought this subject up with the support team and they mention this is a common issue when the disk is under stress and that a common fix is to increase the systemd service timeout.
We might want to try that approach.

@wazuhci wazuhci moved this from In progress to On hold in Release 4.9.0 Jul 15, 2024
@f-galland
Copy link
Member

The old packages used to use a timeout of 180 seconds, while the newer ones generated from the wazuh-indexer repo, are currently using 75.

We need to bump this up and test the AMI deploy again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
level/task Task issue type/bug Bug issue
Projects
No open projects
Status: Done
Development

Successfully merging a pull request may close this issue.

2 participants