[BUG] Wazuh indexer is failed in the AMI deployment #307
Comments
Access to the VM was shared with @AlexRuiz7 and @f-galland.
I logged in to the machine with the provided credentials and was able to assess that the service is indeed not starting up because of a timeout.
[root@wazuh-server ~]# systemctl status wazuh-indexer
● wazuh-indexer.service - wazuh-indexer
Loaded: loaded (/usr/lib/systemd/system/wazuh-indexer.service; enabled; vendor preset: disabled)
Active: failed (Result: timeout) since Fri 2024-07-12 15:16:59 UTC; 3h 17min ago
Docs: https://documentation.wazuh.com
Process: 2404 ExecStart=/usr/share/wazuh-indexer/bin/systemd-entrypoint -p ${PID_DIR}/wazuh-indexer.pid --quiet (code=exited, status=143)
Main PID: 2404 (code=exited, status=143)
Jul 12 15:16:12 wazuh-server systemd-entrypoint[2404]: Jul 12, 2024 3:16:11 PM sun.util.locale.provider.LocaleProviderAdapter <clinit>
Jul 12 15:16:12 wazuh-server systemd-entrypoint[2404]: WARNING: COMPAT locale provider will be removed in a future release
Jul 12 15:16:15 wazuh-server systemd-entrypoint[2404]: WARNING: A terminally deprecated method in java.lang.System has been called
Jul 12 15:16:15 wazuh-server systemd-entrypoint[2404]: WARNING: System::setSecurityManager has been called by org.opensearch.bootstrap.Security (file:/usr/share/wazuh-in....13.0.jar)
Jul 12 15:16:15 wazuh-server systemd-entrypoint[2404]: WARNING: Please consider reporting this to the maintainers of org.opensearch.bootstrap.Security
Jul 12 15:16:15 wazuh-server systemd-entrypoint[2404]: WARNING: System::setSecurityManager will be removed in a future release
Jul 12 15:16:59 wazuh-server systemd[1]: wazuh-indexer.service start operation timed out. Terminating.
Jul 12 15:16:59 wazuh-server systemd[1]: Failed to start wazuh-indexer.
Jul 12 15:16:59 wazuh-server systemd[1]: Unit wazuh-indexer.service entered failed state.
Jul 12 15:16:59 wazuh-server systemd[1]: wazuh-indexer.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
However, I would need to review the AMI build and deploy logs to determine the cause of this issue, since running systemctl restart wazuh-indexer within the instance solves the problem.
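As a stopgap while the root cause is investigated, the check-then-restart workaround described above could be scripted. The following is only a sketch: `unit_start_timed_out` is a hypothetical helper name, and it matches the exact `Active: failed (Result: timeout)` text shown in the status output above rather than querying systemd properties.

```shell
# Hypothetical helper: succeeds (exit 0) when the `systemctl status` text on
# stdin reports the unit as failed because of a start timeout.
unit_start_timed_out() {
  grep -q 'Active: failed (Result: timeout)'
}
```

It could be used as `systemctl status wazuh-indexer | unit_start_timed_out && systemctl restart wazuh-indexer`. This only hides the symptom; it does not explain why the first start times out.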
I have observed the following errors in the log:
[2024-07-15T00:00:00,323][WARN ][o.o.p.c.u.JsonConverter ] [node-1] Json Mapping Error: Cannot invoke "java.lang.Long.longValue()" because "this.cacheMaxSize" is null (through reference chain: org.opensearch.performanceanalyzer.collectors.CacheConfigMetricsCollector$CacheMaxSizeStatus["Cache_MaxSize"])
[2024-07-15T00:00:00,341][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [node-1] Detected cluster change event for destination migration
[2024-07-15T00:00:00,349][INFO ][o.o.p.PluginsService ] [node-1] PluginService:onIndexModule index:[wazuh-statistics-2024.29w/u_MtaqwGSy-I4a8QIEn0Ew]
[2024-07-15T00:00:00,355][INFO ][o.o.c.m.MetadataMappingService] [node-1] [wazuh-statistics-2024.29w/u_MtaqwGSy-I4a8QIEn0Ew] update_mapping [_doc]
[2024-07-15T00:00:00,375][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [node-1] Detected cluster change event for destination migration
[2024-07-15T00:00:00,376][INFO ][o.o.c.r.a.AllocationService] [node-1] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[wazuh-monitoring-2024.29w][0]]]).
[2024-07-15T00:00:00,390][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [node-1] Detected cluster change event for destination migration
[2024-07-15T00:00:00,394][INFO ][o.o.c.m.MetadataUpdateSettingsService] [node-1] updating number_of_replicas to [0] for indices [wazuh-monitoring-2024.29w]
[2024-07-15T00:00:05,323][WARN ][o.o.p.c.u.JsonConverter ] [node-1] Json Mapping Error: Cannot invoke "java.lang.Long.longValue()" because "this.cacheMaxSize" is null (through reference chain: org.opensearch.performanceanalyzer.collectors.CacheConfigMetricsCollector$CacheMaxSizeStatus["Cache_MaxSize"])
[2024-07-15T00:00:10,323][WARN ][o.o.p.c.u.JsonConverter ] [node-1] Json Mapping Error: Cannot invoke "java.lang.Long.longValue()" because "this.cacheMaxSize" is null (through reference chain: org.opensearch.performanceanalyzer.collectors.CacheConfigMetricsCollector$CacheMaxSizeStatus["Cache_MaxSize"])
That log is dated today, so I'm not sure it's connected to the issue. The full wazuh-cluster.log
We can notice an actual
which I'm currently investigating.
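To tell recurring noise (such as the JsonConverter warning repeated every five seconds in the excerpt above) from entries that deserve attention, the log can be summarized by level and logger. This is a rough sketch that assumes every line follows the `[timestamp][LEVEL][logger] [node] message` layout of the excerpt; `summarize_log_levels` is a hypothetical helper name.

```shell
# Hypothetical helper: reads OpenSearch-style log lines on stdin and prints
# "<count> <LEVEL> <logger>", most frequent first.
summarize_log_levels() {
  awk -F'[][]' '
    # Splitting on "[" and "]" puts the level in field 4 and the logger in
    # field 6; lines with fewer fields (e.g. stack traces) are skipped.
    NF >= 6 {
      level = $4; logger = $6
      gsub(/ /, "", level); gsub(/ /, "", logger)
      counts[level " " logger]++
    }
    END { for (key in counts) print counts[key], key }
  ' | sort -k1,1nr -k2
}
```

The log file would be piped into the function, and any `ERROR` rows in the summary are the ones to look at first.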
We reviewed the AMI build process with @davidcr01 and determined this could be caused by the JVM's memory allocation pool as controlled by the
Hard coding the
It looks like
[root@wazuh-server wazuh-indexer]# journalctl --since 13:43:04 --until 13:44:21 | grep systemd
Jul 15 13:43:04 wazuh-server systemd[1]: Started Postfix Mail Transport Agent.
Jul 15 13:43:04 wazuh-server systemd[1]: Started Initial cloud-init job (metadata service crawler).
Jul 15 13:43:04 wazuh-server systemd[1]: Reached target Cloud-config availability.
Jul 15 13:43:04 wazuh-server systemd[1]: Starting Permit User Sessions...
Jul 15 13:43:04 wazuh-server systemd[1]: Starting OpenSSH Server Key Generation...
Jul 15 13:43:04 wazuh-server systemd[1]: Reached target Network is Online.
Jul 15 13:43:04 wazuh-server systemd[1]: Starting Wazuh manager...
Jul 15 13:43:04 wazuh-server systemd[1]: Starting Notify NFS peers of a restart...
Jul 15 13:43:04 wazuh-server systemd[1]: Starting Apply the settings specified in cloud-config...
Jul 15 13:43:04 wazuh-server systemd[1]: Starting wazuh-indexer...
Jul 15 13:43:04 wazuh-server systemd[1]: Starting Finds and configures elastic network interfaces...
Jul 15 13:43:04 wazuh-server systemd[1]: Starting System Logging Service...
Jul 15 13:43:04 wazuh-server systemd[1]: Started Filebeat sends log files to Logstash or directly to Elasticsearch..
Jul 15 13:43:04 wazuh-server systemd[1]: Starting Dynamically Generate Message Of The Day...
Jul 15 13:43:04 wazuh-server systemd[1]: Started amazon-ssm-agent.
Jul 15 13:43:04 wazuh-server systemd[1]: Started Permit User Sessions.
Jul 15 13:43:04 wazuh-server systemd[1]: Started Notify NFS peers of a restart.
Jul 15 13:43:04 wazuh-server systemd[1]: Started Finds and configures elastic network interfaces.
Jul 15 13:43:04 wazuh-server systemd[1]: Starting Terminate Plymouth Boot Screen...
Jul 15 13:43:04 wazuh-server systemd[1]: Started Job spooling tools.
Jul 15 13:43:04 wazuh-server systemd[1]: Starting Wait for Plymouth Boot Screen to Quit...
Jul 15 13:43:05 wazuh-server systemd[1]: Started Command Scheduler.
Jul 15 13:43:05 wazuh-server systemd[1]: Received SIGRTMIN+21 from PID 1623 (plymouthd).
Jul 15 13:43:05 wazuh-server systemd[1]: Started Dynamically Generate Message Of The Day.
Jul 15 13:43:05 wazuh-server systemd[1]: Started Terminate Plymouth Boot Screen.
Jul 15 13:43:05 wazuh-server systemd[1]: Started Wait for Plymouth Boot Screen to Quit.
Jul 15 13:43:05 wazuh-server systemd[1]: Started Getty on tty1.
Jul 15 13:43:05 wazuh-server systemd[1]: Started Serial Getty on ttyS0.
Jul 15 13:43:05 wazuh-server systemd[1]: Reached target Login Prompts.
Jul 15 13:43:05 wazuh-server systemd[1]: Started System Logging Service.
Jul 15 13:43:05 wazuh-server systemd[1]: Started OpenSSH Server Key Generation.
Jul 15 13:43:05 wazuh-server systemd[1]: Starting OpenSSH server daemon...
Jul 15 13:43:05 wazuh-server systemd[1]: Started OpenSSH server daemon.
Jul 15 13:43:05 wazuh-server systemd[1]: Created slice User Slice of root.
Jul 15 13:43:05 wazuh-server systemd[1]: Started Session 1 of user root.
Jul 15 13:43:07 wazuh-server systemd[1]: Time has been changed
Jul 15 13:43:27 wazuh-server systemd-entrypoint[2324]: WARNING: A terminally deprecated method in java.lang.System has been called
Jul 15 13:43:27 wazuh-server systemd-entrypoint[2324]: WARNING: System::setSecurityManager has been called by org.opensearch.bootstrap.OpenSearch (file:/usr/share/wazuh-indexer/lib/opensearch-2.13.0.jar)
Jul 15 13:43:27 wazuh-server systemd-entrypoint[2324]: WARNING: Please consider reporting this to the maintainers of org.opensearch.bootstrap.OpenSearch
Jul 15 13:43:27 wazuh-server systemd-entrypoint[2324]: WARNING: System::setSecurityManager will be removed in a future release
Jul 15 13:43:30 wazuh-server systemd-entrypoint[2324]: Jul 15, 2024 1:43:30 PM sun.util.locale.provider.LocaleProviderAdapter <clinit>
Jul 15 13:43:30 wazuh-server systemd-entrypoint[2324]: WARNING: COMPAT locale provider will be removed in a future release
Jul 15 13:43:31 wazuh-server systemd[1]: Started Apply the settings specified in cloud-config.
Jul 15 13:43:31 wazuh-server systemd[1]: Starting Initial hibernation setup job...
Jul 15 13:43:31 wazuh-server systemd[1]: Starting Execute cloud user/final scripts...
Jul 15 13:43:31 wazuh-server systemd[1]: Started Initial hibernation setup job.
Jul 15 13:43:32 wazuh-server systemd[1]: Started Session c1 of user root.
Jul 15 13:43:32 wazuh-server systemd[1]: Started Session c2 of user root.
Jul 15 13:43:32 wazuh-server systemd[1]: Stopping OpenSSH server daemon...
Jul 15 13:43:32 wazuh-server systemd[1]: Stopped OpenSSH server daemon.
Jul 15 13:43:32 wazuh-server systemd[1]: Stopped OpenSSH Server Key Generation.
Jul 15 13:43:32 wazuh-server systemd[1]: Stopping OpenSSH Server Key Generation...
Jul 15 13:43:32 wazuh-server systemd[1]: Starting OpenSSH server daemon...
Jul 15 13:43:32 wazuh-server systemd[1]: Started OpenSSH server daemon.
Jul 15 13:43:32 wazuh-server systemd[1]: Started Execute cloud user/final scripts.
Jul 15 13:43:34 wazuh-server systemd-entrypoint[2324]: WARNING: A terminally deprecated method in java.lang.System has been called
Jul 15 13:43:34 wazuh-server systemd-entrypoint[2324]: WARNING: System::setSecurityManager has been called by org.opensearch.bootstrap.Security (file:/usr/share/wazuh-indexer/lib/opensearch-2.13.0.jar)
Jul 15 13:43:34 wazuh-server systemd-entrypoint[2324]: WARNING: Please consider reporting this to the maintainers of org.opensearch.bootstrap.Security
Jul 15 13:43:34 wazuh-server systemd-entrypoint[2324]: WARNING: System::setSecurityManager will be removed in a future release
Jul 15 13:43:44 wazuh-server systemd-logind[1850]: New session 2 of user wazuh-user.
Jul 15 13:43:44 wazuh-server systemd[1]: Created slice User Slice of wazuh-user.
Jul 15 13:43:44 wazuh-server systemd[1]: Started Session 2 of user wazuh-user.
Jul 15 13:44:20 wazuh-server systemd[1]: Started Wazuh manager.
Jul 15 13:44:20 wazuh-server systemd[1]: wazuh-indexer.service start operation timed out. Terminating.
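The journal excerpt above gives both the moment systemd began starting the indexer and the moment it gave up, so the effective timeout can be measured from the timestamps. The sketch below assumes both events fall on the same day and that the time is the third whitespace-separated field, as in the excerpt; `start_to_timeout_seconds` is a hypothetical helper name.

```shell
# Hypothetical helper: reads journal lines on stdin and prints the seconds
# elapsed between "Starting wazuh-indexer..." and the start timeout message.
start_to_timeout_seconds() {
  awk '
    # Convert HH:MM:SS to seconds since midnight.
    function to_seconds(hms,   parts) {
      split(hms, parts, ":")
      return parts[1] * 3600 + parts[2] * 60 + parts[3]
    }
    /Starting wazuh-indexer\.\.\./ { started = to_seconds($3) }
    /wazuh-indexer\.service start operation timed out/ {
      print to_seconds($3) - started
    }
  '
}
```

Fed the lines from 13:43:04 and 13:44:20 above, it prints 76, which is close to the 75-second timeout mentioned later in the thread. The "Time has been changed" entry at 13:43:07 means the figure should be read as approximate.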
Closer inspection on
I also tried starving the system's RAM to check if that could trigger a timeout, but it exits with a code instead:
I brought this up with the support team. They mentioned that this is a common issue when the disk is under stress, and that the usual fix is to increase the systemd service timeout.
The old packages used a timeout of 180 seconds, while the newer ones, generated from the wazuh-indexer repo, currently use 75 seconds.
We need to bump this up and test the AMI deployment again.
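One way to try a longer timeout on an existing instance, without rebuilding the package, is a systemd drop-in that overrides `TimeoutStartSec`. This is a sketch only: `write_start_timeout_override` is a hypothetical helper, the drop-in path in the comments is the conventional location rather than something taken from the packages, and the actual fix may instead change the value in the packaged unit file.

```shell
# Hypothetical helper: writes a systemd drop-in that overrides the start
# timeout. The directory and timeout are parameters so the function can be
# tried against a scratch directory first.
write_start_timeout_override() {
  dropin_dir="$1"      # e.g. /etc/systemd/system/wazuh-indexer.service.d
  timeout_seconds="$2" # e.g. 180, the value the older packages used
  mkdir -p "$dropin_dir" &&
    printf '[Service]\nTimeoutStartSec=%s\n' "$timeout_seconds" \
      > "$dropin_dir/override.conf"
}
```

On the instance this would be run as root, for example `write_start_timeout_override /etc/systemd/system/wazuh-indexer.service.d 180`, followed by `systemctl daemon-reload` so that systemd picks up the new value.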
Describe the bug
In https://github.com/wazuh/internal-devel-requests/issues/1262, we detected that the Wazuh dashboard cannot load because the Wazuh indexer service is in a failed state after the AMI is built.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The Wazuh indexer should be running.
Host/Environment (please complete the following information):
Additional context
If the indexer service is restarted, everything works as expected.