[CI] :qa:full-cluster-restart:v7.3.2#upgradedClusterTest Failed on Windows #46014
Comments
Pinging @elastic/es-core-infra
Same error for |
Similar error here. Though not identical, it looks like the same underlying issue:
Failed around 5 times over the weekend with the same or a similar error.
Looking at one of the stack traces of the failure:
A restart operation is implemented as a |
Another one: https://gradle-enterprise.elastic.co/s/jsaajm7ezmr6q
How can they still be open? It looks like we always do a SIGKILL in this case, and even if we were doing a SIGTERM, there is logic in stop to wait on the pid. Retrying seems like it would be masking a different issue we need to identify.
@rjernst |
There is a variation on this in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-windows-compatibility/os=windows-2019/129/console
In this case the process that's stopping the files being deleted is the ML controller process. Doing a
5 seconds should be ample for the controller to notice that the ES JVM has died and exit by itself (especially given that this problem is fairly rare even without any retries). The only other alternative would be for the testclusters code to kill the controller explicitly as well as the JVM before cleaning up installation directories. But if multiple clusters are running in parallel that may not be easy. It would need to kill the controller whose parent process is the JVM. If it killed all processes named |
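For reference, the "kill the controller whose parent process is the JVM" alternative could be sketched with the JDK's `ProcessHandle` API (Java 9+). This is only an illustration, not what testclusters does: the class and method names are hypothetical, the command suffix is passed in by the caller, and the sketch assumes a `ProcessHandle` for the node's JVM is available and that the child list is captured while that JVM is still alive, since the parent link may not be resolvable after it dies.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the Elasticsearch build code.
final class ChildProcesses {
    private ChildProcesses() {}

    /**
     * Returns the direct children of {@code parent} whose executable path
     * ends with {@code commandSuffix}. Children whose command is not
     * visible to this process are skipped.
     */
    static List<ProcessHandle> childrenMatching(ProcessHandle parent, String commandSuffix) {
        return parent.children()
            .filter(child -> child.info().command()
                .map(cmd -> cmd.endsWith(commandSuffix))
                .orElse(false))
            .collect(Collectors.toList());
    }

    /** Requests forcible termination of each handle; returns how many requests succeeded. */
    static int killAll(List<ProcessHandle> handles) {
        int requested = 0;
        for (ProcessHandle handle : handles) {
            if (handle.destroyForcibly()) {
                requested++;
            }
        }
        return requested;
    }
}
```

Because the lookup is scoped to one parent, clusters running in parallel would not have their processes killed by each other's cleanup, which is the concern with matching on process name alone.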
Another one like the original issue description, i.e. due to |
When restarting the cluster we clean up old distribution files that might still be in use by the OS. Windows closes resources of dead processes asynchronously, so we do a couple of retries to get around it. Closes elastic#46014
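A minimal sketch of the kind of retry this commit message describes, assuming a fixed delay between attempts. The class name, method names, attempt count and delay are illustrative and are not taken from the actual change:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper, not the code from the referenced commit.
final class RetryingDelete {
    private RetryingDelete() {}

    /** Deletes a directory tree once, children before their parents. */
    static void deleteTree(Path root) throws IOException {
        if (Files.notExists(root)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order puts the deepest paths first.
            walk.sorted(Comparator.reverseOrder()).forEach(path -> {
                try {
                    Files.deleteIfExists(path);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (UncheckedIOException e) {
            throw e.getCause();
        }
    }

    /**
     * Retries deleteTree up to maxAttempts times, sleeping delayMillis
     * between attempts, and rethrows the last failure if none succeeds.
     */
    static void deleteWithRetry(Path root, int maxAttempts, long delayMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                deleteTree(root);
                return;
            } catch (IOException e) {
                last = e; // e.g. a file still held open by a dying process
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last;
    }
}
```

As noted earlier in the thread, a retry like this works around the symptom and can hide a process that is genuinely still running, so a bounded attempt count that rethrows the last error is preferable to retrying indefinitely.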
…up (#46539)
* Retry deleting distro dir on Windows
When restarting the cluster we clean up old distribution files that might still be in use by the OS. Windows closes resources of dead processes asynchronously, so we do a couple of retries to get around it.
Closes #46014
* Avoid having to delete the distro folder.
* Use version-specific distribution folders so we don't need to clean up (#46539)
* Retry deleting distro dir on Windows
When restarting the cluster we clean up old distribution files that might still be in use by the OS. Windows closes resources of dead processes asynchronously, so we do a couple of retries to get around it.
Closes #46014
* Avoid having to delete the distro folder.
* Remove the use of ClusterFormationTasks from RestTestTask (#47022)
This PR removes a use case of ClusterFormationTasks and converts a project that had flown under the radar so far. There is probably more clean-up possible here, but for now the goal is to be able to remove that code after `RunTask` is also updated.
* Migrate some 7.x-only projects
The test goal failed with this error:
Not sure where this is coming from, but it seems that a cluster first fails to form, and then the cleanup fails because Gradle tries to delete files still in use by a running test cluster.
Build Scan