MlMappingsUpgradeIT.testMappingsUpgrade fails with testclusters #46262
Pinging @elastic/ml-core
The PR that implements the conversion is #46265
For the record, this test is muted in 7.5 through to master.
The problem is this: the testclusters code kills the ML processes before it kills the ES JVM.
The code that causes the problem is in elasticsearch/buildSrc/src/main/java/org/elasticsearch/gradle/testclusters/ElasticsearchNode.java, lines 840 to 841 in d442ff9.
It needs to kill the ES JVM before the ML processes. Otherwise the ES JVM thinks the ML processes have crashed and records the jobs as failed, so they don't start again automatically during the rolling upgrade.
The testclusters shutdown code was killing child processes of the ES JVM before the ES JVM. This causes any running ML jobs to be recorded as failed, as the ES JVM notices that they have disconnected from it without being told to stop, as they would if they crashed.

In many test suites this doesn't matter because the test cluster will never be restarted, but in the case of upgrade tests it makes it impossible to test what happens when an ML job is running at the time of the upgrade.

This change reverses the order of killing the ES process tree such that the parent processes are killed before their children. A list of children is stored before killing the parent so that they can subsequently be killed (if they don't exit by themselves as a side effect of the parent dying).

Fixes elastic#46262
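The ordering described in the commit message can be sketched roughly as follows. This is not the code from ElasticsearchNode.java, only a minimal illustration: the class name `ParentFirstShutdown` and both method names are hypothetical, and the sketch assumes Java 9+ for `java.lang.ProcessHandle`. The important detail is that the list of children is captured before the parent is stopped, and each child is only stopped afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only; not the actual testclusters code.
class ParentFirstShutdown {

    /**
     * Stops a tree of nodes parent-first. The children of a node are
     * recorded before the node itself is stopped, then each child is
     * handled the same way.
     */
    public static <T> void stopParentFirst(T node,
                                           Function<T, List<T>> childrenOf,
                                           Consumer<T> stop) {
        // Snapshot the children first: once the parent is gone they can
        // no longer be discovered through it.
        List<T> children = new ArrayList<>(childrenOf.apply(node));
        stop.accept(node);
        for (T child : children) {
            stopParentFirst(child, childrenOf, stop);
        }
    }

    /** Applies the same ordering to real OS processes. */
    public static void killProcessTree(ProcessHandle root) {
        stopParentFirst(
            root,
            handle -> handle.children().collect(Collectors.toList()),
            handle -> {
                // A child may already have exited as a side effect of
                // its parent dying, so only signal live processes.
                if (handle.isAlive()) {
                    handle.destroy();
                }
            }
        );
    }
}
```

Note that `ProcessHandle.destroy()` only requests termination; a real implementation would probably also wait for the parent to exit, and possibly fall back to `destroyForcibly()`, before moving on to the children.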
I'm working on porting bwc tests to run with testclusters and can't get this one to pass:
https://gradle-enterprise.elastic.co/s/oqrlvw6evjqa4/tests/q3vpzt4orcdaq-vgix7kzieu2xs
I'm going to mute this since it's the only one that fails with this conversion and kindly ask someone from the team to take a look.