DieWithDignityIT.testDieWithDignity failures on CI #77282
Comments
Pinging @elastic/es-core-infra (Team:Core/Infra)
Muting on
There are two failures that may be related:
Here's the problem. We run
The die-with-dignity test forces an OOM in Elasticsearch and sees that it dies appropriately. The test first asserts the correct ES process is running, and that that process no longer exists after forcing the OOM. Unfortunately the way of identifying the pid through jps args doesn't always work (see https://bugs.openjdk.java.net/browse/JDK-7091209). Instead, this commit passes the known pid for the test cluster through to the die-with-dignity test, so that it can check for the pid instead of the command line options. closes elastic#77282
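For illustration, checking a known pid directly (rather than searching `jps` output for command-line arguments) could look roughly like the sketch below. It assumes Java 9 or newer for `ProcessHandle`; the failing runs in this issue used `-Druntime.java=8`, so this is not necessarily how the commit does it. `PidCheck` and `isRunning` are hypothetical names, not code from the Elasticsearch repository.

```java
import java.util.Optional;

final class PidCheck {
    /**
     * Returns true if a process with the given pid exists and is alive.
     * Hypothetical helper for illustration only; requires Java 9+.
     */
    static boolean isRunning(long pid) {
        // ProcessHandle.of returns an empty Optional when no such process exists.
        Optional<ProcessHandle> handle = ProcessHandle.of(pid);
        return handle.map(ProcessHandle::isAlive).orElse(false);
    }

    public static void main(String[] args) {
        // Use the current JVM's own pid as a process that is known to be running.
        long self = ProcessHandle.current().pid();
        System.out.println("pid " + self + " running: " + isRunning(self));
    }
}
```

A test following this idea would assert `isRunning(pid)` before forcing the OOM and assert the opposite afterwards, where `pid` is whatever value the build passes in for the test cluster node.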
Looks to still fail: https://gradle-enterprise.elastic.co/s/oxkjscbj6d5fq Now with:
This commit simplifies the information about running JVMs that is collected during the die with dignity test. The reason that we are doing this is because collecting information about all running JVMs leads to too much output, which makes understanding a test failure too hard. By simplifying the information collected to only running Elasticsearch instances, there will be less output from the test failure and it should be easier to understand what is going on. That is, this commit is purely seeking to get more information about an ongoing test failure. Relates #77282
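As a rough sketch of what "only running Elasticsearch instances" might mean in practice, the snippet below filters a `jps`-style listing down to the lines that mention an Elasticsearch main class. The class name used as the marker, the helper names, and the sample lines are all assumptions made for illustration; the actual commit may select processes differently.

```java
import java.util.List;
import java.util.stream.Collectors;

final class JvmListingFilter {
    // Assumed marker: the main class name shown for an Elasticsearch node.
    static final String ES_MAIN_CLASS = "org.elasticsearch.bootstrap.Elasticsearch";

    /** Keeps only the listing lines that mention the Elasticsearch main class. */
    static List<String> elasticsearchOnly(List<String> jvmLines) {
        return jvmLines.stream()
            .filter(line -> line.contains(ES_MAIN_CLASS))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample lines in "pid mainClass args" form, not real test output.
        List<String> lines = List.of(
            "4242 org.elasticsearch.bootstrap.Elasticsearch -Epath.home=/tmp/es",
            "4300 org.gradle.launcher.daemon.bootstrap.GradleDaemon 7.2",
            "4311 jdk.jcmd/sun.tools.jps.Jps -l -m"
        );
        elasticsearchOnly(lines).forEach(System.out::println);
    }
}
```

Attaching only the filtered lines to an assertion message keeps the failure output short, which is the stated goal of the change.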
It's not clear what happened in this failure; some of the output from the failed assertion was truncated. I pushed #77504 so that there's significantly less output, which should make truncation less likely and hopefully make this easier to track down on the next failure.
This just failed a bunch more on Windows:
Failed a bunch more on Windows again. I am muting it for Windows in #77537.
@rjernst Would you be available to investigate why this is failing on Windows? It appears that
This commit adds debugging info to the die with dignity test to figure out why no ES command lines are found on Windows. relates elastic#77282
I added some additional debugging info to the test, and the Elasticsearch process is simply not found by
But the pids we get back from
On my Windows laptop I have managed to reproduce this; in fact, it fails on every attempt. I ran it with some additional println debugging, and it looked like Elasticsearch was not running?
I debugged a bit further and I think there might be something wrong with jps on Windows.
When running jps, however, that pid was not there.
When running jcmd I got a response, though.
As a side note, I think jcmd might be suffering from output truncation too.
A comparison of tasklist (containing an Elasticsearch process, 13504) and jps
This commit rewrites the DieWithDignity test to use the new test infra. A side effect of this change is that it no longer relies on jps, which appears to have issues on Windows. closes elastic#77282
Build scan: https://gradle-enterprise.elastic.co/s/fl56dmvxet2zw
Repro line:
./gradlew ':test:external-modules:test-die-with-dignity:javaRestTest' --tests "org.elasticsearch.qa.die_with_dignity.DieWithDignityIT.testDieWithDignity" -Dtests.seed=C3A1E728D7318E71 -Dtests.locale=es-CU -Dtests.timezone=Etc/GMT+9 -Druntime.java=8
Reproduces locally?: No
Applicable branches: master and 7.x
Failure history: First failed on Sep. 1 (build scan), has failed 26 times since.
Failure excerpt:
/cc @jasontedor as it looks like you touched this test in #77039 not too long before the first failure, could be related?