Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use pid to identify die-with-dignity process #77385

Closed
wants to merge 2 commits into from

Conversation

rjernst
Copy link
Member

@rjernst rjernst commented Sep 7, 2021

The die-with-dignity test forces an OOM in Elasticsearch and sees that
it dies appropriately. The test first asserts the correct ES process is
running, and that that process no longer exists after forcing the OOM.
Unfortunately the way of identifying the pid through jps args doesn't
always work (see https://bugs.openjdk.java.net/browse/JDK-7091209).

Instead, this commit passes the known pid for the test cluster through
to the die-with-dignity test, so that it can check for the pid instead
of the command line options.

closes #77282

The die-with-dignity test forces an OOM in Elasticsearch and sees that
it dies appropriately. The test first asserts the correct ES process is
running, and that that process no longer exists after forcing the OOM.
Unfortunately the way of identifying the pid through jps args doesn't
always work (see https://bugs.openjdk.java.net/browse/JDK-7091209).

Instead, this commit passes the known pid for the test cluster through
to the die-with-dignity test, so that it can check for the pid instead
of the command line options.

closes elastic#77282
@rjernst rjernst added >test Issues or PRs that are addressing/adding tests :Core/Infra/Core Core issues without another label v7.16.0 labels Sep 7, 2021
@elasticmachine elasticmachine added the Team:Core/Infra Meta label for core/infra team label Sep 7, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments. I'm not completely convinced by the approach, the reason being because of PID reuse potentially leading to spurious failures. What do you think? Note that I offered an alternative approach in #77373 to deal with the jps truncation issue.

@@ -95,6 +68,22 @@ public void testDieWithDignity() throws Exception {
}
}

private boolean javaPidExists(String expectedPid) throws IOException {
final String jpsPath = PathUtils.get(System.getProperty("runtime.java.home"), "bin/jps").toString();
final Process process = new ProcessBuilder().command(jpsPath, "-v").start();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for verbose output if all we're checking for is the PID. You can use -q instead for only PID output. That will simplify the parsing below too.

try (InputStream is = process.getInputStream(); BufferedReader in = new BufferedReader(new InputStreamReader(is, "UTF-8"))) {
String line;
while ((line = in.readLine()) != null) {
String foundPid = line.split(" ")[0];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be simplified after you switch to using quiet output.

@rjernst
Copy link
Member Author

rjernst commented Sep 7, 2021

I'm not completely convinced by the approach, the reason being because of PID reuse potentially leading to spurious failures

PID reuse by another jvm? It seems like we would have to have a ton of processes being created very quickly AND it would have to be another jvm, so the odds would be so close to zero it's no worth worrying about.

I offered an alternative approach in #77373

Sorry I had not seen that! I'm indifferent as to which is better. The idea here was to eliminate the need for finding the process by heuristic.

Copy link
Contributor

@mark-vieira mark-vieira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mark-vieira
Copy link
Contributor

PID reuse by another jvm? It seems like we would have to have a ton of processes being created very quickly AND it would have to be another jvm, so the odds would be so close to zero it's no worth worrying about.

Our builds do create loads of JVMs, one for each test worker, and they are coming and going all the time. I tend to agree that reuse is still probably pretty low though.

Question: we only assert on the PID, right? We don't try to do any cleanup here, like try and kill it if it still exists or anything like that do we? The reason I ask is because we still have issues with crashing test executors so I'm curious if have tests killing processes that might be a culprit.

@rjernst
Copy link
Member Author

rjernst commented Sep 7, 2021

we only assert on the PID, right?

Yes, we only assert on pid, no cleanup.

The reason I ask is because we still have issues with crashing test executors

Is there a stack trace for these crashes?

@jasontedor
Copy link
Member

Yes, reuse would be rare as Linux is implemented today (I don’t know about macOS or Windows). But the behavior on Linux isn’t guaranteed and some systems do assign PIDs differently (than sequentially with wrap around) (source: AIX, which is why I’m sensitive to this issue). Why allow for possible test flakiness?

@jasontedor
Copy link
Member

According to this Stack Overflow post, on Windows, PIDs are immediately available for reuse and recycling happens frequently.

@rjernst
Copy link
Member Author

rjernst commented Sep 8, 2021

I don't feel strongly about this, so I will close it in favor of #77373

@rjernst rjernst closed this Sep 8, 2021
@rjernst rjernst deleted the test/die-with-dignity-pid branch September 8, 2021 14:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Core Core issues without another label Team:Core/Infra Meta label for core/infra team >test Issues or PRs that are addressing/adding tests v7.16.0 v8.0.0-alpha2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DieWithDignityIT.testDieWithDignity failures on CI
6 participants