-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use pid to identify die-with-dignity process #77385
Conversation
The die-with-dignity test forces an OOM in Elasticsearch and sees that it dies appropriately. The test first asserts the correct ES process is running, and that that process no longer exists after forcing the OOM. Unfortunately the way of identifying the pid through jps args doesn't always work (see https://bugs.openjdk.java.net/browse/JDK-7091209). Instead, this commit passes the known pid for the test cluster through to the die-with-dignity test, so that it can check for the pid instead of the command line options. closes elastic#77282
Pinging @elastic/es-core-infra (Team:Core/Infra) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments. I'm not completely convinced by the approach, the reason being because of PID reuse potentially leading to spurious failures. What do you think? Note that I offered an alternative approach in #77373 to deal with the jps
truncation issue.
@@ -95,6 +68,22 @@ public void testDieWithDignity() throws Exception { | |||
} | |||
} | |||
|
|||
private boolean javaPidExists(String expectedPid) throws IOException { | |||
final String jpsPath = PathUtils.get(System.getProperty("runtime.java.home"), "bin/jps").toString(); | |||
final Process process = new ProcessBuilder().command(jpsPath, "-v").start(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for verbose output if all we're checking for is the PID. You can use -q
instead for only PID output. That will simplify the parsing below too.
try (InputStream is = process.getInputStream(); BufferedReader in = new BufferedReader(new InputStreamReader(is, "UTF-8"))) { | ||
String line; | ||
while ((line = in.readLine()) != null) { | ||
String foundPid = line.split(" ")[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be simplified after you switch to using quiet output.
PID reuse by another jvm? It seems like we would have to have a ton of processes being created very quickly AND it would have to be another jvm, so the odds would be so close to zero it's no worth worrying about.
Sorry I had not seen that! I'm indifferent as to which is better. The idea here was to eliminate the need for finding the process by heuristic. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Our builds do create loads of JVMs, one for each test worker, and they are coming and going all the time. I tend to agree that reuse is still probably pretty low though. Question: we only assert on the PID, right? We don't try to do any cleanup here, like try and kill it if it still exists or anything like that do we? The reason I ask is because we still have issues with crashing test executors so I'm curious if have tests killing processes that might be a culprit. |
Yes, we only assert on pid, no cleanup.
Is there a stack trace for these crashes? |
Yes, reuse would be rare as Linux is implemented today (I don’t know about macOS or Windows). But the behavior on Linux isn’t guaranteed and some systems do assign PIDs differently (than sequentially with wrap around) (source: AIX, which is why I’m sensitive to this issue). Why allow for possible test flakiness? |
According to this Stack Overflow post, on Windows, PIDs are immediately available for reuse and recycling happens frequently. |
I don't feel strongly about this, so I will close it in favor of #77373 |
The die-with-dignity test forces an OOM in Elasticsearch and sees that
it dies appropriately. The test first asserts the correct ES process is
running, and that that process no longer exists after forcing the OOM.
Unfortunately the way of identifying the pid through jps args doesn't
always work (see https://bugs.openjdk.java.net/browse/JDK-7091209).
Instead, this commit passes the known pid for the test cluster through
to the die-with-dignity test, so that it can check for the pid instead
of the command line options.
closes #77282