[CI] Timeout in ml.integration.RegressionIT.testStopAndRestart on Windows 2019 #50177
Pinging @elastic/ml-core (:ml)
This is the first time I'm seeing this exact issue related to
@przemekwitek, to me that sequence of log messages suggests a race between start and stop.
It implies that the logic in lines 200 to 203 (as of commit 7107c22) is flawed.
Anomaly detection jobs treat the closing of their input stream as a request to shut down gracefully. I'm not sure whether data frame analytics jobs do the same. But even for anomaly detection jobs there's a race: the process kill could be attempted between startup and the reading of the config files, fail, and then the JVM could delete the config files before the native process tries to read them. So I think it would be worth trying a wait of … Please give it a try, and if it doesn't cause any easily reproducible problems locally that suggest it's a bad idea, add it to the code; then we can see whether it stops these related intermittent failures of the various "stop and restart" tests.
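To make the race concrete, here is a toy model in Python (not the actual Elasticsearch Java or native C++ code). A worker thread stands in for the native process, which reads its config file after a startup delay; the main thread stands in for the JVM stop path, which deletes the config file. All names (`NativeProcessModel`, `stop_and_cleanup`, `run_scenario`) are hypothetical, and the bounded wait on a "config loaded" signal is only an illustration of the suggested wait, not its real value or mechanism.

```python
import os
import tempfile
import threading


class NativeProcessModel:
    """Hypothetical stand-in for a native process that reads its config on startup."""

    def __init__(self, config_path: str) -> None:
        self.config_path = config_path
        self.config_loaded = threading.Event()  # set once the config has been read
        self.error = None

    def run(self, startup_done: threading.Event) -> None:
        startup_done.wait()  # simulate slow startup before the config is read
        try:
            with open(self.config_path) as f:
                f.read()
            self.config_loaded.set()
        except FileNotFoundError as e:
            self.error = e  # the JVM side deleted the file first: the race


def stop_and_cleanup(proc: NativeProcessModel, wait_for_config: bool,
                     timeout_s: float = 2.0) -> None:
    """Hypothetical JVM-side stop: (optionally) wait, then delete the config file."""
    if wait_for_config:
        # Assumed mitigation: give the process a bounded window to read its
        # config before the files are deleted.
        proc.config_loaded.wait(timeout_s)
    if os.path.exists(proc.config_path):
        os.remove(proc.config_path)


def run_scenario(wait_for_config: bool) -> str:
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as f:
        f.write('{"analysis": "regression"}')

    proc = NativeProcessModel(path)
    startup_done = threading.Event()
    worker = threading.Thread(target=proc.run, args=(startup_done,))
    worker.start()

    # The stop request arrives while the process is still starting up;
    # the process finishes starting 200 ms later.
    opener = threading.Timer(0.2, startup_done.set)
    opener.start()
    stop_and_cleanup(proc, wait_for_config)

    opener.join()
    worker.join()
    return "config read ok" if proc.error is None else f"failed: {type(proc.error).__name__}"
```

Without the wait, the config file is gone by the time the simulated process reads it (`run_scenario(False)` reports a `FileNotFoundError`); with the bounded wait, the read completes first (`run_scenario(True)` reports success). The timeout keeps the stop path from hanging if the process never signals.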
Thanks for the analysis, @droberts195. I'll try it out.
Example build scan: https://gradle-enterprise.elastic.co/s/ssxafsqxsb6sy
Happened on Windows 2019.
Reproduction line:
Doesn't reproduce locally.
Given that it happens only on Windows workers, is it just a matter of worker slowness, requiring an increase in the suite timeout (currently 30m)? (AFAIK Windows workers don't use a RAM disk, so they are expected to be slower.)
Other issues related to the same test are tracked in #47612