
TooManyJobsIT#testMultipleNodes fails: Had to resort to force-closing job #48511

Closed · DaveCTurner opened this issue Oct 25, 2019 · 2 comments · Fixed by #50142
Labels: :ml Machine learning · >test-failure Triaged test failures from CI

@DaveCTurner (Contributor) commented:

I see a couple of builds failed like this:

org.elasticsearch.xpack.ml.integration.TooManyJobsIT > testMultipleNodes FAILED
    java.lang.RuntimeException: Had to resort to force-closing job, something went wrong?

        Caused by:
        java.util.concurrent.ExecutionException: org.elasticsearch.transport.RemoteTransportException: [node_t1][127.0.0.1:44709][cluster:admin/xpack/ml/job/close]

            Caused by:
            org.elasticsearch.transport.RemoteTransportException: [node_t1][127.0.0.1:44709][cluster:admin/xpack/ml/job/close]

                Caused by:
                java.lang.IllegalStateException: Timed out when waiting for persistent tasks after 30s
REPRODUCE WITH: ./gradlew ':x-pack:plugin:ml:internalClusterTest' --tests "org.elasticsearch.xpack.ml.integration.TooManyJobsIT.testMultipleNodes" -Dtests.seed=62CDCB396D54BF7A -Dtests.security.manager=true -Dtests.locale=is -Dtests.timezone=Europe/Sofia -Dcompiler.java=12 -Druntime.java=11

Possibly this is #30300 again, or maybe this is just the CI machine running slowly?
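
For context, the "Had to resort to force-closing job" message comes from the ML integration test cleanup, which first asks jobs to close gracefully and only force-closes as a last resort. Below is a minimal sketch of that pattern in Java; the class, field, and method names (MlJobCleanupSketch, CLOSE_TIMEOUT, closeAllJobs, forceCloseAllJobs) are hypothetical stand-ins for illustration, not the actual Elasticsearch test code.

    // Hedged sketch, not the actual Elasticsearch test code: class, field and
    // method names here are assumptions for illustration only.
    import org.elasticsearch.common.unit.TimeValue;

    abstract class MlJobCleanupSketch {
        // Jobs get this long to close gracefully before the fallback kicks in.
        private static final TimeValue CLOSE_TIMEOUT = TimeValue.timeValueSeconds(30);

        // Hypothetical hooks standing in for the real close / force-close calls.
        abstract void closeAllJobs(TimeValue timeout) throws Exception;

        abstract void forceCloseAllJobs();

        void cleanUpJobs() {
            try {
                // Ask every open job to close gracefully within the timeout.
                closeAllJobs(CLOSE_TIMEOUT);
            } catch (Exception e) {
                // The graceful close timed out (here: "Timed out when waiting for
                // persistent tasks after 30s"), so force-close as a last resort and
                // fail the test, since a forced close can mask real problems.
                forceCloseAllJobs();
                throw new RuntimeException("Had to resort to force-closing job, something went wrong?", e);
            }
        }
    }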

DaveCTurner added the >test-failure Triaged test failures from CI and :ml Machine learning labels on Oct 25, 2019
@elasticmachine (Collaborator) commented:

Pinging @elastic/ml-core (:ml)

@droberts195 (Contributor) commented:

> or maybe this is just the CI machine running slowly?

Yes. We allow 30 seconds to close all the jobs. All the jobs did in fact close successfully, but it took 79 seconds in the first build and 48 seconds in the second. The log from the first failure shows:

[2019-08-09T02:02:48,163][INFO ][o.e.x.m.i.TooManyJobsIT  ] [testMultipleNodes] Closing jobs using [_all]
...
[2019-08-09T02:04:07,320][INFO ][o.e.x.m.j.p.a.AutodetectCommunicator] [node_t2] [max-number-of-jobs-limit-job-39] job closed

and the log from the second failure shows:

[2019-10-25T07:48:17,023][INFO ][o.e.x.m.i.TooManyJobsIT  ] [testMultipleNodes] Closing jobs using [_all]
...
[2019-10-25T07:49:05,701][INFO ][o.e.x.m.j.p.a.AutodetectCommunicator] [node_t3] [max-number-of-jobs-limit-job-37] job closed

This is probably due to running multiple test tasks in parallel. If this test happens to run at the same time as some other resource-intensive test suite, it takes too long.

I will bump the timeout for ML cleanup in the internal cluster tests up to 90 seconds.
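
A minimal sketch of that kind of bump, assuming the timeout is held in a TimeValue constant in the test base class (the field name is hypothetical; the actual change is in #50142):

    // Hypothetical field name; the real change is in PR #50142.
    // 30 seconds was not enough under parallel CI load, so allow 90.
    private static final TimeValue ML_CLEANUP_TIMEOUT = TimeValue.timeValueSeconds(90);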

droberts195 self-assigned this on Dec 12, 2019
droberts195 added a commit to droberts195/elasticsearch that referenced this issue Dec 12, 2019
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this issue Jan 23, 2020