ML tests failing due to unfinished tasks #48512

Closed
DaveCTurner opened this issue Oct 25, 2019 · 5 comments
Labels: :ml (Machine learning), >test-failure (Triaged test failures from CI)

Comments

@DaveCTurner
Contributor

This build failed multiple tests before eventually timing out:

https://gradle-enterprise.elastic.co/s/behidccsxog7w/console-log?task=:x-pack:plugin:ml:qa:native-multi-node-tests:integTestRunner

org.elasticsearch.xpack.ml.integration.MlJobIT > testDeleteJobAfterMissingIndex FAILED
    java.lang.AssertionError: 3 active tasks found:
    cluster:admin/xpack/ml/data_frame/analytics/start                 TleYbYSuQg-vciynV-5H9g:53769 -                            transport 1571997516368 09:58:36 20.9m       127.0.0.1 integTest-0 
    cluster:admin/xpack/ml/data_frame/analytics/start                 ZoK6DsQvQYu6shCnCwZK2Q:79195 TleYbYSuQg-vciynV-5H9g:53769 transport 1571997516368 09:58:36 20.9m       127.0.0.1 integTest-2 
    cluster:admin/xpack/ml/data_frame/analytics/estimate_memory_usage ZoK6DsQvQYu6shCnCwZK2Q:79219 -                            transport 1571997516385 09:58:36 20.9m       127.0.0.1 integTest-2 
     expected:<0> but was:<3>
        at __randomizedtesting.SeedInfo.seed([E70883D6CA8F18D7:467338869AA60026]:0)
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.failNotEquals(Assert.java:834)
        at org.junit.Assert.assertEquals(Assert.java:645)
        at org.elasticsearch.test.rest.ESRestTestCase.lambda$waitForPendingTasks$3(ESRestTestCase.java:387)
        at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:893)
        at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:866)
        at org.elasticsearch.test.rest.ESRestTestCase.waitForPendingTasks(ESRestTestCase.java:363)
        at org.elasticsearch.test.rest.ESRestTestCase.waitForPendingTasks(ESRestTestCase.java:350)
        at org.elasticsearch.xpack.ml.integration.MlJobIT.clearMlState(MlJobIT.java:790)
REPRODUCE WITH: ./gradlew ':x-pack:plugin:ml:qa:native-multi-node-tests:integTestRunner' --tests "org.elasticsearch.xpack.ml.integration.MlJobIT.testDeleteJobAfterMissingIndex" -Dtests.seed=E70883D6CA8F18D7 -Dtests.security.manager=true -Dtests.locale=tr -Dtests.timezone=Asia/Kashgar -Dcompiler.java=12 -Druntime.java=8
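The trace shows `ESRestTestCase.waitForPendingTasks` failing inside `ESTestCase.assertBusy`, which roughly re-runs its check until it passes or the wait expires and then reports the last failure. That is why a task that is never cancelled (for example, because its C++ process crashed) surfaces as `expected:<0> but was:<3>` after a delay rather than as an immediate error. Below is a minimal sketch of that retry-then-fail pattern. It assumes a plain `Supplier` stands in for the cluster's task-list call. `PendingTaskWait` and `waitForNoTasks` are hypothetical names, not Elasticsearch code.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical sketch of a "wait until no tasks remain" check; not the Elasticsearch implementation. */
public class PendingTaskWait {

    /** Polls listTasks until it returns an empty list, or fails once timeoutMillis has elapsed. */
    public static void waitForNoTasks(Supplier<List<String>> listTasks, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        List<String> tasks = listTasks.get();
        while (!tasks.isEmpty()) {
            // Give up once the retry window has expired, reporting what is still running.
            if (System.nanoTime() - deadline >= 0) {
                throw new AssertionError(tasks.size() + " active tasks found: " + tasks
                        + " expected:<0> but was:<" + tasks.size() + ">");
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for tasks", e);
            }
            tasks = listTasks.get(); // re-query; a finished task disappears from the list
        }
    }

    public static void main(String[] args) {
        // A task that is never cancelled keeps the list non-empty, so the wait fails after the timeout.
        List<String> stuck = List.of("cluster:admin/xpack/ml/data_frame/analytics/start");
        try {
            waitForNoTasks(() -> stuck, 200, 50);
            System.out.println("no pending tasks");
        } catch (AssertionError e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

A real harness would query the cluster's task management API rather than a supplier. The sketch models only the behaviour that turns a leaked task into a delayed assertion failure.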
DaveCTurner added the >test-failure and :ml/Transform labels on Oct 25, 2019
@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml/Transform)

@droberts195
Contributor

This is probably not related to the MlJobIT suite at all, but collateral damage from #47612 (comment). The 3 outstanding tasks are data frame analytics tasks, and #47612 (comment) suggests that the C++ process crashes very early when a data frame analytics task is started.

This issue does highlight that we are not always cancelling the associated task if the C++ process for data frame analytics does not start successfully.

droberts195 added the :ml label and removed the :ml/Transform label on Oct 25, 2019
@ywangd
Member

ywangd commented Jan 5, 2021

59 ML test failures due to unfinished tasks: https://gradle-enterprise.elastic.co/s/sl7yrz5qkdjou

The stacktraces are all pretty similar:

org.elasticsearch.xpack.ml.integration.MlJobIT > testDeleteJobAsync FAILED
    java.lang.AssertionError: 3 active tasks found:
    xpack/ml/job[c]                2lg6T2ggTwSnEQ7RaNAWRQ:3595  cluster:10                   persistent 1609808941663 01:09:01 11.6m       127.0.0.1 javaRestTest-0 job-scheduled-events-interim-results
    xpack/ml/job[c]                2lg6T2ggTwSnEQ7RaNAWRQ:3842  cluster:11                   persistent 1609808953477 01:09:13 11.4m       127.0.0.1 javaRestTest-0 job-scheduled-events-add-to-new-job--with-global-calendar
    xpack/ml/job[c]                2lg6T2ggTwSnEQ7RaNAWRQ:4128  cluster:12                   persistent 1609808965153 01:09:25 11.2m       127.0.0.1 javaRestTest-0 job-scheduled-events
     expected:<0> but was:<3>
        at __randomizedtesting.SeedInfo.seed([7ECC5C376667E708:A8098DD37CA6BF72]:0)
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.failNotEquals(Assert.java:834)
        at org.junit.Assert.assertEquals(Assert.java:645)
        at org.elasticsearch.test.rest.ESRestTestCase.lambda$waitForPendingTasks$2(ESRestTestCase.java:395)
        at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:955)
        at org.elasticsearch.test.rest.ESRestTestCase.waitForPendingTasks(ESRestTestCase.java:371)
        at org.elasticsearch.xpack.ml.integration.MlJobIT.clearMlState(MlJobIT.java:846)

@ywangd
Member

ywangd commented Jan 5, 2021

Another two:

Note that both of them are "release-tests" builds. I'm not sure whether that is relevant. Neither reproduces locally.

@droberts195
Contributor

The original report is nearly 2 years old and was almost certainly fixed by a PR that simply isn't linked to this issue.

The more recent failures from January were almost certainly caused by a temporary mismatch between the ML C++ and Java code in the non-snapshot builds (probably one non-snapshot build failed).
