[CI] HeapAttackIT testGroupOnManyLongs failing #100640

Closed
Tracked by #100528
mark-vieira opened this issue Oct 10, 2023 · 11 comments · Fixed by #102831
Labels: :Analytics/ES|QL, low-risk, Team:QL (Deprecated), >test-failure

Comments

@mark-vieira
Contributor

Looks to be consistently failing on Windows.

Build scan:
https://gradle-enterprise.elastic.co/s/5tdfiyv7dtveq/tests/:x-pack:plugin:esql:qa:server:single-node:javaRestTest/org.elasticsearch.xpack.esql.qa.single_node.HeapAttackIT/testGroupOnManyLongs
Reproduction line:

gradlew ':x-pack:plugin:esql:qa:server:single-node:javaRestTest' --tests "org.elasticsearch.xpack.esql.qa.single_node.HeapAttackIT.testGroupOnManyLongs" -Dtests.seed=66A6D752483ECAF6 -Dtests.locale=cs-CZ -Dtests.timezone=Europe/Oslo -Druntime.java=21

Applicable branches:
main

Reproduces locally?:
Didn't try

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.xpack.esql.qa.single_node.HeapAttackIT&tests.test=testGroupOnManyLongs
Failure excerpt:

java.net.SocketTimeoutException: 300 000 milliseconds timeout on connection http-outgoing-7 [ACTIVE]

  at __randomizedtesting.SeedInfo.seed([66A6D752483ECAF6:1826E48EBDD2450B]:0)
  at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:915)
  at org.elasticsearch.client.RestClient.performRequest(RestClient.java:300)
  at org.elasticsearch.client.RestClient.performRequest(RestClient.java:303)
  at org.elasticsearch.client.RestClient.performRequest(RestClient.java:288)
  at org.elasticsearch.xpack.esql.qa.single_node.HeapAttackIT.query(HeapAttackIT.java:281)
  at org.elasticsearch.xpack.esql.qa.single_node.HeapAttackIT.groupOnManyLongs(HeapAttackIT.java:134)
  at org.elasticsearch.xpack.esql.qa.single_node.HeapAttackIT.testGroupOnManyLongs(HeapAttackIT.java:118)
  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
  at java.lang.reflect.Method.invoke(Method.java:580)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:1583)

  Caused by: java.net.SocketTimeoutException: 300 000 milliseconds timeout on connection http-outgoing-7 [ACTIVE]

    at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(HttpAsyncRequestExecutor.java:387)
    at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:98)
    at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:40)
    at org.apache.http.impl.nio.reactor.AbstractIODispatch.timeout(AbstractIODispatch.java:175)
    at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionTimedOut(BaseIOReactor.java:261)
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.timeoutCheck(AbstractIOReactor.java:502)
    at org.apache.http.impl.nio.reactor.BaseIOReactor.validate(BaseIOReactor.java:211)
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:280)
    at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
    at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)
    at java.lang.Thread.run(Thread.java:1583)

@mark-vieira mark-vieira added the :Analytics/ES|QL and >test-failure labels Oct 10, 2023
@elasticsearchmachine elasticsearchmachine added the blocker and Team:QL (Deprecated) labels Oct 10, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-ql (Team:QL)

@elasticsearchmachine
Collaborator

Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)

@ChrisHegarty
Contributor

This is not a blocker. Removing the blocker label.

@ChrisHegarty
Contributor

I'll look at muting this test on Windows or otherwise increasing the timeout.
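
For illustration only, the two options mentioned here (skip the test on Windows, or allow it more time) could be sketched in plain Java as follows. The names `TimeoutPolicy`, `isWindows`, `shouldRun`, and `socketTimeout`, as well as the doubling factor, are hypothetical and not taken from HeapAttackIT; the only value from this issue is the 300 000 ms timeout in the failure excerpt.

```java
import java.time.Duration;
import java.util.Locale;

/** Hypothetical helper, not part of Elasticsearch or HeapAttackIT. */
final class TimeoutPolicy {
    /** The socket timeout seen in the failure excerpt: 300 000 ms. */
    static final Duration DEFAULT_TIMEOUT = Duration.ofMillis(300_000);

    /** Returns true when the given os.name value denotes Windows. */
    static boolean isWindows(String osName) {
        return osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    /** Option 1: mute the test on Windows by not running it there. */
    static boolean shouldRun(String osName) {
        return isWindows(osName) == false;
    }

    /** Option 2: keep the test but give Windows more time (the factor 2 is an assumption). */
    static Duration socketTimeout(String osName) {
        return isWindows(osName) ? DEFAULT_TIMEOUT.multipliedBy(2) : DEFAULT_TIMEOUT;
    }

    public static void main(String[] args) {
        String os = System.getProperty("os.name");
        System.out.println("run=" + shouldRun(os) + " timeout=" + socketTimeout(os).toMillis() + "ms");
    }
}
```

Muting hides the failure on one platform, while a longer timeout only helps if the query is slow rather than stuck; neither addresses a memory-tracking problem in the query itself.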

@ChrisHegarty ChrisHegarty added the low-risk label and removed the blocker label Oct 11, 2023
@cbuescher
Member

FWIW, there are still more failures of this on 8.11, and at least one of them doesn't seem to be on Windows:

https://gradle-enterprise.elastic.co/s/bqpmzfcaomiq4

I'll check if the mute from above is already present on the 8.11 branch since I don't understand why this is still failing.

cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Oct 31, 2023
@cbuescher
Member

I muted this on 8.11 as well with #101580

@costin costin added this to the 8.12 milestone Nov 16, 2023
dnhatn added a commit that referenced this issue Dec 1, 2023
This commit addresses the issue of missing memory tracking for the
BitSet in TopN.Row. Instead of introducing BreakingBitSet, we replace
the BitSet with a smaller array of offsets in this PR. Nik suggested
removing that BitSet, but I haven't looked into that option yet.

Closes #100640
Closes #102683
Closes #102790
Closes #102784
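
To illustrate the idea in the commit message above, here is a minimal, self-contained sketch of accounting for a per-row offsets array against a memory limit. `SimpleBreaker`, `TrackedRow`, and the byte estimates are hypothetical stand-ins, not the actual ES|QL `TopN.Row` or circuit breaker code.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Demo entry point; all classes in this sketch are hypothetical. */
final class TrackedRowDemo {
    public static void main(String[] args) {
        SimpleBreaker breaker = new SimpleBreaker(1024);
        try (TrackedRow row = new TrackedRow(breaker, 8)) {
            row.setOffset(0, 42);
            System.out.println("used while open: " + breaker.used()); // 16 + 4 * 8 = 48
        }
        System.out.println("used after close: " + breaker.used()); // 0
    }
}

/** Stand-in for a circuit breaker: rejects reservations past a fixed limit. */
final class SimpleBreaker {
    private final long limitBytes;
    private final AtomicLong usedBytes = new AtomicLong();

    SimpleBreaker(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Reserves bytes, or throws and rolls the counter back if the limit would be exceeded. */
    void reserve(long bytes) {
        long updated = usedBytes.addAndGet(bytes);
        if (updated > limitBytes) {
            usedBytes.addAndGet(-bytes);
            throw new IllegalStateException("would use " + updated + " bytes, limit is " + limitBytes);
        }
    }

    void release(long bytes) {
        usedBytes.addAndGet(-bytes);
    }

    long used() {
        return usedBytes.get();
    }
}

/** Row that stores positions in an int[] and accounts for that array before allocating it. */
final class TrackedRow implements AutoCloseable {
    private final SimpleBreaker breaker;
    private final long reservedBytes;
    private final int[] offsets;

    TrackedRow(SimpleBreaker breaker, int offsetCount) {
        this.breaker = breaker;
        // Rough estimate: 4 bytes per int plus an assumed 16-byte array header.
        this.reservedBytes = 16L + 4L * offsetCount;
        breaker.reserve(reservedBytes); // fail before allocating, not after
        this.offsets = new int[offsetCount];
    }

    void setOffset(int index, int offset) {
        offsets[index] = offset;
    }

    int offset(int index) {
        return offsets[index];
    }

    @Override
    public void close() {
        breaker.release(reservedBytes);
    }
}
```

The point of the sketch is that every per-row allocation is reserved before it is made and released on close, so a query that builds too many rows fails fast with a breaker error instead of exhausting the heap.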
dnhatn added a commit to dnhatn/elasticsearch that referenced this issue Dec 1, 2023
elasticsearchmachine pushed a commit that referenced this issue Dec 1, 2023
@kkrik-es
Contributor

kkrik-es commented Dec 7, 2023

This is also failing in 8.11, shall we port the fix or the mute?

@dnhatn dnhatn self-assigned this Dec 7, 2023
@mark-vieira
Contributor Author

> This is also failing in 8.11, shall we port the fix or the mute?

If there is a fix, it should be backported.

@dnhatn dnhatn reopened this Dec 7, 2023
dnhatn added a commit that referenced this issue Dec 7, 2023
We've made some improvements in memory tracking in ESQL, but due to 
their complexity, we intentionally chose not to backport them to 8.11.
Without these enhancements, some HeapAttack tests are not ready for
8.11. I think we should remove the AwaitsFix tests and focus on 8.13
instead.

Closes #100640
@dnhatn
Member

dnhatn commented Dec 7, 2023

We have removed these tests in 8.11 in #103154.

@dnhatn dnhatn closed this as completed Dec 7, 2023