[CI] HeapAttackIT testManyConcat failing #100678

Closed
Tracked by #100528
piergm opened this issue Oct 11, 2023 · 5 comments · Fixed by #104107
Assignees
Labels
:Analytics/ES|QL AKA ESQL low-risk An open issue or test failure that is a low risk to future releases Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) >test-failure Triaged test failures from CI

Comments

@piergm
Member

piergm commented Oct 11, 2023

Build scan:
https://gradle-enterprise.elastic.co/s/z27lugpake4um/tests/:x-pack:plugin:esql:qa:server:single-node:javaRestTest/org.elasticsearch.xpack.esql.qa.single_node.HeapAttackIT/testManyConcat
Reproduction line:

./gradlew ':x-pack:plugin:esql:qa:server:single-node:javaRestTest' --tests "org.elasticsearch.xpack.esql.qa.single_node.HeapAttackIT.testManyConcat" -Dtests.seed=6986B005E1B4E2DE -Dtests.locale=vi-VN -Dtests.timezone=America/Lima -Druntime.java=17

Applicable branches:
main

Reproduces locally?:
No

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.xpack.esql.qa.single_node.HeapAttackIT&tests.test=testManyConcat
Failure excerpt:

java.lang.OutOfMemoryError: Java heap space

  at __randomizedtesting.SeedInfo.seed([6986B005E1B4E2DE:10470D421C8F6C49]:0)
  at org.apache.http.util.CharArrayBuffer.expand(CharArrayBuffer.java:60)
  at org.apache.http.util.CharArrayBuffer.append(CharArrayBuffer.java:90)
  at org.apache.http.util.EntityUtils.toString(EntityUtils.java:228)
  at org.apache.http.util.EntityUtils.toString(EntityUtils.java:308)
  at org.elasticsearch.xpack.esql.qa.single_node.HeapAttackIT.testManyConcat(HeapAttackIT.java:184)
  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-2)
  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
  at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:568)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
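
The top frames of the trace suggest the `OutOfMemoryError` was raised in the test JVM itself, while `EntityUtils.toString` was buffering the whole HTTP response into a growing `CharArrayBuffer`. As an illustration only, here is a minimal sketch of reading a response stream with an upper bound so that an oversized body fails fast instead of exhausting the heap. The class `CappedReader`, the method `readCapped`, and the `maxChars` parameter are hypothetical names and are not part of `HeapAttackIT` or Apache HttpClient.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not code from the Elasticsearch repository.
final class CappedReader {
    private CappedReader() {}

    /**
     * Reads the stream as UTF-8 text, but gives up with an IOException
     * as soon as the decoded text would exceed maxChars characters.
     */
    static String readCapped(InputStream in, int maxChars) throws IOException {
        StringBuilder out = new StringBuilder();
        char[] chunk = new char[8192];
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            int n;
            while ((n = reader.read(chunk)) != -1) {
                // Check the cap before appending, so the builder never
                // grows past the limit.
                if (out.length() + n > maxChars) {
                    throw new IOException("response exceeds " + maxChars + " chars");
                }
                out.append(chunk, 0, n);
            }
        }
        return out.toString();
    }
}
```

Whether a cap like this is appropriate for the test is a judgment call: it turns a client-side OOM into an ordinary, diagnosable failure, but it does not address why the response was so large.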

@piergm piergm added :Analytics/ES|QL AKA ESQL >test-failure Triaged test failures from CI blocker labels Oct 11, 2023
@elasticsearchmachine elasticsearchmachine added the Team:QL (Deprecated) Meta label for query languages team label Oct 11, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-ql (Team:QL)

@elasticsearchmachine
Collaborator

Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)

@ChrisHegarty
Contributor

This is not a blocker. Removing the blocker label.

@mark-vieira
Contributor

We continue to have several failures in `HeapAttackIT` every day. They occur across a wide range of tests and recently appear to be mostly timeouts, presumably because the test takes down the test cluster.

The nature of this test seems to make it very susceptible to instability. Given how much noise it's creating, we should probably mute it entirely unless we think there's a simple fix here.

@wchaparro wchaparro added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jan 2, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine elasticsearchmachine removed the Team:QL (Deprecated) Meta label for query languages team label Jan 2, 2024
dnhatn added a commit that referenced this issue Jan 10, 2024
This PR enables ESQL heap attack tests. I have run this suite over 500 
iterations with different hardware configurations for the last two days,
and all have been successful with the changes in #104159.

Additionally, this PR adds an action that can trigger OOM to generate a 
heap dump if a test takes more than 5 minutes. I've seen cases
(previously with our CI) where the test didn't result in OOM but was
taking too long. Having the ability to inspect the heap in such cases
would be beneficial.

Closes #103527
Closes #100678
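
The commit message above describes an action that fires when a test runs for more than 5 minutes. The general pattern can be sketched as a small watchdog that runs a callback after a timeout unless the test finishes first. This is not the code from the referenced PR: `SlowTestWatchdog` and its methods are hypothetical names, and what the callback does (for example, deliberately provoking an OOM on a JVM started with `-XX:+HeapDumpOnOutOfMemoryError`) is left to the caller as an assumption.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper, not the implementation from the referenced PR.
final class SlowTestWatchdog implements AutoCloseable {
    private final AtomicBoolean fired = new AtomicBoolean(false);
    private final ScheduledExecutorService scheduler;
    private final ScheduledFuture<?> pending;

    /** Schedules onTimeout to run once after the given delay. */
    SlowTestWatchdog(long timeout, TimeUnit unit, Runnable onTimeout) {
        // Daemon thread so the watchdog never keeps the JVM alive.
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "slow-test-watchdog");
            t.setDaemon(true);
            return t;
        });
        pending = scheduler.schedule(() -> {
            fired.set(true);
            onTimeout.run();
        }, timeout, unit);
    }

    /** True once the timeout has elapsed and the callback has started. */
    boolean fired() {
        return fired.get();
    }

    /** Cancels the pending callback; call this when the test finishes in time. */
    @Override
    public void close() {
        pending.cancel(false);
        scheduler.shutdownNow();
    }
}
```

A test body could wrap its work in `try (SlowTestWatchdog w = new SlowTestWatchdog(5, TimeUnit.MINUTES, action)) { ... }`, so the callback is cancelled automatically when the test completes within the limit.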