Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Debug flaky test Issue#9415 #9421

Conversation

aczajkowski
Copy link
Member

@aczajkowski aczajkowski commented Sep 29, 2021

Fixes flaky test io.trino.plugin.hive.TestHive3InsertOverwrite flaky
Fixes: #9415

CC: @losipiuk

@findepi
Copy link
Member

findepi commented Sep 29, 2021

@aczajkowski can you please explain how this helps, for the benefit of future cases like this?

at the first sight, it could seem that subclasses of BaseTestHiveInsertOverwrite do not share any infrastructure, so they in theory could use same schema and table names

@aczajkowski
Copy link
Member Author

aczajkowski commented Sep 29, 2021

@aczajkowski can you please explain how this helps, for the benefit of future cases like this?

at the first sight, it could seem that subclasses of BaseTestHiveInsertOverwrite do not share any infrastructure, so they in theory could use same schema and table names

Only explanation to such behaviour that came in to my mind is that setUp method may be called by separate threads sharing HiveMinioDataLake. Effectively sharing same HMS which may lead to such situation.
If this issue will remain after this fix I'm sure we share test instance cross Threads. And it means we will need to move schema creation to test methods. I haven't found clear information in TestNG docs if this is the case.

@aczajkowski aczajkowski force-pushed the acz/fix_flaky_hive_insert_overwrite branch from e0a85c4 to 9f9bd72 Compare September 29, 2021 10:14
@losipiuk
Copy link
Member

Only explanation to such behaviour that came in to my mind is that setUp method may be called by separate threads sharing HiveMinioDataLake

But is it?
I do not see why it should.

@aczajkowski aczajkowski force-pushed the acz/fix_flaky_hive_insert_overwrite branch from 9728db3 to 9f9bd72 Compare September 30, 2021 14:21
@electrum
Copy link
Member

electrum commented Oct 1, 2021

The information you are looking for is mostly here and here, though it's not exactly straightforward to understand. An important part is this:

When you use the parameter parallel, TestNG will try to run all your test methods in separate threads, except for methods that depend on each other, which will be run in the same thread in order to respect their order of execution.

We set these:

<parallel>methods</parallel>
<threadCount>2</threadCount>

We thus use these defaults:

<forkCount>1</forkCount>
<reuseForks>true</reuseForks>

This means that all tests will be run in a single, separate JVM (from the one used by Maven). TestNG creates each test class once, then runs the test methods in parallel using two threads. Each class is independent, so unless you are sharing static state between classes, or external resources such as files, they should be independent.

I agree with @findepi that this change should not have any effect. For your debug PR, you can edit the ci.yml to remove everything but the job for your test. You could also duplicate it several times by copy/paste or a matrix to reproduce easier. You can also try running the build with Maven to reproduce locally. Try a higher thread count via -Dair.test.thread-count=8 to increase chances of reproducing.

@aczajkowski
Copy link
Member Author

aczajkowski commented Oct 1, 2021

@electrum Thx for all the information. it confirms my assumptions on how TestNG works with conjunction with surefire.
I will still try to reproduce it. Probably with CI as you suggested + some debug logs. Locally I've tried increasing threads no. with no luck.
All above leads to fact that conflict is created between 2 tests during setup

io.trino.plugin.hive.TestHive2InsertOverwrite
io.trino.plugin.hive.TestHive3InsertOverwrite

So during Test object initialisation/creation. We should have separate threads and object in such case.
Looks interesting - what actually caused such behaviour.

UPDATE:

  • I've made 100 CI jobs each running same module test + increased no. of threads. All passed 🤔
    I saw @findepi made similar CI run with 60'ty paraller mvn tests. Also all passed instead of two - seems not connected with current issue:
Error:  java.lang.OutOfMemoryError: Java heap space
Error:  Terminating due to java.lang.OutOfMemoryError: Java heap space


Error:  org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
Error:  Command was /bin/sh -c cd /home/runner/work/trino/trino/plugin/trino-hive && /opt/hostedtoolcache/Java_Zulu_jdk/11.0.12-7/x64/bin/java -Dfile.encoding=UTF-8 -Xmx3g -Xms3g -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:-OmitStackTraceInFastThrow -jar /home/runner/work/trino/trino/plugin/trino-hive/target/surefire/surefirebooter635629729381344146.jar /home/runner/work/trino/trino/plugin/trino-hive/target/surefire 2021-10-01T13-49-35_554-jvmRun1 surefire2716430754111622965tmp surefire_115914175973654802142tmp
Error:  Error occurred in starting fork, check output in log
Error:  Process Exit Code: 3
Error:  Crashed tests:
Error:  io.trino.plugin.hive.TestHiveConnectorTest

@aczajkowski aczajkowski force-pushed the acz/fix_flaky_hive_insert_overwrite branch from 9f9bd72 to e8a8385 Compare October 1, 2021 13:07
@aczajkowski aczajkowski changed the title Avoid using same schema name for multiple tests Debug flaky test Oct 1, 2021
@aczajkowski aczajkowski changed the title Debug flaky test Debug flaky test Issue#9415 Oct 1, 2021
@aczajkowski aczajkowski removed the request for review from losipiuk October 1, 2021 13:17
@aczajkowski aczajkowski force-pushed the acz/fix_flaky_hive_insert_overwrite branch 3 times, most recently from 2a58f3b to 6754d40 Compare October 3, 2021 21:57
@aczajkowski aczajkowski force-pushed the acz/fix_flaky_hive_insert_overwrite branch from 6754d40 to 272356d Compare October 4, 2021 07:23
@aczajkowski aczajkowski deleted the acz/fix_flaky_hive_insert_overwrite branch December 16, 2021 13:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

io.trino.plugin.hive.TestHive3InsertOverwrite flaky: Schema already exists: 'hive_insert_overwrite'
4 participants