Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve test handling in CI #6025

Merged
merged 12 commits into from
May 18, 2022
Merged

Conversation

michaelkaye
Copy link
Contributor

@michaelkaye michaelkaye commented May 11, 2022

Type of change

  • Feature
  • Bugfix
  • Technical
  • Other :

Content

All tests I found in the most recent test runs that are failing have been marked @Ignore.

Having merged other PRs and discussed; increasing timeouts seems to make the tests work (reflecting that the underlying everything is being slow). Have removed the @ ignores for now, replaced with increased timeouts and retries.

Additional test failures that have shown up as part of this have been fixed afterwards.

This should allow new failures of integration tests in CI to be instantly visible when the PR run fails.

Motivation and context

As discussed in a retro a few weeks ago, we want to know when the longer tests fail due to new code. As they're slow to build/run and need macos runners, we're not running on every commit, but we will verify after each PR is merged, as a reasonable intemediate between "too much" and "not enough".

Checklist

@github-actions
Copy link

github-actions bot commented May 11, 2022

Unit Test Results

122 files  ±0  122 suites  ±0   2m 7s ⏱️ -13s
205 tests ±0  205 ✔️ ±0  0 💤 ±0  0 ±0 
690 runs  ±0  690 ✔️ ±0  0 💤 ±0  0 ±0 

Results for commit 868c33a. ± Comparison against base commit 3674ae7.

♻️ This comment has been updated with latest results.

@bmarty bmarty requested a review from BillCarsonFr May 11, 2022 15:37
@michaelkaye michaelkaye marked this pull request as ready for review May 12, 2022 08:56
@michaelkaye
Copy link
Contributor Author

TODO: after merge, create an issue to track these have been @ignored (or add to an existing one)

// Which is convoluted especially as it involves the app state refreshing
// so; in order to make this be more stable
// I hereby cheat and write:
Thread.sleep(30_000)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be clear: i feel bad about this but i experimented with various options for ~1day and none were more likely to work than this - handling the restart requirement just makes it hard to test with.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems fair, it sounds like we're missing registering the sync as an IdlingResource in order for espresso to be able to wait for the sync to finish

there's a helper for withRetry if we wanted to poll for the home appearing instead of a blanket 30 second sleep (assuming the original check wasn't providing false positives)

fun waitForHome() {
  // each attempt has a 500ms delay
  withRetry(times = 60)
  ... original logic
  }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that waitForHome() is using an idling resource on initial sync; but it's idle at two points: Immediately (+/- thread racing) after we return from getting espresso to click the toggle, and then it's busy for 10-15s until after the app has restarted and done the initial sync.

@ouchadam
Copy link
Contributor

out of interest, do these tests pass locally but fail on the CI?

ouchadam
ouchadam previously approved these changes May 13, 2022
Copy link
Contributor

@ouchadam ouchadam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍 small comment about potentially reducing the 30 second wait but not a blocker (would prefer the CI and local to be passing)

@michaelkaye
Copy link
Contributor Author

out of interest, do these tests pass locally but fail on the CI?

They fail for me locally (on a linux machine) and on GHA CI, and also when using firebase test lab and a remote synapse server to run the UI test code.

@michaelkaye michaelkaye mentioned this pull request May 13, 2022
12 tasks
@michaelkaye michaelkaye changed the title @Ignore all tests currently failing in CI Improve test handling in CI May 13, 2022
@michaelkaye michaelkaye dismissed ouchadam’s stale review May 13, 2022 15:45

Fundimentally changed what PR does.

@michaelkaye michaelkaye force-pushed the michaelk/skip_tests_failing_on_ci branch from 18496f0 to 4ced6ca Compare May 16, 2022 15:00
@michaelkaye
Copy link
Contributor Author

michaelkaye commented May 16, 2022

With the above changes (recently rebased against develop) I'm now getting zero test fails locally (and I've generally been able to replicate all CI test fails locally).

I'm a little concerned that 8950aa3 is going to be slowing down the tests generally because of lots of initializations, but if we want the separation I'm not sure how else to do it.

@michaelkaye michaelkaye force-pushed the michaelk/skip_tests_failing_on_ci branch from 419824d to 1f89cfb Compare May 16, 2022 15:55
@michaelkaye michaelkaye requested a review from bmarty May 16, 2022 16:01
@michaelkaye
Copy link
Contributor Author

specifcally asking for review from @bmarty for 8950aa3 related to a recentl change,

and @BillCarsonFr to have a confirm whether a19c1d6 which changes the actual logic of a crypto test is right.

@bmarty
Copy link
Member

bmarty commented May 17, 2022

specifcally asking for review from @bmarty for 8950aa3 related to a recentl change,

Having double checked, I do not see what recent change may require this change. But if it fixes the tests, it's OK for me.

Asking for @ganfra advice.

and @BillCarsonFr to have a confirm whether a19c1d6 which changes the actual logic of a crypto test is right.

I do not want to answer in place of @BillCarsonFr , but the change is now matching the message in the checks.

Copy link
Member

@bmarty bmarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small remarks

@@ -37,6 +39,8 @@ private const val DUMMY_DEVICE_KEY = "DeviceKey"
@RunWith(AndroidJUnit4::class)
class CryptoStoreTest : InstrumentedTest {

@get:Rule var rule = RetryTestRule(3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be val instead of val (= immutable). Applicable to all the occurrences on this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Have updated these.

@BillCarsonFr
Copy link
Member

specifcally asking for review from @bmarty for 8950aa3 related to a recentl change,

Having double checked, I do not see what recent change may require this change. But if it fixes the tests, it's OK for me.

Asking for @ganfra advice.

and @BillCarsonFr to have a confirm whether a19c1d6 which changes the actual logic of a crypto test is right.

I do not want to answer in place of @BillCarsonFr , but the change is now matching the message in the checks.

Yes that is correct. Wonder why it was pushed like that :/

@michaelkaye
Copy link
Contributor Author

specifcally asking for review from @bmarty for 8950aa3 related to a recentl change,

Having double checked, I do not see what recent change may require this change. But if it fixes the tests, it's OK for me.

Asking for @ganfra advice.

It's the change merged here:

https://github.com/vector-im/element-android/pull/5887/files#diff-7ab56d26588e03ae923b3a9b5a37b0a3e6bdb1c7d560104072426c0e0361118bR68

that will create multiple TestMatrix instances; each eventually does a WorkManager.initialize() further on in the logic:

https://github.com/vector-im/element-android/pull/5887/files#diff-2a024565d457bef75c1870ebbf3454ed8f6f134bdd6a28cf29c71910e0f2697cR69

Which is causing the errors.

@bmarty
Copy link
Member

bmarty commented May 17, 2022

I see, thanks @michaelkaye for the explanation!

@michaelkaye michaelkaye merged commit f730378 into develop May 18, 2022
@michaelkaye michaelkaye deleted the michaelk/skip_tests_failing_on_ci branch May 18, 2022 08:51
@github-actions
Copy link

github-actions bot commented May 18, 2022

Matrix SDK

Integration Tests Results:

  • [org.matrix.android.sdk.session]
    = passed=18 failures=2 errors=0 skipped=3
  • [org.matrix.android.sdk.account]
    = passed=3 failures=0 errors=0 skipped=2
  • [org.matrix.android.sdk.internal]
    = passed=53 failures=4 errors=0 skipped=1
  • [org.matrix.android.sdk.ordering]
    = passed=16 failures=0 errors=0 skipped=0
  • [org.matrix.android.sdk.PermalinkParserTest]
    = passed=2 failures=0 errors=0 skipped=0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants