StateFragmentTest is flaky due to ExplorationProgressController deadlock #4238

BenHenning · 2022-03-08T22:36:19Z

I noticed while developing #2173. StateFragmentTest can't succeed a single run on Espresso today and it's due to a deadlock in ExplorationProgressController that was caused with the introduction of the domain-side hint handler. The deadlock can occur since ExplorationProgressController calls down into the hint handler which can synchronously call back to ExplorationProgressController. Since both the handler and controller have internal locks, this results in a deadlock. The same scenario can occur for questions since it has the same wiring.

The solution to this is to move away from using locks (e.g. by leveraging Kotlin actors and/or channels/flows). Unfortunately, it requires essentially completely rewriting the core data flow for both explorations and questions.

BenHenning · 2022-03-08T22:36:49Z

Note that this issue can hit users in production, but I'm not sure of the exact repro steps. I do suspect that some of our ANRs have been caused by this deadlock, though.

…ss (#4237) ## Explanation Fix #3813 Fix #92 Fix part of #4044 (see below for how this relates to the broader math expressions project) This PR refactors ``AsyncResult`` to be a sealed class rather than using enums which has a distinct advantage: the strong typing guarantees from a sealed class ensure the presence of certain data is guaranteed (such as the ``Throwable`` from the failing ``AsyncResult``). However, this refactor is very far-reaching since essentially every usage of ``AsyncResult`` needed to be updated (though not necessarily just references). ### High-level ``AsyncResult`` pattern changes Here are comparisons in how the common operations have changed: Before: ```kotlin val result: AsyncResult<String> = fetchOrReceiveSomeResult() // Construction. val pending = AsyncResult.pending<String>() val failure = AsyncResult.failure<String>(IllegalStateException("Failure reason")) val success = AsyncResult.success("success value") // Check type. val isPending = result.isPending() val isFailure = result.isFailure() val isSuccess = result.isSuccess() val isCompleted = result.isCompleted() // Retrieve success values. val successValue1 = result.getOrDefault("default string") val successValue2 = result.getOrThrow() // Retrieve failure reason. val failureReason = result.getErrorOrNull() // Generic type check. when (result) { // We typically ignore the pending case. result.isFailure() -> { /* Do something with result.getErrorOrNull() */ } result.isSuccess() -> { /* Do something with result.getOrDefault() or result.getOrThrow() */ } // The 'else' case is possible, and often not implemented. } ``` After: ```kotlin val result: AsyncResult<String> = fetchOrReceiveSomeResult() // Construction. val pending = AsyncResult.Pending<String>() val failure = AsyncResult.Failure<String>(IllegalStateException("Failure reason")) val success = AsyncResult.Success("success value") // Check type. val isPending = result is AsyncResult.Pending val isFailure = result is AsyncResult.Failure val isSuccess = result is AsyncResult.Success val isCompleted1 = result !is AsyncResult.Pending val isCompleted2 = result is AsyncResult.Failure || result is AsyncResult.Success // Retrieve success values. val success1 = success.value // Type is 'String' val success2 = (result as? AsyncResult.Success)?.value // Type is 'String?' conditioned on if the result is a success. // Retrieve failure reason. val error1 = failure.error // Type is 'Throwable' val error2 = (result as? AsyncResult.Failure)?.error // Type is 'Throwable?' conditioned on if the result is a failure. // Generic type check. when (result) { is AsyncResult.Pending { /* Do something in the pending case, such as showing a loading indicator. */ } is AsyncResult.Failure { /* Do something with result.error which is never null. */ } is AsyncResult.Success { /* Do something with result.value which is always defined. */ } // 'else' can't exist since all cases are exhausted, and Kotlin will eventually produce an error if sealed class 'when's aren't exhaustive. } // Note 'getOrThrow' is no longer needed since 'value' can only be retrieved if the result is confirmed to be a success. // Similarly, 'getErrorOrNull' isn't needed since the error can similarly only be extracted from a Failure result. // 'getOrDefault' is still possible using patterns as so (where the second is preferred to ensure all cases are considered): val defaultExample1 = if (result is AsyncResult.Success) result.value else default val defaultExample2 = when (result) { // Or, split out 'failure' to log an error message. is AsyncResult.Pending, is AsyncResult.Failure -> default is AsyncResult.Success -> result.value } // Note that result transformations haven't changed at all. ``` The main advantages of the new approach: - All type cases must always be considered when using 'when' with the result (forcing consideration of both error and pending cases--this is useful since we often ignore pending cases today despite them existing and occurring) - There's no longer any question about the presence of success values or failure error throwables (they are always present if the corresponding result is the correct type) which results in safer, more readable, and shorter code - The implementation properly allows for null success values (see 'rationale' section below) The only disadvantage is that some generic cases will conflate ``AsyncResult``'s sub-types, such as this case: ```kotlin val results1 = mutableListOf(AsyncResult.Pending()) // results1 is of type MutableList<AsyncResult.Pending>, but this can be fixed by being explicit: val results2 = mutableListOf<AsyncResult<String>>(AsyncResult.Pending()) // Or: val results3: MutableList<AsyncResult<String>> = mutableListOf(AsyncResult.Pending()) ``` ### Rationale for why this PR is needed & connection to algebraic expressions #4239 introduces changes to ``QuestionAssessmentProgressController`` and ``ExplorationProgressController`` such that data providers with null payloads are transformed into others. However, that caused an issue when transforming results since 'null' can mean either the absence of 'success' (i.e. a failure) or a successful null value, and ``AsyncResult`` couldn't tell the difference between these without leveraging a sealed class. See #4239 for why it's necessary for the broader algebraic expressions work (which is the same justification as why this PR is). This PR exists to make #4239 smaller since it brought in some additional refactoring & cleanup beyond the minimal migration to a sealed class. Furthermore, this included in the algebraic project chain for the same reason as #4239: to simplify merging and reviews. ### Tests migration and ``DataProviderMonitorFactory`` Rather than updating all tests which verified ``DataProvider`` results to use the new ``AsyncResultSubject``, it seemed nicer to migrate them over to ``DataProviderMonitorFactory`` (which effectively fully fixed #3813). This resulted in a nice simplification in a lot of test suites by eliminating Mockito usage entirely (except in the monitor's implementation). I also replaced all ``toLiveData``-esque calls to use ``DataProvider``s, instead (which required changing a bunch of controllers over to using ``DataProvider``s instead of ``LiveData`` (which is the preferred approach since ``LiveData`` should never be used in the domain layer to ensure good threading cohesion). ### Test exemptions & changes Note that there's a new ``AsyncResultSubject`` being introduced here to simplify testing (which consequently fixes #92), but tests aren't added for this subject for similar reasons to past algebraic expression PRs. #4236 was filed to track adding tests for this subject in the future. All existing cases that weren't migrated as part of the monitor factory refactor were moved over to using this new subject (except those explicitly testing ``AsyncResult``). A bunch of tests have been removed since they are no longer possible due to compile-time guarantees about the value of ``AsyncResult`` in different scenarios. ``StateFragmentTest`` was updated to account for the content description for incorrect answers to not be computed correctly (see below section for details). ### Changes to ``StateRecyclerViewAssembler`` & content descriptions There was a bug in ``StateRecyclerViewAssembler`` where an answer wouldn't be indicated as incorrect correctly in explorations. I fixed this by introducing a more explicit 'correct_answer' signal in the answer response proto. This fix was needed since one of ``StateFragmentTest``'s tests started failing when another issue was fixed in ``ExplorationProgressControllerTest``: in an old PR, one of the notifications for the current state ``DataProvider`` was dropped mid-answer submission. Restoring this notification triggered the test to correctly fail (due to it actually hitting the bug in the assembler; before, the content description wasn't being correctly defined in that test and it was falsely passing). The fix to the assmbler caused the test to start passing (but another to fail due to it not being supported on Robolectric; it's been subsequently ignored like others of its kind). The new fix is intentionally designed to be robust against cases when an answer doesn't have a correct label (since that was assumed to be present before). ## Essential Checklist - [x] The PR title and explanation each start with "Fix #bugnum: " (If this PR fixes part of an issue, prefix the title with "Fix part of #bugnum: ...".) - [x] Any changes to [scripts/assets](https://github.com/oppia/oppia-android/tree/develop/scripts/assets) files have their rationale included in the PR explanation. - [x] The PR follows the [style guide](https://github.com/oppia/oppia-android/wiki/Coding-style-guide). - [x] The PR does not contain any unnecessary code changes from Android Studio ([reference](https://github.com/oppia/oppia-android/wiki/Guidance-on-submitting-a-PR#undo-unnecessary-changes)). - [x] The PR is made from a branch that's **not** called "develop" and is up-to-date with "develop". - [x] The PR is **assigned** to the appropriate reviewers ([reference](https://github.com/oppia/oppia-android/wiki/Guidance-on-submitting-a-PR#clarification-regarding-assignees-and-reviewers-section)). ## For UI-specific PRs only N/A -- This should be a non-side effect change. Despite the UI changes, it shouldn't actually change user behaviors (possibly with the exception of answer notification being a bit faster which led to the UI test fix that was explained above) with one primary exception: this fixes the content description computation for incorrect answers. **Reviewers:** please let me know if you'd like this demonstrated, and how. I also didn't explicitly verify that affected shared tests pass on Espresso, but I highly doubt they wouldn't (or at least, would change their current pass/fail status) given that their Robolectric counterparts pass and this is a non-functional change. I did not verify StateFragmentTest on Espresso since it's currently broken (see #4238). This is fixed in the next PR where the test suite is verified to pass. Commit history: * Alphabetize test exemptions. * Fix typo & add regex check. The new regex check makes it so that all parameterized testing can be more easily tracked by the Android TL. * Add missing KDocs. * Post-merge cleanups. Also, fix text file exemption ordering. * Add new test for negation with math symbol. * Post-merge fixes. * Add KDocs. Also, add new regex exemption for new parameterized tests in this branch. * Refactor & simplify real ext impl. Also, fix/clarify some KDocs. * Lint fixes. * Simplify operation list converter a lot. This inlines three recursive operations to be done during the actual computation to simplify the overall converter complexity (and to make determining the test matrix easier). * Prepare for new tests. * Remove the ComparableOperationList wrapper. * Change parameterized method delimiter. * Use utility directly in test. * Post-merge fixes. This adjusts for the removal of ComparableOperationList (i.e. no wrapper proto). * Add first round of tests. This includes fixes to the converter itself as it wasn't distributing both product inversions and negation correctly in several cases. Tests should now be covering these cases. * Finish initial test suite. Still needs to be cleaned up, but after converter refactoring attempts. * Simplify operation sorting comparators. * Remove old tests. * Add remaining missing tests. * KDocs & test exemption. * Renames & lint fixes. * Post-merge fixes. * Add tests. * KDocs + exemptions. Also, clean up polynomial sorting. * Lint fixes. * Post-merge fixes. Also, mark methods/classes that need tests. * Add extension tests. * Add classifier tests. * Use more intentional epsilons for float comparing. * Treat en-dash as a subtraction symbol. * Add explicit platform selection for paramerized. This adds explicit platform selection support rather than it being automatic based on deps. While less flexible for shared tests, this offers better control for tests that don't want to to use Robolectric for local tests. This also adds a JUnit-only test runner, and updates MathTokenizerTest to use it (which led to an almost 40x decrease in runtime). * Exemption fixes. Also, fix name for the AndroidJUnit4 runner. * Remove failing test. * Fix unary expression precedence. Also, use ParameterizedJunitTestRunner for MathExpressionParserTest. * Fixes & add more test cases. * Post-merge fixes & test changes. Also, update RealExtensionsTest to use the faster JUnit runner. * Use utility directly in LaTeX tests. * Post-merge fixes. Also, update ExpressionToComparableOperationConverterTest to use the fast JUnit-only runner. * Post-merge fixes. Also, update PolynomialExtensionsTest to use fast JUnit-only runner. * Post-merge fixes. Also, update float interval per new tests. * Lint & other check fixes. * Replace deprecated term. * Post-merge fixes. * Add full test suites for alg exp classifiers. * Lint & static check fixes. * Fix test on Gradle. * Fix test for Gradle. * Add tests for math equations. And, post-merge fixes. * Static check & lint fixes. * Post-merge fixes. Verified CI checks & all unit tests are passing. * Split up tests. Also, adds dedicated BUILD.bazel file for new test. * Add missing test in Bazel, and fix it. * Correct order for genrule. * Add full test suite. * Clean up + KDocs + exemption. * Lint fixes. * Post-merge fix. * Cache KotliTeX renders. Directly rendering LaTeX through KotliTeX is way too slow, so this introduces a custom flow through Glide that computes a PNG for the LaTeX on a background thread and then caches it as part of Glide's cache to speed up re-renders of the LaTeX. We may need to manage/prune the cache over time, but for now we'll just rely on Glide's internal behaviors. This also documents some new tests that should be added, but it's not comprehensive. * Add tests, docs, and exemptions. * Update to fixed version of KotliTeX. The newer version correctly computes the bounds for rendered LaTeX. * Lint fixes. * Add new dependency licenses. This isn't done yet (some of the licenses still need to be fixed). * Fix license links. Note that the kxml one was tricky since its Maven entry says it's licensed under BSD and CC0 1.0, and its SourceForge link says the same plus LGPL 2.0. However, the source code and GitHub version of the project license it under MIT, and this seems to predate the others so it seems like the most correct license to use in this case and the one that we're using to represent the dependency. * Fix Gradle build. This uses a version of KotliTeX that builds correctly on Jitpack for Gradle, and fixes the StaticLayout creation to use an alignment constant that builds on Gradle (I'm not sure why there's a difference here between Gradle & Bazel, but the previous constant isn't part of the enum per official Android docs). * Create the math drawable synchronously. This requires exposing new injectors broadly in the app since the math model loader doesn't have access to the dependency injection graph directly. * Remove new deps from Maven list. They were incorrectly pulled in by KotliTeX. * Add argument partitioning. This fixes cases where argument calls may be very large and fail to execute due to exceeding system limitations. * Make allowance for empty cases to fix tests. These tests correspond to real scenarios. * Lint fixes. * Address reviewer comment. Clarifies the documentation in the test runner around parameter injection. * Fix broken build. * Fix broken build post-merge. * Fix broken post-merge classifier. * Address reviewer comment. * Post-merge build fixes. * Post-merge build fixes for new classifiers. * Post-merge build fixes. * Correct reference document link. * Ensure LaTeX isn't stretched or cut-off. The comments in-code have much more specifics on the approach. * Add and fix missing test (was broken on Gradle). * Refactor AsyncResult into a sealed class. This also introduces an AsyncResultSubject, and more or less fully fixes issue #3813 in all tests. This is a cherry-pick from the fix-progress-controller-deadlock branch since it ended up being quite large (it made more sense to split it into a pre-requisite PR). Conflicts: app/src/main/java/org/oppia/android/app/player/state/testing/StateFragmentTestActivityPresenter.kt domain/src/main/java/org/oppia/android/domain/exploration/ExplorationProgressController.kt domain/src/main/java/org/oppia/android/domain/question/QuestionAssessmentProgressController.kt domain/src/test/java/org/oppia/android/domain/exploration/lightweightcheckpointing/BUILD.bazel testing/src/main/java/org/oppia/android/testing/data/DataProviderTestMonitor.kt utility/src/main/java/org/oppia/android/util/data/DataProviders.kt * Post-merge fixes and updates for consistency. * Post-merge fixes. * TODO has been addressed. * Fix documentation & add tests. * Lint fixes. * Fix gradle tests. I've verified in this commit that all Gradle tests build & run locally (at least on Robolectric). * Post-merge fix. * More post-merge fixes. * Fix TODO comment. * Post-merge lint fixes. * Post-merge fix. * Fix exploration routing issue. The underlying problem was that the PR inadvertently changed the behavior of comparing two results wherein results of different times would be considered different and be re-delivered (which happened to break exploration routing, and likely a variety of other places). This introduces a new method for checking equivelance rather than confusingly assuming that users of AsyncResult don't care about the result's time during comparison checking. New tests have been added to verify the functionality works as expected (and fails when expected), and I manually verified that the exploration routing issue was fixed. More details on the specific user-facing issue will be added to the PR as a comment. * Update KotliTeX version. This version doesn't have debug drawing enabled.

…migrating progress controllers over to a command-queue structure (#4239) ## Explanation Fix #3622 Fix #4238 Fix #3861 Fix part of #4044 (see below for how this relates to the broader math expressions project) This PR fixes a deadlock that's possible in both ``QuestionAssessmentProgressController`` and ``ExplorationProgressController`` wherein the internal locks used to synchronize session state can conflate with the locks of the controllers' ``HintHandler``s (which calls back into the controller within the lock). While this seems really difficult to hit when playing a training session or exploration, ``StateFragmentTest`` seems to very consistently hit the deadlock after only 10-15 tests (so it essentially can't be run without the fixes introduced here). While I could've tried rearranging the locks to be more cooperative, it's actually difficult to guarantee that deadlocks won't happen without rebuilding the internal locking mechanism for controller data. I chose the latter by introducing a command queue-esque pattern using a Kotlin actor with ``StateFlow``s for pushing state back to callers (see https://kotlinlang.org/docs/shared-mutable-state-and-concurrency.html; actors are a powerful way to leverage the lightweight nature of coroutines to manage cross-thread state machines). To simplify the implementation, this PR also introduces conversion methods from ``StateFlow``s to ``DataProvider``s (I could see us leveraging these flows more in the future, maybe as a long-term replacement to ``DataProvider``s with some refinement). One side effect of using a command queue to manage what is effectively a session state is that state can now fairly easily leak between sessions. Efforts have been taken to minimize the chance of this happening (mainly ignoring events that arrive from a previously ended session), but a better long-term solution would be to create new session controllers for each session (which should be something possible with custom scopes once we integrate Dagger Hilt in #1720). Due to a general push toward using ``DataProvider``s much more heavily in these controllers rather than ``LiveData``, ``AsyncResult`` needed to be refactored in order for some of the internal transformed chains in the controllers to work correctly (which is where #4237 comes in). This also permanently fixes #3861 and other potential errors that could happen by ensuring that the provider state can continue even if internal errors are encountered rather than entering a permanently broken state (#4230 mitigates the immediate repro steps from the bug by fixing the underlying issue that caused the exception, whereas this PR focuses on preventing those scenarios from causing the bug in question). ### Some explanations for the threading approach I was originally going to make hint handler also a command queue, but it became a bit complicated. To keep things simpler, I removed the threading restricting on it so that it can only be accessed on a single thread (similarly to the internal progress objects used by both controllers) which helps avoid needing a lock without worrying about data races. Note that this command queue style implementation of ``HintHandler`` is available to view in the past commits in this PR if it's interesting for future development (since it's a simpler command queue than the ones used by the controllers due to having an inherently simpler internal state machine). This PR demonstrates a general tendency to move the app toward a 'notify-and-pull' model (e.g. ``DataProvider``s) vs. a push-subscribe pattern. The former is easier to synchronize, parallelize, ensure lifecycle safety, and is often more performant since operations only need to execute when their results are needed. Note that both controllers' internal computation behaviors are completely different with this new approach. Previously, the internal progress state machine would only be fully processed when the current ephemeral state/question was retrieved from the UI layer (i.e. lazily), but now it's computed shortly after the change (i.e. eagerly) since that works a bit better with the ``Flow``s being used for internal cross-thread communication. This technically means the controllers are slightly less efficient since execution may happen in cases when the state/question is no longer being viewed by the user, but it should realistically make a small difference since we have to thread hop for the command queue, anyway, and computing the ephemeral state/question should be cheap. This has the benefit of a potentially snappier load experience in the frontend, though, since the ephemeral state/question will be available sooner. All operations in the controllers now execute their operations regardless of whether they're observed (which should better match callers' expectations). Note that the internal state machine complexity can probably be drastically simplified now that the command queue can also act as a state machine, but I didn't want to risk regressions by removing it (I was trying to keep the logical flow as similar as possible to the existing behavior). ### Updates to dependencies The coroutines core dependency needed to be updated since the older version didn't include ``StateFlow`` which was necessary for providing a sync-back mechanism for the controller command queues. ### Stability checks I verified that both ``QuestionPlayerActivity`` and ``StateFragmentTest`` pass on Espresso (where the latter deadlocked prior to this PR); see the screenshots below. Further, I tested ``ExplorationProgressControllerTest`` and ``QuestionAssessmentProgressControllerTest`` 100x times to verify both are not flaky (since some flakes were discovered during development). Both test suites passed all 100 runs. ### Bazel app module tests This PR introduces support for dedicated app module tests in Bazel (rather than bundling them as part of the broader list of app module tests). This required a few changes in .bzl and .bazel files, but the change is otherwise straightforward and extremely beneficial for team members that rely on the Android Studio with Bazel plugin for regular development. ### Miscellaneous I tried introducing a Bazel emulator test while developing this (as part of #59) but it ran into a desugar issue with Mockito and Robolectric (due to them both being pulled into the test binary that's being built to send to the device) which AGP apparently works around. See bazelbuild/bazel#13684 for a Bazel issue tracking the same issue. This fixes #3622 since it eliminates the lock entirely in favor of using actors for safe cross-thread communication. The ``MediatorLiveData`` specific part of this issue's TODO comment isn't relevant anymore, either, since we replaced ``MediatorLiveData`` with a custom notifying ``LiveData`` within ``DataProviders`` a while back. This fixes #4238 fully by ensuring any exceptions caused within the state flow of an exploration or questions session are captured and logged, but don't actually break the internal state machine of the session. This issue was discovered during the development of #2173 due to an exception being thrown in the classifier. While that issue was fixed, a test couldn't actually be added for the #4238 fix since it's difficult to intentionally trigger exceptions during answer classification. ### A note on checkpoints in ``ExplorationProgressController`` The checkpointing behavior in ``ExplorationProgressController`` became more complex as a result of these changes. In particular, checkpointing behaves as follows: - Something changes in the play session - A checkpoint recomputation is requested (which results in a new checkpoint being computed and then saved to disk) - The checkpoint is 'processed' by updating the play state of the exploration (i.e. in ``StoryProgressController``) - The ephemeral state is recomputed since it contains the checkpoint and the checkpoint has changed Each of these more or less require a command queue "hop" which means other operations can occur between them. As a result, #3467 can still occur since an exploration can be finished before the process happens (though the behavior has changed: the progress simply won't be processed which means an exploration might not be indicated as played, but the checkpointing should almost always be saved before the exploration session end is processed). Fixing this isn't straightforward since the exploration's play state can't be changed until the checkpoint is confirmed as saved which either requires updating ``StoryProgressController`` to also account for the presence of checkpoints, or to use more flows internally to combine the save-and-process operation into one in a way that doesn't lock up ``ExplorationProgressController`` (I'm currently not exactly sure how one would do this, but I haven't thought about it in depth yet). ## Essential Checklist - [x] The PR title and explanation each start with "Fix #bugnum: " (If this PR fixes part of an issue, prefix the title with "Fix part of #bugnum: ...".) - [x] Any changes to [scripts/assets](https://github.com/oppia/oppia-android/tree/develop/scripts/assets) files have their rationale included in the PR explanation. - [x] The PR follows the [style guide](https://github.com/oppia/oppia-android/wiki/Coding-style-guide). - [x] The PR does not contain any unnecessary code changes from Android Studio ([reference](https://github.com/oppia/oppia-android/wiki/Guidance-on-submitting-a-PR#undo-unnecessary-changes)). - [x] The PR is made from a branch that's **not** called "develop" and is up-to-date with "develop". - [x] The PR is **assigned** to the appropriate reviewers ([reference](https://github.com/oppia/oppia-android/wiki/Guidance-on-submitting-a-PR#clarification-regarding-assignees-and-reviewers-section)). ## For UI-specific PRs only This PR does not intentionally change and user-perceivable behaviors in the questions or state players except potentially fixing deadlocks/ANRs that could occur while playing. There could be regressions in the players, and they may potentially be more performant after this change, but neither can easily be demonstrated via videos or screenshots. There are no accessibility behavior changes to demonstrate, either. However, two key tests have been run and verified as passing on Espresso to demonstrate that the deadlock has been fixed. Specifically: | QuestionPlayerActivityTest | StateFragmentTest | |------|------| | ![deadlock_fix_pr_question_player_activity_test_passing](https://user-images.githubusercontent.com/12983742/158529405-6e19a602-19b8-48ab-814b-980831c57cb4.png) | ![deadlock_fix_pr_state_fragment_test_passing](https://user-images.githubusercontent.com/12983742/158529650-cc6ac45f-dbee-46b6-a732-d2e4a1a1fbcb.png) | Commit history: * Change parameterized method delimiter. * Use utility directly in test. * Post-merge fixes. This adjusts for the removal of ComparableOperationList (i.e. no wrapper proto). * Add first round of tests. This includes fixes to the converter itself as it wasn't distributing both product inversions and negation correctly in several cases. Tests should now be covering these cases. * Finish initial test suite. Still needs to be cleaned up, but after converter refactoring attempts. * Simplify operation sorting comparators. * Remove old tests. * Add remaining missing tests. * KDocs & test exemption. * Renames & lint fixes. * Post-merge fixes. * Add tests. * KDocs + exemptions. Also, clean up polynomial sorting. * Lint fixes. * Post-merge fixes. Also, mark methods/classes that need tests. * Add extension tests. * Add classifier tests. * Use more intentional epsilons for float comparing. * Treat en-dash as a subtraction symbol. * Add explicit platform selection for paramerized. This adds explicit platform selection support rather than it being automatic based on deps. While less flexible for shared tests, this offers better control for tests that don't want to to use Robolectric for local tests. This also adds a JUnit-only test runner, and updates MathTokenizerTest to use it (which led to an almost 40x decrease in runtime). * Exemption fixes. Also, fix name for the AndroidJUnit4 runner. * Remove failing test. * Fix unary expression precedence. Also, use ParameterizedJunitTestRunner for MathExpressionParserTest. * Fixes & add more test cases. * Post-merge fixes & test changes. Also, update RealExtensionsTest to use the faster JUnit runner. * Use utility directly in LaTeX tests. * Post-merge fixes. Also, update ExpressionToComparableOperationConverterTest to use the fast JUnit-only runner. * Post-merge fixes. Also, update PolynomialExtensionsTest to use fast JUnit-only runner. * Post-merge fixes. Also, update float interval per new tests. * Lint & other check fixes. * Replace deprecated term. * Post-merge fixes. * Add full test suites for alg exp classifiers. * Lint & static check fixes. * Fix test on Gradle. * Fix test for Gradle. * Add tests for math equations. And, post-merge fixes. * Static check & lint fixes. * Post-merge fixes. Verified CI checks & all unit tests are passing. * Split up tests. Also, adds dedicated BUILD.bazel file for new test. * Add missing test in Bazel, and fix it. * Correct order for genrule. * Add full test suite. * Clean up + KDocs + exemption. * Lint fixes. * Post-merge fix. * Cache KotliTeX renders. Directly rendering LaTeX through KotliTeX is way too slow, so this introduces a custom flow through Glide that computes a PNG for the LaTeX on a background thread and then caches it as part of Glide's cache to speed up re-renders of the LaTeX. We may need to manage/prune the cache over time, but for now we'll just rely on Glide's internal behaviors. This also documents some new tests that should be added, but it's not comprehensive. * Add tests, docs, and exemptions. * Update to fixed version of KotliTeX. The newer version correctly computes the bounds for rendered LaTeX. * Lint fixes. * Add new dependency licenses. This isn't done yet (some of the licenses still need to be fixed). * Fix license links. Note that the kxml one was tricky since its Maven entry says it's licensed under BSD and CC0 1.0, and its SourceForge link says the same plus LGPL 2.0. However, the source code and GitHub version of the project license it under MIT, and this seems to predate the others so it seems like the most correct license to use in this case and the one that we're using to represent the dependency. * Fix Gradle build. This uses a version of KotliTeX that builds correctly on Jitpack for Gradle, and fixes the StaticLayout creation to use an alignment constant that builds on Gradle (I'm not sure why there's a difference here between Gradle & Bazel, but the previous constant isn't part of the enum per official Android docs). * Create the math drawable synchronously. This requires exposing new injectors broadly in the app since the math model loader doesn't have access to the dependency injection graph directly. * Remove new deps from Maven list. They were incorrectly pulled in by KotliTeX. * Add argument partitioning. This fixes cases where argument calls may be very large and fail to execute due to exceeding system limitations. * Make allowance for empty cases to fix tests. These tests correspond to real scenarios. * Lint fixes. * Address reviewer comment. Clarifies the documentation in the test runner around parameter injection. * Fix broken build. * Fix broken build post-merge. * Fix broken post-merge classifier. * Address reviewer comment. * Post-merge build fixes. * Post-merge build fixes for new classifiers. * Post-merge build fixes. * Correct reference document link. * Ensure LaTeX isn't stretched or cut-off. The comments in-code have much more specifics on the approach. * Add and fix missing test (was broken on Gradle). * Update to newer version of Kotlin coroutines. This version includes StateFlow which will be a really useful mechanism for helping to avoid using critical sections. * First attempt to fix deadlock. This uses StateFlows and an actor to avoid the deadlock. This is missing necessary hint changes that are coming in a later commit. Tests currently fail, and questions haven't yet been migrated (and won't until the fixes are verified). * Attempt to make hint handler multi-threadable. This is a midway solution that's being committed for posterity, but will be reverted in favor of a solution that forces hint handler to be unsafe across multiple threads (since it's much simpler, and works given that all users of it now synchronize state management using an actor). * Finish fixing state player. This includes a fix for accessibility string handling (since the new flow triggers one of these to fail). It also adds a Bazel file for StateFragmentTest (I spent some time trying to get Espresso tests working with Bazel but ran into a compatibility issue). StateFragmentTest has been verified to pass on Robolectric and Espresso with this change. This sets up the project for fixing questions in the next commit. * First pass on migrating question controller. This also includes a migration for exploration & question domain tests to use the test monitor utility. The question tests currently fail since there's a bug in AsyncResult where it won't support null values during transformations. * Refactor AsyncResult into a sealed class. This also introduces an AsyncResultSubject, and more or less fully fixes issue #3813 in all tests. * Refactor AsyncResult into a sealed class. This also introduces an AsyncResultSubject, and more or less fully fixes issue #3813 in all tests. This is a cherry-pick from the fix-progress-controller-deadlock branch since it ended up being quite large (it made more sense to split it into a pre-requisite PR). Conflicts: app/src/main/java/org/oppia/android/app/player/state/testing/StateFragmentTestActivityPresenter.kt domain/src/main/java/org/oppia/android/domain/exploration/ExplorationProgressController.kt domain/src/main/java/org/oppia/android/domain/question/QuestionAssessmentProgressController.kt domain/src/test/java/org/oppia/android/domain/exploration/lightweightcheckpointing/BUILD.bazel testing/src/main/java/org/oppia/android/testing/data/DataProviderTestMonitor.kt utility/src/main/java/org/oppia/android/util/data/DataProviders.kt * Post-merge fixes and updates for consistency. * Post-merge fixes. * TODO has been addressed. * Fix documentation & add tests. * Lint fixes. * Lint & post-merge fixes. Questions-related tests still fail and need to be fixed now that AsyncResult has been refactored to support null values. * Post-merge test fixes. The core affected UI/domain tests have now been verified as working on Robolectric. The full test suite is next. * Add documentation & tests. Also, fix a bug in the automatic StateFlow DataProviders wherein they wouldn't notify correctly on changes. This led to some simplifications in the exploration & question progress controllers. * Lint fixes. * Fix gradle tests. I've verified in this commit that all Gradle tests build & run locally (at least on Robolectric). * Fix Proguard build. This required bringing kotlinx-coroutines-android up-to-date with kotlinx-coroutines-core (since that was updated on this branch). New but reasonable Proguard warning exemptions also needed to be added. * Post-merge fix. * More post-merge fixes. * Fix TODO comment. * Post-merge lint fixes. * Post-merge fix. * Fix exploration routing issue. The underlying problem was that the PR inadvertently changed the behavior of comparing two results wherein results of different times would be considered different and be re-delivered (which happened to break exploration routing, and likely a variety of other places). This introduces a new method for checking equivelance rather than confusingly assuming that users of AsyncResult don't care about the result's time during comparison checking. New tests have been added to verify the functionality works as expected (and fails when expected), and I manually verified that the exploration routing issue was fixed. More details on the specific user-facing issue will be added to the PR as a comment. * Post-merge fixes. * Update KotliTeX version. This version doesn't have debug drawing enabled. * Fix lifecycle breakage. I noticed downstream that quickly navigating back and forth to enter & exit an exploration can cause the progress controller to get into a bad state that either leads to a broken experience (stacked explorations or a blank state), or a crash. The issue was coming from DataProviders being shared across sessions even though the underlying state was different. This change ensures that providers stay completely independent, similar to the core session state. * Update play session controllers. This ensures that the command queue itself is fully isolated to avoid delayed events potentially leaking across sessions (now that the session ID barrier has been removed).

BenHenning added this to the Alpha MR4 milestone Mar 8, 2022

BenHenning self-assigned this Mar 8, 2022

BenHenning closed this as completed in #4239 Mar 27, 2022

BenHenning added this to [Team] Developer Workflow & Infrastructure - Android Sep 15, 2022

BenHenning moved this to Needs Triage in [Team] Developer Workflow & Infrastructure - Android Sep 15, 2022

BenHenning added the Z-ibt Temporary label for Ben to keep track of issues he's triaged. label Sep 15, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

StateFragmentTest is flaky due to ExplorationProgressController deadlock #4238

StateFragmentTest is flaky due to ExplorationProgressController deadlock #4238

BenHenning commented Mar 8, 2022

BenHenning commented Mar 8, 2022

StateFragmentTest is flaky due to ExplorationProgressController deadlock #4238

StateFragmentTest is flaky due to ExplorationProgressController deadlock #4238

Comments

BenHenning commented Mar 8, 2022

BenHenning commented Mar 8, 2022