Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix part of #3728, #3729, #3730: Introduce infrastructure to support app & content string translations #3794

Merged
merged 59 commits into from
Sep 27, 2021

Conversation

BenHenning
Copy link
Member

@BenHenning BenHenning commented Sep 15, 2021

Explanation

Fixes part of #3728
Fixes part of #3729
Fixes part of #3730

Overview

This PR introduces the core support necessary for localization (particularly, automatic language selection), as described by this design document (note that this document is now a bit outdated). In particular:

  • It introduces the proto structures for representing what languages & regions the app supports (to represent locales)
  • It introduces actual configurations to specify the supported languages & regions (Bazel builds only--see 'configuration' section below)
  • It introduces a new OppiaLocale & subclasses to replace uses of Android's Locale class (regex patterns will be coming in later PRs to enforce this requirement) to ensure properly localization choices are made throughout the codebase, and to provide consistent locale-esque functionality (such as for case conversion & time/date formatting)
  • It introduces two new controllers (LocaleController & TranslationController) to support various future operations (see later PRs) that are necessary for proper localization support

Special high-level considerations

It's worth noting two things:

  1. Some of the conceptual mechanisms here are especially complex
  2. This PR is quite large, especially from testing changes

Regarding (1), much of the complexity comes from a number of factors:

  • There are different, inconsistent international standards used to represent both languages & regions (ISO 639, ISO 3166, IETF BCP 47, among others), multiple of which Android's Locale implementation (derived from Java's) supports
  • Android's own region selection doesn't follow any particular standard (exactly)
  • There's no way of knowing which languages a particular version of Android actually supports since this can be SDK, OEM, and region specific
  • Java has a variety of design issues with case conversion that are being partly fixed in Kotlin 1.5 (which we don't yet support) (mainly that Java chooses the default Locale for case conversion which can result in inconsistencies). This, coupled with the need to sometimes bidirectionally wrap strings programmatically has introduced need for more complex & comprehensive locale wrapper classes.
  • The app needs to support custom languages that the Android system may not ever support (such as Hinglish where macaronic languages aren't yet recognized by any international language standards)
  • The app is actually managing three different languages at any given time for app strings, content strings, and audio translations

Regarding (2), the importance of localization as critical infrastructure has driven an especially high standard for testing in this PR, including introducing a custom Robolectric shadow & multiple test fakes, each with their own test suites. While this is so thorough, it's meant to ensure a strong regression barrier for the critical infrastructure being introduced in this PR, and to lay a good foundation for future code refactoring as the codebase continues to evolve & scale.

High-level design considerations

This is slightly repeating some points in the design document, but for the purpose of review these may be useful:

  • Three different locales are introduced for different uses:
    • ContentLocale: for audio & written translations (generally all we care about is what language to choose; this may evolve in the future)
    • MachineLocale: for manipulating or creating machine-readable strings (such as for logs). Most case conversion will use this locale moving forward.
    • DisplayLocale: for retrieving & manipulating strings that will eventually be displayed to users (this one actually follows whatever the current Locale defined is)
  • Controllers are used to force usage of certain locales
  • DataProviders are the key mechanism for computing locales since it allows monitoring changes to the default Android Locale instance, and to enforce app-configured language constraints (see 'configuration' section below)

Language configuration

This PR is introducing a new pattern under the config/ folder: two textproto files which clearly defined the supported languages & regions. Some things to note about this:

  • This is using the same textproto->pb conversion as the scripts & domain assets pipeline and, as such, it's only supported on Bazel (Gradle gracefully defaults to no languages which results in the system language always being chosen in downstream PRs); it's for this reason we plan to ship Alpha MR3 with a Bazel build instead of Gradle
  • The language list does not restrict access for app strings (but will eventually for lesson & audio strings). The region list is currently informational, but not enforced (and isn't yet planned to be; it's here for conceptual interoperability with Android's Locale class but it's not yet clear if we need it long-term).
  • The new config/ subpackage requires a separate manifest since it's exporting assets. This is a pattern we'll see more of once we move off of Gradle (the idea of having package-specific manifest files).

Semi-related, but two configurations that are hardcoded:

  • English for language defaults if everything else fails (since we generally know that app strings & content will be available in English)
  • US locale for machine strings (since we mainly use US/English for development)

Neither of these should actually affect users in practice, but it's possible to if things go wrong (& and a trivial default is needed).

Notes on testing

  • DisplayLocaleImplTest makes use of test-only resources which means it can't run in Gradle (& actually needed to be omitted-see 'Gradle changes' section below). This is the pattern we want to use moving forward for test-only resources (as it's one of the major wins of using Bazel)
  • A custom shadow was needed for Android's BidiFormatter to test our new wrapper & fake for it (see the KDoc for the shadow for details on why we didn't just broadly use the shadow)
  • AssetRepository was refactored to an interface + module so that we could introduce a no-op version for testing LanguageConfigRetriever when no textproto is present (the Gradle case). This led to updating most test suites (nearly all app module tests) to properly bind AssetRepository in test Dagger graphs.
    • Note that one of the commits in this PR includes a full TestAssetRepository implementation, but this was reverted to be a no-op implementation to reduce the scope of the PR a bit. We can reference that commit & bring the changes back if they're desired in the future.
  • FakeOppiaClock was updated to force the Locale to UTC. I'm not sure why this didn't cause issues before, but Gradle sets the timezone to the user's local whereas Gradle sets it to UTC. Forcing to UTC ensure consistency between the two platforms & across all tests (& generally just seems more correct).

Version/dependency & Gradle changes

  • The rules_proto SHA hash was wrong (I noticed this in passing), so I fixed it. This should have no impact on the build other than removing a warning.
  • The upgrade to javalite 3.17.3 in Gradle builds was to bring consistency with Bazel (without this, some of the new code making use of has*() in generated protos introduces compile-time errors in the Gradle build)
  • Upgrading the javalite version led to a bunch of reworking of how the model module works (overall a simplification), but javalite needed to be added more broadly as a dependency (now that we're no longer compiling it into the model library via api)
  • A hacky workaround was needed to exclude files that depend on test-only resources (see 'testing' section above) or the new textproto config files (see 'configuration' section above) to exclude these tests from the build (in domain/build.gradle)

Codeowner changes

  • The new textproto configuration files required defined codeowners (I picked myself since we should ensure the app is actually ready for a new language to be introduced)
  • The new domain test resources strings.xml file required a codeowner. I just picked myself for now (since I have global domain layer code ownership at the moment), but this can easily be delegated to one of the subteams based on how we divide up the domain layer moving forward

Static check exemptions

Test files:

  • AssetRepositoryImpl: this is retaining the existing 'AssetRepository' exemption (though that's not being removed since it's now an interface and is legitimately exempted). This wasn't resolved in this PR to keep the PR slightly smaller since a lot of new tests would need to be introduced to properly test AssetRepository.
  • OppiaBidiFormatter & OppiaLocale: interfaces that shouldn't require test files in this case

Regex checks: all of these are legitimate exemptions since the respective classes are hiding the patterns that are being prohibited (or testing those classes).

Other miscellaneous changes

  • Android's BidiWrapper was wrapped in a custom Oppia-specific implementation with regex checks to force usage of the wrapper so that tests can verify whether strings are actually wrapped (without relying on wrapping occurring or looking for implementation-specific mark characters)
    • Note that the complexity around properly wrapping (& not extra-wrapping) strings inspired different versions of String.format() with & without wrapping. See OppiaLocale.DisplayLocale for these specific methods & their KDocs for more context. Note also that an intentional design choice was made to only allow strings in resource string formatting moving forward for strings that need to support bidirectional wrapping (this generally seems more correct).
  • OppiaClock had some minor code consolidation to simplify the prod/test implementations
  • AAB targets were updated to have the "manual" tag so that running bazel build //... won't include those targets (building AABs is expensive, especially with Proguard running so this helps with local development). CI runs won't be affected since they directly reference the AAB targets (which is what "manual" does--it avoids the target being included in wildcard patterns).
  • DateTimeUtil incorrectly computes time-of-day, but we haven't noticed it since it defaults to evening (& it's the evening case that's wrong). This has been fixed in MachineLocaleImpl & will replace DateTimeUtil in a future PR.
  • A new test utility was introduced (DataProviderMonitor) to make testing data providers easier. This is being used a bit in this & later PRs, but we'll probably want to eventually migrate all DataProvider tests over to the monitor in the long-term. It combines helpers from tests that have iterated on data provider testing over the last couple of years into a single general-purpose utility.

Future changes

This PR is primarily focused on introducing the necessary support for automatic language selection for app strings, and to lay the groundwork for content & audio strings. Future PRs will introduce much of the remaining work for automatic selection (see the constituent issues that this PR is 'partly' fixing to follow along with that work). Longer-term, a full language selection UI will be introduced in the app. It's not expected that this domain layer will need to change much for that work to be completed.

Finally, while this & future PRs are introducing a lot of regex pattterns to enforce locale-specific best practices, additional custom CI checks are coming (see design doc) in coming weeks & months to ensure that the app strings themselves are properly set up for safe localization & are translated correctly (at least, for the things that we can easily verify as correct using automation).

Essential Checklist

  • The PR title and explanation each start with "Fix #bugnum: " (If this PR fixes part of an issue, prefix the title with "Fix part of #bugnum: ...".)
  • Any changes to scripts/assets files have their rationale included in the PR explanation.
  • The PR follows the style guide.
  • The PR does not contain any unnecessary code changes from Android Studio (reference).
  • The PR is made from a branch that's not called "develop" and is up-to-date with "develop".
  • The PR is assigned to the appropriate reviewers (reference).

For UI-specific PRs only

N/A

BenHenning and others added 30 commits August 31, 2021 23:33
There are a lot of details to cover here--see the PR for the complete
context.
- Add missing codeowner
- Add support for configuring base branch reference
- Update CI for dev/alpha AAB builds to use 'develop' since there's no
  origin configured by default in the workflows
This is needed to open a PR on GitHub. This commit is being made so that
the PR can start off in a broken Actions state.

This also initially disables most non-Bazel workflows to make workflow
iteration faster and less impacting on other team members.
This introduces a new mechanism for passing lists of tests to sharded
test targets in CI, and hooks it up. No actual sharding is occurring
yet. This led to some simplifications in the CI workflow since the
script can be more dynamic in computing the full list of targets (which
also works around a previous bug with instrumentation tests being run).
Java proto lite also needed to be upgraded for the scripts to be able to
use it.

More testing/documentation needed as this functionality continues to
expand.
This simply partitions bucketed groups of targets into chunks of 10 for
each run. Only 3 buckets are currently retained to test sharding in CI
before introducing full support.
Fixes some caching bucket and output bugs. Also, introduces while loop &
keep_going to introduce resilience against app test build failures (or
just test failures in general).
Also, enable other workflows.

Note that CI shouldn't fully pass yet since some documentation and
testing needs to be added yet, but this is meant to be a more realistic
test of the CI environment before the PR is finished.
Adds a human-readable prefix to make the shards look a bit nicer.

Also, adds more fine-tuned partitioning to bucket & reduce shard counts
to improve overall timing. Will need to be tested in CI.
A newly computed variable wasn't updated to be used in an earlier
change.
Add docstrings for proto.
…/oppia-android into add-bundles-proguard-build-flavors
Neither 'mv -t' nor piping to mv work on OSX so we needed to find an
alternative (in this case just trying to move everything). This actually
works a bit better since it's doing a per-file move rather than
accommodating for files that shouldn't be moved (which isn't an issue
since the destination directory is different than the one containing the
AAB file).
Documentation, thorough tests, and detailed description of these changes
are still needed.
Also includes fixing circular dependency issue by splitting out some of
the locale components to be part of utility rather than domain (so that
utiltiy and other packages can depend on MachineLocale).
@BenHenning
Copy link
Member Author

@vinitamurthi & @rt4914 PTAL for codeowners.

@seanlip PTAL for codeowners change.

@anandwana001 PTAL at the PR as a whole for correctness.

@BenHenning
Copy link
Member Author

BenHenning commented Sep 21, 2021

SIGSEGV on StateFragmentLocalTest. Happens to me locally, too. Retrying Bazel CI.

Copy link
Member

@seanlip seanlip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for codeowners.

@seanlip seanlip removed their assignment Sep 21, 2021
Copy link
Contributor

@rt4914 rt4914 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for code-owner files.

@rt4914 rt4914 removed their assignment Sep 21, 2021
Copy link
Contributor

@anandwana001 anandwana001 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, All the test are passing on checks that LGTM too.

@anandwana001 anandwana001 removed their assignment Sep 24, 2021
1. Introduce proper API compatibility for LocaleController
2. Ensure TranslationController is scoped (breaks test in downstream PR)
@BenHenning
Copy link
Member Author

@anandwana001 can you PTAL at e879c04 and let me know if it looks good to you? It was a late change after your approval & I'd like to make sure it gets some reviewer visibility.

@BenHenning
Copy link
Member Author

SIGSEGV on StateFragmentTest caused a CI flake. :( This is #2844 showing up again even though we thought it went away. My guess is that this was reintroduced because of sharding (& might actually be a separate issue).

Restarting CI to unblock this PR, but we'll need to look into this.

@vinitamurthi
Copy link
Contributor

@BenHenning I wanted to check with you before merging this. are we ok to merge this now?

@oppiabot oppiabot bot added the PR: LGTM label Sep 27, 2021
@oppiabot
Copy link

oppiabot bot commented Sep 27, 2021

Hi @BenHenning, this PR is ready to be merged. Please address any remaining comments prior to merging, and feel free to merge this PR once the CI checks pass and you're happy with it. Thanks!

@BenHenning
Copy link
Member Author

Thanks for the review everyone!

@vinitamurthi I'll merge this momentarily. I want to make sure downstream PRs are in a good in-between state before doing that.

@anandwana001
Copy link
Contributor

@anandwana001 can you PTAL at e879c04 and let me know if it looks good to you? It was a late change after your approval & I'd like to make sure it gets some reviewer visibility.

LGTM

@anandwana001 anandwana001 removed their assignment Sep 27, 2021
@BenHenning
Copy link
Member Author

BenHenning commented Sep 27, 2021

Thanks @anandwana001! Will be merging this momentarily.

Edit: since develop changed & this is a very large PR, I'm pulling the latest changes into this branch & reverifying CI before merging.

@BenHenning
Copy link
Member Author

All CI checks are passing, and there are no open unresolved comment threads. Thanks again all for the reviews!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants