Add support for multiple languages when processing text #11

Open
zaratsian opened this issue Sep 12, 2022 · 0 comments

Comments

@zaratsian
Collaborator

Is your feature request related to a problem? Please describe.
This is a new feature request, not a bug or issue in the current code. English is currently the only supported language, and we would like to enable multilingual support.

Describe the solution you'd like
Add multilingual support for text input via the Google Cloud Translation API. The proposed solution processes data in the following order:

  1. Capture raw text input from the user. [this step already exists as part of Clean Chat]
  2. Use the Google Cloud Translation API to auto-detect the language.
  3. If the detected language is not English, translate the text to English using the Cloud Translation API.
  4. Score the translated text for toxicity as part of the Clean Chat pipeline. [this step already exists as part of Clean Chat]
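
The four steps above could be sketched roughly as follows. This is a minimal illustration, not the Clean Chat implementation: `process_text`, `ScoredText`, and `google_backends` are hypothetical names, the toxicity scorer is passed in as a plain callable because the existing scoring code is not shown here, and the Google wiring assumes the `google-cloud-translate` package (its Basic `translate_v2` client) and configured credentials.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class ScoredText:
    original: str      # raw user input (step 1)
    language: str      # detected language code (step 2)
    scored_text: str   # English text that was actually scored (step 3)
    toxicity: float    # score from the existing pipeline (step 4)


def process_text(
    text: str,
    detect: Callable[[str], str],
    translate: Callable[[str, str, str], str],
    score: Callable[[str], float],
    target: str = "en",
) -> ScoredText:
    """Detect the language, translate to English if needed, then score."""
    language = detect(text)
    # Only call the translation backend for non-English input.
    english = text if language == target else translate(text, language, target)
    return ScoredText(text, language, english, score(english))


def google_backends() -> Tuple[Callable[[str], str], Callable[[str, str, str], str]]:
    """Assumed wiring to the Cloud Translation Basic (v2) client."""
    # Imported lazily so the rest of the sketch runs without the package.
    from google.cloud import translate_v2

    client = translate_v2.Client()

    def detect(text: str) -> str:
        return client.detect_language(text)["language"]

    def translate(text: str, source: str, target: str) -> str:
        result = client.translate(
            text, source_language=source, target_language=target, format_="text"
        )
        return result["translatedText"]

    return detect, translate
```

Passing the backends in as callables keeps the detection and translation calls swappable, so the pipeline can be exercised with stubs and without network access.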

Describe alternatives you've considered
I've considered the tradeoffs between using the Translation API and analyzing the native language(s) directly. While analyzing the native language directly may produce more accurate results, it requires additional model training and adds the complexity of maintaining multiple language models. Native language analysis may be a feature that we add in the future, but our results with the Google Cloud Translation API look promising, and the API is stable, reliable, and scalable.

Additional context
We need to add a flag that lets users enable or disable multilingual support. This feature applies only to text input.
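
One possible shape for such a flag is an environment variable read at startup. The sketch below is only an assumption about how it might look: the variable name `ENABLE_MULTILINGUAL`, the helper `multilingual_enabled`, and the default of "disabled" are all hypothetical and not taken from the Clean Chat code.

```python
import os
from typing import Mapping, Optional

# Values treated as "on"; anything else (or an unset variable) means "off".
_TRUTHY = {"1", "true", "yes", "on"}


def multilingual_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the (hypothetical) ENABLE_MULTILINGUAL flag is set."""
    env = os.environ if environ is None else environ
    # Defaulting to disabled is an assumption, chosen so existing
    # English-only deployments keep their current behavior.
    return env.get("ENABLE_MULTILINGUAL", "false").strip().lower() in _TRUTHY
```

When the flag is off, the text pipeline would skip language detection and translation and score the input as it does today.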

@zaratsian zaratsian added the enhancement New feature or request label Sep 12, 2022
@zaratsian zaratsian self-assigned this Sep 12, 2022