zh-tw translation of units #310

Open
NSoiffer opened this issue Nov 13, 2024 · 6 comments

Comments

@NSoiffer
Owner

@hjy1210

I've added the ability to speak units to MathCAT. This mainly works if the author adds intent=":unit" to mark "m", "km", etc., as a unit. The translation work that needs to be done is in Rules/Languages/zh/tw/definitions.yaml.
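As an illustration of the markup described above, here is a minimal sketch that builds the MathML fragment for a number followed by a unit. The helper name `number_with_unit` is hypothetical and not part of MathCAT; it only mirrors the `<mn>`/`<mi intent=":unit">` pattern used in the tests later in this thread, and it does no XML escaping.

```rust
/// Build a MathML fragment for a number followed by a unit.
/// The unit is tagged with `intent=":unit"` so speech rules can treat it as a unit.
/// Hypothetical helper for illustration; inputs are not XML-escaped.
fn number_with_unit(value: &str, unit: &str) -> String {
    format!(r#"<mn>{value}</mn><mi intent=":unit">{unit}</mi>"#)
}
```

Calling `number_with_unit("2", "km")` yields the fragment `<mn>2</mn><mi intent=":unit">km</mi>`, which would still need to be wrapped in a `<math>` element.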

The following lists were added and need translation:

  • SIPrefixes

  • SIUnits -- these are all the SI units that take prefixes

  • UnitsWithoutPrefixes -- these are other "accepted" units and some variants on them

  • EnglishUnits -- if English units are used in your language, then translate these; otherwise delete the contents but leave the name. For example: EnglishUnits: {}

  • PluralForms -- MathCAT automatically adds "s" for English plurals. I've removed that for the Chinese version because a few test examples in Google Translate seem to use the same character for singular and plural. Hopefully that is correct. In English, some words have irregular plural forms (foot -> feet). Change this list to include any irregular plural units. Make it empty if there are none.
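The plural handling described in the last item can be sketched as follows. This is an illustration only, not MathCAT's implementation: `spoken_unit` and its parameters are hypothetical names, the map stands in for the PluralForms list, and the `regular_suffix` argument models the automatic "s" (empty for a language with no plural marking).

```rust
use std::collections::HashMap;

/// Pick the spoken form of a unit word.
/// `plural_forms` stands in for the PluralForms list (irregular plurals only).
/// `regular_suffix` is "s" for English and "" when singular and plural are the same.
fn spoken_unit(
    word: &str,
    is_singular: bool,
    plural_forms: &HashMap<&str, &str>,
    regular_suffix: &str,
) -> String {
    if is_singular {
        return word.to_string();
    }
    match plural_forms.get(word) {
        // an irregular plural wins over the regular suffix
        Some(irregular) => irregular.to_string(),
        None => format!("{word}{regular_suffix}"),
    }
}
```

With an English-style table containing `foot -> feet`, a plural "foot" becomes "feet" and a plural "metre" becomes "metres". With an empty table and an empty suffix, the word is returned unchanged in both cases.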

In English, fractions with units don't use the word "over"; they use "per", as in "meters per second". Does Chinese have that special case?

Let me know if you have any questions. Please make a PR when you have done the translation.

@hjy1210
Contributor

hjy1210 commented Dec 3, 2024 via email

@NSoiffer
Owner Author

NSoiffer commented Dec 4, 2024

Try copying tests\Languages\en\units.rs to tests\Languages\zh\tw\units.rs. Change all of the "en" to "zh-tw". Also add a line mod units; to tests\Languages\zh\tw.rs.
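The "change all of the "en" to "zh-tw"" step could also be scripted. The sketch below is an assumption about how one might do it, not part of the MathCAT tooling: `retarget_language` is a hypothetical helper that rewrites only the quoted string "en", so identifiers that happen to contain those letters are left alone.

```rust
/// Replace the quoted language code "en" with "zh-tw" in test source text.
/// Only the exact quoted string is replaced, so words such as `token` are untouched.
/// Hypothetical helper for illustration.
fn retarget_language(source: &str) -> String {
    source.replace("\"en\"", "\"zh-tw\"")
}
```

To apply it to the files named above, one would read the English test file with `std::fs::read_to_string`, pass the text through `retarget_language`, and write the result to the zh/tw path with `std::fs::write`.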

After that, if you run the tests (cargo test tw::units), you'll get a lot of failures. What I've done for the other languages is to copy the results of running the tests (the Chinese results in this case), and replace the English results with the Chinese results. Then look at the Chinese results carefully and correct any mistakes in them. The corrected tests will then fail when you rerun them, and you can fix the spoken output in either definitions.yaml or possibly SharedRules/general.yaml.

There are a lot of tests. You might want to only do a few of them if it is too much work. Or you can do a PR for what you've done and I'll copy over the results. Having done it many times, I've gotten faster at it. You would then need to correct anything that is wrong in the test results and then correct definitions.yaml and/or SharedRules/general.yaml.

Note: I have modified the unit rule in SharedRules/general.yaml. If Chinese has a common plural ending, copy the rule from the "en" version. Otherwise, copy it from the "vi" version.

Let me know if you have any questions.

@hjy1210
Contributor

hjy1210 commented Dec 5, 2024

I've run into a problem.

  • In Rules/Languages/zh/tw/definitions.yaml, I changed line 20 to "A" : "安培",
  • In tests/Languages/zh/tw/units.rs, lines 69-95, I changed the test to:
fn si_base() {
    let expr = r#"<math>
        <mn>1</mn><mi intent=":unit">A</mi><mo>,</mo><mn>2</mn><mi intent=":unit">A</mi><mo>,</mo>
        <mn>1</mn><mi intent=":unit">cd</mi><mo>,</mo><mn>2</mn><mi intent=":unit">cd</mi><mo>,</mo>
        <mn>1</mn><mi intent=":unit">K</mi><mo>,</mo><mn>2</mn><mi intent=":unit">K</mi><mo>,</mo>
        <mn>1</mn><mi intent=":unit">K</mi><mo>,</mo><mn>2</mn><mi intent=":unit">K</mi><mo>,</mo>
        <mn>1</mn><mi intent=":unit">g</mi><mo>,</mo><mn>2</mn><mi intent=":unit">g</mi><mo>,</mo>
        <mn>1</mn><mi intent=":unit">m</mi><mo>,</mo><mn>2</mn><mi intent=":unit">m</mi><mo>,</mo>
        <mn>1</mn><mi intent=":unit">mol</mi><mo>,</mo><mn>2</mn><mi intent=":unit">mol</mi><mo>,</mo>
        <mn>1</mn><mi intent=":unit">s</mi><mo>,</mo><mn>2</mn><mi intent=":unit">s</mi><mo>,</mo>
        <mn>1</mn><mi intent=":unit">″</mi><mo>,</mo><mn>2</mn><mi intent=":unit">″</mi><mo>,</mo>
        <mn>1</mn><mi intent=":unit">&quot;</mi><mo>,</mo><mn>2</mn><mi intent=":unit">&quot;</mi><mo>,</mo>
        <mn>1</mn><mi intent=":unit">sec</mi><mo>,</mo><mn>2</mn><mi intent=":unit">sec</mi>
    </math>"#;
    test("zh-tw", "SimpleSpeak", expr, 
        "1 安培, 逗號, 2 amps, comma, \
                1 candela, comma; 2 candelas, comma, \
                1 kelvin, comma, 2 kelvins, comma, \
                1 kelvin, comma, 2 kelvins, comma, \
                1 gram, comma, 2 grams, comma, \
                1 metre, comma, 2 metres, comma, \
                1 mole, comma, 2 moles, comma, \
                1 second, comma, 2 seconds, comma, \
                1 second, comma, 2 seconds, comma, \
                1 second, comma, 2 seconds, comma, \
                1 second, comma, 2 seconds");
}

When I test si_base, I expect the <mn>1</mn><mi intent=":unit">A</mi><mo>,</mo> part to match, but I get the following messages:

Executing task: C:\Users\hjy\.cargo\bin\cargo.exe test --package mathcat --test languages -- Languages::zh::tw::units::si_base --exact --show-output 

warning: elided lifetime has a name
    --> src\canonicalize.rs:4056:43
     |
4056 | pub fn name<'a>(node: &'a Element<'a>) -> &str {
     |             --                            ^ this elided lifetime gets resolved as `'a`
     |             |
     |             lifetime `'a` declared here
     |
     = note: `#[warn(elided_named_lifetimes)]` on by default

warning: elided lifetime has a name
   --> src\pretty_print.rs:274:51
    |
273 | impl<'a> YamlEmitter<'a> {
    |      -- lifetime `'a` declared here
274 |     pub fn new(writer: &'a mut dyn fmt::Write) -> YamlEmitter {
    |                                                   ^^^^^^^^^^^ this elided lifetime gets resolved as `'a`

warning: `mathcat` (lib) generated 2 warnings
    Finished `test` profile [optimized + debuginfo] target(s) in 2.20s
     Running tests\languages.rs (target\debug\deps\languages-65f750cf80589ca8.exe)

running 1 test
test Languages::zh::tw::units::si_base ... FAILED

successes:

successes:

failures:

---- Languages::zh::tw::units::si_base stdout ----
thread 'Languages::zh::tw::units::si_base' panicked at tests\common\mod.rs:48:23:
assertion `left == right` failed:
test with zh-tw/SimpleSpeak failed
  left: "1 安培, 逗號, 2 amps, comma, 1 candela, comma; 2 candelas, comma, 1 kelvin, comma, 2 kelvins, comma, 1 kelvin, comma, 2 kelvins, comma, 1 gram, comma, 2 grams, comma, 1 metre, comma, 2 metres, comma, 1 mole, comma, 2 moles, comma, 1 second, comma, 2 seconds, comma, 1 second, comma, 2 seconds, comma, 1 second, comma, 2 seconds, comma, 1 second, comma, 2 seconds"
 right: "1 unit 逗號 2 unit 逗號 1 unit 逗號 2 unit 逗號 1 unit 逗號 2 unit 逗號 1 unit 逗號 2 unit 逗號 1 unit 逗號 2 unit 逗號 1 unit 逗號 2 unit 逗號 1 unit 逗號 2 unit 逗號 1 unit 逗號 2 unit 逗號 1 unit 逗號 2 unit 逗號 1 unit 逗號 2 unit 逗號 1 unit 逗號 2 unit"
stack backtrace:
   0: std::panicking::begin_panic_handler
             at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf\library/std\src\panicking.rs:665
   1: core::panicking::panic_fmt
             at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf\library/core\src\panicking.rs:74
   2: core::panicking::assert_failed_inner
             at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf\library/core\src\panicking.rs:107
   3: core::panicking::assert_failed<ref$<str$>,alloc::string::String>        
             at C:\Users\hjy\.rustup\toolchains\stable-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\panicking.rs:367
   4: languages::common::check_answer
             at .\tests\common\mod.rs:48
   5: languages::common::test
             at .\tests\common\mod.rs:70
   6: languages::Languages::zh::tw::units::si_base
             at .\tests\Languages\zh\tw\units.rs:83
   7: languages::Languages::zh::tw::units::si_base::closure$0
             at .\tests\Languages\zh\tw\units.rs:69
   8: core::ops::function::FnOnce::call_once<languages::Languages::zh::tw::units::si_base::closure_env$0,tuple$<> >
             at C:\Users\hjy\.rustup\toolchains\stable-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\ops\function.rs:250
   9: core::ops::function::FnOnce::call_once
             at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf\library/core\src\ops\function.rs:250
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.


failures:
    Languages::zh::tw::units::si_base

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 1820 filtered out; finished in 0.68s

error: test failed, to rerun pass `-p mathcat --test languages`

 *  The terminal process "C:\Users\hjy\.cargo\bin\cargo.exe 'test', '--package', 'mathcat', '--test', 'languages', '--', 'Languages::zh::tw::units::si_base', '--exact', '--show-output'" terminated with exit code: 101. 

Look at the `right:` line in the output above: every unit is spoken as "unit" (1 unit 逗號 2 unit 逗號 ...), so it seems none of the translations has any effect at all.

Details in my branch

What is the problem? What am I missing? It looks like some link is missing.

PS, my environment:

  1. VSCode as editor
  2. rustup version: rustup 1.27.1 (54dd3d00f 2024-04-24)

@NSoiffer
Owner Author

NSoiffer commented Dec 7, 2024

You can ignore the warnings. I was at cargo version 1.81 and I suspect you are at 1.82. I just updated to the latest (1.83) and there are many more warnings. I've fixed and committed some of them, but others are in files where I have unfinished work.

Your problem is caused by my update to the rule for "unit" in SharedRules/general.yaml. I can't push to your repo, so here's what you need to change it to:

# the order of matching is
# 1. does it match the base of an SI unit
# 2. does it match an English unit (if in an English language)
# 3. does it match an SI prefix followed by an SI that accepts SI prefixes
# Due to this order, some things like "ft" and "pt" mean "feet" vs "femto-tonnes" and "pints" vs "pico-tonnes"
- name: unit
  tag: unit
  match: "$Verbosity != 'Terse' and contains(@data-intent-property, ':unit')"
  variables:
    # we need to look at preceding-sibling::*[2] because invisible times should have been added
    # if in a fraction, only count if we are in the numerator
  - IsSingular: "(parent::m:mrow and preceding-sibling::*[2][self::m:mn and . = 1]) or  
                 (ancestor::*[2][self::m:mrow] and parent::m:fraction and
                  (preceding-sibling::* or parent::*[preceding-sibling::*[2][self::m:mn and . = 1]])
                 )"
  - Prefix: "''"
  - Word: "''"  
  replace:
  - bookmark: "@id"
  - test:
    # does the whole string match an SI unit without a prefix?
    - if: "DefinitionValue(., 'Speech', 'SIUnits') != ''"
      then:
      - set_variables: [Word: "DefinitionValue(., 'Speech', 'SIUnits')"]
    - else_if: "DefinitionValue(., 'Speech', 'UnitsWithoutPrefixes') != ''"
      then:
      - set_variables: [Word: "DefinitionValue(., 'Speech', 'UnitsWithoutPrefixes')"]
    - else_if: "DefinitionValue(., 'Speech', 'EnglishUnits') != ''"
      then:
      - set_variables: [Word: "DefinitionValue(., 'Speech', 'EnglishUnits')"]

    # do the first two chars match "da" and the remainder match an SIUnit
    - else_if: "string-length(.) >= 3 and 
                substring(., 1, 2) = 'da' and
                DefinitionValue(substring(., 3), 'Speech', 'SIUnits') != ''"
      then:
      - set_variables:
        - Prefix: "DefinitionValue('da', 'Speech', 'SIPrefixes')"
        - Word: "DefinitionValue(substring(., 3), 'Speech', 'SIUnits')"

    # does the first char match a prefix and the remainder match an SIUnit
    - else_if: "string-length(.) >= 2 and 
                DefinitionValue(substring(., 1, 1), 'Speech', 'SIPrefixes') != ''  and
                DefinitionValue(substring(., 2), 'Speech', 'SIUnits') != ''"
      then:
      - set_variables:
        - Prefix: "DefinitionValue(substring(., 1, 1), 'Speech', 'SIPrefixes')"
        - Word: "DefinitionValue(substring(., 2), 'Speech', 'SIUnits')"

    # not a known unit -- just speak the text, possibly as a plural
    - else:
      - set_variables:
        - Word: "text()"

  # somewhat complicated logic to avoid spaces around "-" as in "centi-grams" vs "centi - grams" -- probably doesn't matter
  - test:
      if: "$Prefix = ''"
      then:
      - test:
        - if: "$IsSingular"
          # HACK: '\uF8FE' is used internally for the concatenation char by 'ct' -- this gets the prefix concatenated to the base
          then: [x: "$Word"]
        - else_if: "DefinitionValue($Word, 'Speech', 'PluralForms') != ''"
          then: [x: "DefinitionValue($Word, 'Speech', 'PluralForms')"]
          else: [x: "$Word"]
      else:
      - x: "$Prefix"
      - ct: "-"
      - test:
        - if: "$IsSingular"
          # HACK: '\uF8FE' is used internally for the concatenation char by 'ct' -- this gets the prefix concatenated to the base
          then: [x: "concat('\uF8FE', $Word)"]
        - else_if: "DefinitionValue($Word, 'Speech', 'PluralForms') != ''"
          then: [x: "concat('\uF8FE', DefinitionValue($Word, 'Speech', 'PluralForms'))"]
          else: [x: "concat('\uF8FE', $Word)"]
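The matching order in the first `test` of the rule above can be sketched in Rust. This is a simplified model under stated assumptions, not MathCAT code: `UnitDefs`, `sample_defs`, `split_chars`, and `lookup_unit` are hypothetical names, and the table contents are illustrative stand-ins for the lists in definitions.yaml.

```rust
use std::collections::HashMap;

/// Lookup tables standing in for the lists in definitions.yaml (illustrative only).
struct UnitDefs<'a> {
    si_prefixes: HashMap<&'a str, &'a str>,
    si_units: HashMap<&'a str, &'a str>,
    units_without_prefixes: HashMap<&'a str, &'a str>,
    english_units: HashMap<&'a str, &'a str>,
}

/// A tiny sample table; the entries are assumptions made for this example.
fn sample_defs() -> UnitDefs<'static> {
    UnitDefs {
        si_prefixes: HashMap::from([("k", "kilo"), ("da", "deka"), ("f", "femto")]),
        si_units: HashMap::from([("m", "metre"), ("g", "gram"), ("t", "tonne")]),
        units_without_prefixes: HashMap::from([("min", "minute")]),
        english_units: HashMap::from([("ft", "foot")]),
    }
}

/// Split `text` after its first `n` characters (not bytes).
/// Returns `None` unless at least one character remains after the split.
fn split_chars(text: &str, n: usize) -> Option<(&str, &str)> {
    let (idx, _) = text.char_indices().nth(n)?;
    Some(text.split_at(idx))
}

/// Return (prefix, word) for a unit string, trying whole-string matches first.
fn lookup_unit(defs: &UnitDefs<'_>, text: &str) -> (String, String) {
    // whole string is a known unit: SI, then non-prefixable, then English
    for table in [&defs.si_units, &defs.units_without_prefixes, &defs.english_units] {
        if let Some(word) = table.get(text) {
            return (String::new(), word.to_string());
        }
    }
    // "da" is the only two-character prefix, so it is checked separately
    if let Some(("da", rest)) = split_chars(text, 2) {
        if let (Some(p), Some(w)) = (defs.si_prefixes.get("da"), defs.si_units.get(rest)) {
            return (p.to_string(), w.to_string());
        }
    }
    // one-character prefix followed by an SI unit
    if let Some((first, rest)) = split_chars(text, 1) {
        if let (Some(p), Some(w)) = (defs.si_prefixes.get(first), defs.si_units.get(rest)) {
            return (p.to_string(), w.to_string());
        }
    }
    // not a known unit: speak the text as written
    (String::new(), text.to_string())
}
```

Because whole-string matches come first, "ft" resolves to "foot" in this model even though "f" and "t" are also a valid prefix and unit. An unrecognized string falls through to the last line and is returned unchanged.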

You are also missing some translations. You should take a look at Rules/Languages/en/definitions.yaml and copy some of the definition values there into your definitions.yaml file. You might need to move some definition values between different definition names, but you can probably see which ones need to move based on the errors you get (some shifted to or from needing prefixes as I found other specifications on units and what is used in practice).

Let me know if you find other problems.

@hjy1210
Contributor

hjy1210 commented Dec 12, 2024

My steps:

  1. Updated the rule for "unit" in SharedRules/general.yaml as you said.
  2. In SIUnits of definitions.yaml, copied the following lines from the English version and translated them:
    # others that take a prefix
    "a": "annum",               # should only take positive powers
    "as": "弧秒",          # see en.wikipedia.org/wiki/Minute_and_second_of_arc

    # technically wrong, but used in practice with SI Units
    "b": "位元",               # should only take positive powers
    "B": "位元組",              # should only take positive powers
    "Bd": "鮑",             # should only take positive powers
  3. In UnitsWithoutPrefixes of definitions.yaml, copied the following lines from the English version and translated them:
    # powers of 2 used with bits and bytes
    "Kib": "kibi-位元", "Mib": "mebi-位元", "Gib": "gibi-位元", "Tib": "tebi-位元", "Pib": "pebi-位元", "Eib": "exbi-位元", "Zib": "zebi-位元", "Yib": "yobi-位元",
    "KiB": "kibi-位元組", "MiB": "mebi-位元組", "GiB": "gibi-位元組", "TiB": "tebi-位元組", "PiB": "pebi-位元組", "EiB": "exbi-位元組", "ZiB": "zebi-位元組", "YiB": "yobi-位元組",
  4. Made the other translations.
  5. Ran the tests; 238 tests pass.
  6. Will make a PR.

@hjy1210
Contributor

hjy1210 commented Dec 12, 2024

I made PR #321, and the system says all checks have failed.
I can't understand the details. The following is part of the message.

---- navigate::tests::cases_speech stdout ----
thread 'navigate::tests::cases_speech' panicked at src/navigate.rs:1873:13:
assertion `left == right` failed
  left: "move right, 2 cases, case 1; negative x comma, if x is less than 0; case 2; positive x comma, if x, is greater than or equal to 0;"
 right: "move right, 2 cases, case 1; negative x comma if x is less than 0; case 2; positive x comma if x, is greater than or equal to 0;"

---- navigate::tests::determinant_speech stdout ----
thread 'navigate::tests::determinant_speech' panicked at src/navigate.rs:1843:13:
assertion `left == right` failed
  left: "zoom out all; vertical line; table with 2 rows and 2 columns; row 1; column 1; 9, column 2; minus 13; row 2; column 1; 5, column 2; minus 6; vertical line"
 right: "zoom out all; the 2 by 2 determinant; row 1; 9, negative 13; row 2; 5, negative 6;"

I never touched the file src/navigate.rs.

What is the problem?
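When two long speech strings differ only slightly, as in the cases_speech failure above, it can help to locate the first point where they diverge. The following is a minimal sketch of such a helper; `first_difference` is a hypothetical function, not something provided by MathCAT or the Rust test harness.

```rust
/// Return the character position of the first difference between two strings,
/// together with the remainder of each string from that point.
/// Returns `None` if the strings are identical.
fn first_difference<'a>(left: &'a str, right: &'a str) -> Option<(usize, &'a str, &'a str)> {
    let mut l = left.char_indices();
    let mut r = right.char_indices();
    let mut pos = 0;
    loop {
        match (l.next(), r.next()) {
            // both strings ended at the same time: no difference
            (None, None) => return None,
            // same character on both sides: keep going
            (Some((_, a)), Some((_, b))) if a == b => pos += 1,
            // characters differ, or one string ended early
            (lnext, rnext) => {
                let li = lnext.map_or(left.len(), |(i, _)| i);
                let ri = rnext.map_or(right.len(), |(i, _)| i);
                return Some((pos, &left[li..], &right[ri..]));
            }
        }
    }
}
```

Applied to the two strings in that failure, the returned remainders would start at the point where one side has "comma, if" and the other has "comma if".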
