ICU-22922 BRS 77 Integrate CLDR 46.1 beta1 to ICU main #3292

Merged

@pedberg-icu (Contributor) commented Dec 9, 2024

Checklist

  • Required: Issue filed: ICU-22922
  • Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • Required: The PR description must include the link to the Jira Issue, for example by completing the URL in the first checklist item
  • Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • Issue accepted (done by Technical Committee after discussion)
  • Tests included, if applicable
  • API docs and/or User Guide docs changed or added, if applicable

This integrates CLDR 46.1 beta1 to ICU main (note that in main ICU still has version 76, so this keeps the data file using 76). It is in 3 separate commits for ease of review:

  • Part 1 is the binary data generated from CLDR/ICU data.
  • Part 2 is the ICU source data/test files generated or copied from CLDR. (Note that, as in the CLDR 46 integrations, the portion-per-1e9 entries are manually removed from the unitsTest.txt test file in `icu4c/source/test/testdata/cldr/units/` and `icu4j/main/core/src/test/resources/com/ibm/icu/dev/data/cldr/units/`, since the relevant code is not yet ready.)
  • Part 3 is the changes to ICU code and tool sources (here just one test file that removes a logKnownIssue skip for something fixed in 46.1)

This should be merged as 3 separate commits, not squashed.
ALLOW_MANY_COMMITS=true
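
For illustration only, the manual removal described in Part 2 could be scripted roughly as follows. This is a minimal sketch, not the procedure actually used for this PR: the function name `drop_unit_entries` is hypothetical, and it assumes that the entries in question carry the literal unit identifier `portion-per-1e9` in the source or target column of a semicolon-separated unitsTest.txt line.

```python
def drop_unit_entries(lines, unit_id="portion-per-1e9"):
    """Return the lines with data entries for unit_id removed.

    Assumption: a data line looks like
        quantity ; source-unit ; target-unit ; conversion ; expected
    and comment lines start with '#'. Comments and blank lines are kept.
    """
    kept = []
    for line in lines:
        fields = [field.strip() for field in line.split(";")]
        is_data = not line.lstrip().startswith("#") and len(fields) >= 3
        # Drop the entry only when the unit appears as source or target.
        if is_data and unit_id in fields[1:3]:
            continue
        kept.append(line)
    return kept
```

Applying the same filter to both the icu4c and icu4j copies of the file would keep them in sync, which is the kind of C/J divergence noted in the review comments on the test data.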

@pedberg-icu
Contributor Author

/azp run CI-Exhaustive

No pipelines are associated with this pull request.

@markusicu
Member

> /azp run CI-Exhaustive

We no longer use Azure. Please trigger the exhaustive tests like this:
https://unicode-org.github.io/icu/userguide/dev/ci.html#exhaustive-tests

I went to https://github.com/pedberg-icu/icu/actions/workflows/icu_exhaustive_tests.yml but I don't get the workflow trigger. I assume that only you can do so in your fork.

@markusicu
Member

PS: I also removed the TODOs from the PR description.

@pedberg-icu
Contributor Author

pedberg-icu commented Dec 9, 2024

> We no longer use Azure. Please trigger the exhaustive tests like this: https://unicode-org.github.io/icu/userguide/dev/ci.html#exhaustive-tests

Thanks, I have done that, seems to be running: https://github.com/pedberg-icu/icu/actions/workflows/icu_exhaustive_tests.yml

But the integration instructions at https://unicode-org.github.io/icu/processes/cldr-icu.html (which I was following) still mention using /azp run CI-Exhaustive, so they need updating.

@@ -151,6 +151,7 @@ mass ; ton ; kilogram ; 907.18474 * x ; 907184.7
mass ; tonne ; kilogram ; 1,000 * x ; 1000000.0
mass ; earth-mass ; kilogram ; 5,972,200,000,000,000,000,000,000 * x ; 5.9722E27
mass ; solar-mass ; kilogram ; 1,988,470,000,000,000,000,000,000,000,000 * x ; 1.98847E33
night-duration ; night ; night ; 1 * x ; 1,000.00
Contributor Author

This had been manually removed in a previous integration and then restored for C but not J, so restoring it here.

@@ -187,6 +188,7 @@ speed ; kilometer-per-hour ; meter-per-second ; 2.5/9 * x ; 277.7778
speed ; mile-per-hour ; meter-per-second ; 0.44704 * x ; 447.04
speed ; knot ; meter-per-second ; 4.63/9 * x ; 514.4444
speed ; meter-per-second ; meter-per-second ; 1 * x ; 1,000.00
speed ; light-speed ; meter-per-second ; 299,792,458 * x ; 2.997925E11
Contributor Author

This had been manually removed in a previous integration and then restored for C but not J, so restoring it here.
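
As a rough aid for reviewing restored lines like these, the sketch below checks that the last column agrees with the conversion formula. It rests on assumptions: that the final field is the expected result of converting 1000 source units, that the formula has the simple form `factor * x` (formulas with offsets are not handled), and that the expected value is rounded to about seven significant digits. The names `parse_factor` and `check_line` are hypothetical and not part of ICU or CLDR tooling.

```python
from fractions import Fraction
import math


def parse_factor(formula):
    """Parse a formula such as '2.5/9 * x' into an exact Fraction."""
    factor_text, _, variable = formula.partition("*")
    if variable.strip() != "x":
        raise ValueError(f"unsupported formula: {formula!r}")
    # Remove thousands separators, then fold any divisions left to right.
    parts = factor_text.replace(",", "").split("/")
    factor = Fraction(parts[0].strip())
    for divisor in parts[1:]:
        factor /= Fraction(divisor.strip())
    return factor


def check_line(line, rel_tol=1e-6):
    """Return True if the expected column matches 1000 * factor."""
    fields = [field.strip() for field in line.split(";")]
    _quantity, _source, _target, formula, expected = fields
    actual = float(parse_factor(formula) * 1000)
    return math.isclose(actual, float(expected.replace(",", "")), rel_tol=rel_tol)
```

For example, `check_line("speed ; light-speed ; meter-per-second ; 299,792,458 * x ; 2.997925E11")` compares 2.99792458E11 against the rounded expected value and accepts it within the tolerance.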

@markusicu
Member

part 2 & 3 changes lgtm

@markusicu
Member

waiting for Exhaustive Tests for ICU #12 (manually run by pedberg-icu)

@pedberg-icu
Contributor Author

There is an icu4c exhaustive test failure in RBBITest::TestMonkey which seems to have nothing to do with any CLDR 46.1 changes. I will file a ticket and try to create a logKnownIssue test skip for this:

2024-12-09T17:56:58.3843391Z       RBBITest {
2024-12-09T17:56:58.3843684Z          ... 
2024-12-09T17:56:59.4624551Z          TestMonkey {
2024-12-09T17:57:13.6558761Z          
2024-12-09T17:57:13.6561154Z          rbbitst.cpp:4044 Break found but not expected at index 600. Parameters to reproduce: @"type=line engineState=[14633387025324 32694747777995 98494207058454 69011839148816 97197595605111 87195949665851 272835630244490 147799212188463 207642725313245 224131158172479 108960611664529 17434876783706 1 6 5] loop=1"
2024-12-09T17:57:13.6563686Z               590 :  |  |      \u25cc  ALorig_DottedCircle             ALL ÷ / ÷ ALL                           DOTTED CIRCLE                           
2024-12-09T17:57:13.6570202Z               591 :  |  |      \ua977  JL                              ALL ÷ / ÷ ALL                           HANGUL CHOSEONG IEUNG-HIEUH             
2024-12-09T17:57:13.6571325Z               592 :  .  .      \u30fe  NSorig_EastAsian                × $NS                                    KATAKANA VOICED ITERATION MARK          
2024-12-09T17:57:13.6572305Z               593 :  .  .      \u2e56  CP                              × $CP                                    RIGHT SQUARE BRACKET WITH STROKE        
2024-12-09T17:57:13.6573236Z               594 :  .  .      \u05e6  HL                              $CPmEastAsian × ($AL | $HL | $NU)        HEBREW LETTER TSADI                     
2024-12-09T17:57:13.6574279Z               595 :  .  .      \u25cc  ALorig_DottedCircle             ($AL | $HL) × ($AL | $HL)                DOTTED CIRCLE                           
2024-12-09T17:57:13.6575250Z               596 :  |  |      \ufffc  CB                              ÷ $CB                                    OBJECT REPLACEMENT CHARACTER            
2024-12-09T17:57:13.6576517Z               597 :  .  .      \u035f  GLmEastAsian                    [^ $SP $BA $HY] × $GL                    COMBINING DOUBLE MACRON BELOW           
2024-12-09T17:57:13.6577666Z               598 :  .  .      \u302a  CMorig_EastAsian                (?<X>[^$BK $CR $LF $NL $SP $ZW]) ( $CM | $ZWJ )* → ${X}  IDEOGRAPHIC LEVEL TONE MARK             
2024-12-09T17:57:13.6578700Z               599 :  .  .      \u2010  BA_Hyphen                       $GL ×                                    HYPHEN                                  
2024-12-09T17:57:13.6579819Z           --> 600 :  .  |  \U0001f193  AI_EastAsian                    ( $sot | $BK | $CR | $LF | $NL | $SP | $ZW | $CB | $GL ) ( $HY | $Hyphen ) × $AL  SQUARED FREE                            
2024-12-09T17:57:13.6580868Z               602 :  .  .      \u0029  CP                              × $CP                                    RIGHT PARENTHESIS                       
2024-12-09T17:57:13.6581785Z               603 :  .  .      \u30e7  CJ                              ($CL | $CP) $SP* × $NS                   KATAKANA LETTER SMALL YO                
2024-12-09T17:57:13.6582672Z               604 :  .  .      \u2029  BK                              × ( $BK | $CR | $LF | $NL )              PARAGRAPH SEPARATOR                     
2024-12-09T17:57:13.6583673Z               605 :  |  |  \U0001fcda  ID_ExtPictUnassigned            $BK ÷                                    <unassigned-1FCDA>                      
2024-12-09T17:57:13.6584671Z               607 :  |  |      \u1121  JL                              ALL ÷ / ÷ ALL                           HANGUL CHOSEONG PIEUP-SIOS              
2024-12-09T17:57:13.6585551Z               608 :  |  |      \ufffc  CB                              ÷ $CB                                    OBJECT REPLACEMENT CHARACTER            
2024-12-09T17:57:13.6586747Z               609 :  .  .      \u2990  CLmEastAsian                    × $CL                                    RIGHT SQUARE BRACKET WITH TICK IN TOP CORNER
2024-12-09T17:57:13.9042919Z          
2024-12-09T17:57:13.9044818Z          rbbitst.cpp:4044 Break found but not expected at index 281. Parameters to reproduce: @"type=line engineState=[12931258389510 189063628112575 252816262792863 133076538952570 248037151651428 219321456041115 64307488750234 232318582801490 48114391686924 45130412958247 85592916245652 170465134600812 0 10 7] loop=1"
2024-12-09T17:57:13.9047223Z               268 :  |  |      \u2e3b  B2                              ALL ÷ / ÷ ALL                           THREE-EM DASH                           
2024-12-09T17:57:13.9048100Z               269 :  .  .      \u002d  HY                              × $HY                                    HYPHEN-MINUS                            
2024-12-09T17:57:13.9049010Z               270 :  .  .      \u3000  BA_EastAsian                    × $BA                                    IDEOGRAPHIC SPACE                       
2024-12-09T17:57:13.9049979Z               271 :  .  .      \u300d  CL_EastAsian                    × $CL                                    RIGHT CORNER BRACKET                    
2024-12-09T17:57:13.9051528Z               272 :  |  |  \U0001fdc5  ID_ExtPictUnassigned            ALL ÷ / ÷ ALL                           <unassigned-1FDC5>                      
2024-12-09T17:57:13.9052791Z               274 :  .  .      \u201d  QU_Pf                           × $QU_Pf ( $SP | $GL | $WJ | $CL | $QU | $CP | $EX | $IS | $SY | $BK | $CR | $LF | $NL | $ZW | $eot )  RIGHT DOUBLE QUOTATION MARK             
2024-12-09T17:57:13.9053766Z               275 :  .  .      \u002f  SY                              × $SY                                    SOLIDUS                                 
2024-12-09T17:57:13.9054674Z               276 :  .  .      \u298c  CLmEastAsian                    × $CL                                    RIGHT SQUARE BRACKET WITH UNDERBAR      
2024-12-09T17:57:13.9055875Z               277 :  .  .      \u000b  BK                              × ( $BK | $CR | $LF | $NL )              <control-000B>                          
2024-12-09T17:57:13.9056811Z               278 :  |  |      \ufe24  GLmEastAsian                    $BK ÷                                    COMBINING MACRON LEFT HALF              
2024-12-09T17:57:13.9057910Z               279 :  .  .      \u0bcd  CMorigmEastAsian                (?<X>[^$BK $CR $LF $NL $SP $ZW]) ( $CM | $ZWJ )* → ${X}  TAMIL SIGN VIRAMA                       
2024-12-09T17:57:13.9058939Z               280 :  .  .      \u2010  BA_Hyphen                       $GL ×                                    HYPHEN                                  
2024-12-09T17:57:13.9060081Z           --> 281 :  .  |  \U0001d305  ALorig_EastAsian                ( $sot | $BK | $CR | $LF | $NL | $SP | $ZW | $CB | $GL ) ( $HY | $Hyphen ) × $AL  DIGRAM FOR EARTH                        
2024-12-09T17:57:13.9061256Z               283 :  .  .      \u25cc  ALorig_DottedCircle             ($AL | $HL) × ($AL | $HL)                DOTTED CIRCLE                           
2024-12-09T17:57:13.9062399Z               284 :  |  |  \U0001f1db  ID_ExtPictUnassigned            ALL ÷ / ÷ ALL                           <unassigned-1F1DB>                      
2024-12-09T17:57:13.9063564Z               286 :  |  |  \U000abf88  XXmExtPictUnassigned            ALL ÷ / ÷ ALL                           <unassigned-ABF88>                      
2024-12-09T17:57:13.9064563Z               288 :  .  .      \u002d  HY                              × $HY                                    HYPHEN-MINUS                            
2024-12-09T17:57:13.9065414Z               289 :  |  |      \ucffc  H2                              ALL ÷ / ÷ ALL                           HANGUL SYLLABLE KWEO                    
2024-12-09T17:57:13.9066786Z               290 :  |  |  \U0001346a  ALorigmEastAsianmDottedCircle   ALL ÷ / ÷ ALL                           EGYPTIAN HIEROGLYPH-1346A               
2024-12-09T17:57:14.2790733Z          
2024-12-09T17:57:14.2792563Z          rbbitst.cpp:4044 Break found but not expected at index 285. Parameters to reproduce: @"type=line engineState=[250724539910984 52425412829191 90017532641859 119771380787919 65315481249317 123876172441301 65088897094322 50582213880376 8386633963035 275224778962997 146116788109872 237921335994760 1 2 9] loop=1"
2024-12-09T17:57:14.2794931Z               272 :  |  |      \ubc68  H2                              ALL ÷ / ÷ ALL                           HANGUL SYLLABLE BYAE                    
2024-12-09T17:57:14.2795907Z               273 :  |  |  \U0001fac3  EB_EastAsian                    ALL ÷ / ÷ ALL                           PREGNANT MAN                            
2024-12-09T17:57:14.2796884Z               275 :  .  .      \u0085  NL                              × ( $BK | $CR | $LF | $NL )              <control-0085>                          
2024-12-09T17:57:14.2797805Z               276 :  |  |      \u058f  PRmEastAsian                    $NL ÷                                    ARMENIAN DRAM SIGN                      
2024-12-09T17:57:14.2799401Z               277 :  .  .  \U000ea300  XXmExtPictUnassigned            ($PR | $PO) × ($AL | $HL)                <unassigned-EA300>                      
2024-12-09T17:57:14.2800574Z               279 :  .  .  \U00011c71  EXmEastAsian                    × $EX                                    MARCHEN MARK SHAD                       
2024-12-09T17:57:14.2801600Z               281 :  .  .  \U00016fe4  GL_EastAsian                    [^ $SP $BA $HY] × $GL                    KHITAN SMALL SCRIPT FILLER              
2024-12-09T17:57:14.2802736Z               283 :  .  .      \u0735  CMorigmEastAsian                (?<X>[^$BK $CR $LF $NL $SP $ZW]) ( $CM | $ZWJ )* → ${X}  SYRIAC ZQAPHA DOTTED                    
2024-12-09T17:57:14.2803767Z               284 :  .  .      \u2010  BA_Hyphen                       $GL ×                                    HYPHEN                                  
2024-12-09T17:57:14.2804950Z           --> 285 :  .  |  \U0001f8e4  XX_ExtPictUnassigned            ( $sot | $BK | $CR | $LF | $NL | $SP | $ZW | $CB | $GL ) ( $HY | $Hyphen ) × $AL  <unassigned-1F8E4>                      
2024-12-09T17:57:14.2806420Z               287 :  .  .  \U0001f195  AI_EastAsian                    ($AL | $HL) × ($AL | $HL)                SQUARED NEW                             
2024-12-09T17:57:14.2807557Z               289 :  .  .      \u200e  CMorigmEastAsian                (?<X>[^$BK $CR $LF $NL $SP $ZW]) ( $CM | $ZWJ )* → ${X}  LEFT-TO-RIGHT MARK                      
2024-12-09T17:57:14.2808625Z               290 :  .  .      \u2025  INmEastAsian                    × $IN                                    TWO DOT LEADER                          
2024-12-09T17:57:14.2809623Z               291 :  |  |      \uff04  PR_EastAsian                    ALL ÷ / ÷ ALL                           FULLWIDTH DOLLAR SIGN                   
2024-12-09T17:57:14.2810564Z               292 :  |  |      \u1bf3  VF                              ALL ÷ / ÷ ALL                           BATAK PANONGONAN                        
2024-12-09T17:57:14.2811511Z               293 :  |  |  \U0001f3fb  EM                              ALL ÷ / ÷ ALL                           EMOJI MODIFIER FITZPATRICK TYPE-1-2     
2024-12-09T17:57:14.2812440Z               295 :  |  |  \U00011f02  AP                              ALL ÷ / ÷ ALL                           KAWI SIGN REPHA                         
2024-12-09T17:57:14.3761600Z          
2024-12-09T17:57:14.3763364Z          rbbitst.cpp:4044 Break found but not expected at index 282. Parameters to reproduce: @"type=line engineState=[6746063612850 171180666492117 151036743466113 14741284809718 50900410964913 96146043270539 46671263904454 148473605227049 52734048270295 253523139883370 78425240552647 142146139483121 1 0 6] loop=1"
2024-12-09T17:57:14.3765699Z               274 :  |  |      \u1bf2  VF                              ALL ÷ / ÷ ALL                           BATAK PANGOLAT                          
2024-12-09T17:57:14.3766635Z               275 :  |  |      \ub728  H2                              ALL ÷ / ÷ ALL                           HANGUL SYLLABLE DDEU                    
2024-12-09T17:57:14.3767633Z               276 :  |  |  \U0001f932  EB_EastAsian                    ALL ÷ / ÷ ALL                           PALMS UP TOGETHER                       
2024-12-09T17:57:14.3768679Z               278 :  .  .  \U00016fe4  GL_EastAsian                    [^ $SP $BA $HY] × $GL                    KHITAN SMALL SCRIPT FILLER              
2024-12-09T17:57:14.3769847Z               280 :  .  .      \u302b  CMorig_EastAsian                (?<X>[^$BK $CR $LF $NL $SP $ZW]) ( $CM | $ZWJ )* → ${X}  IDEOGRAPHIC RISING TONE MARK            
2024-12-09T17:57:14.3770869Z               281 :  .  .      \u002d  HY                              $GL ×                                    HYPHEN-MINUS                            
2024-12-09T17:57:14.3772035Z           --> 282 :  .  |  \U00053b94  XXmExtPictUnassigned            ( $sot | $BK | $CR | $LF | $NL | $SP | $ZW | $CB | $GL ) ( $HY | $Hyphen ) × $AL  <unassigned-53B94>                      
2024-12-09T17:57:14.3773621Z               284 :  .  .      \u000d  CR                              × ( $BK | $CR | $LF | $NL )              <control-000D>                          
2024-12-09T17:57:14.3774605Z               285 :  |  |      \u25cc  ALorig_DottedCircle             $CR ÷                                    DOTTED CIRCLE                           
2024-12-09T17:57:14.3775726Z               286 :  .  .      \u302a  CMorig_EastAsian                (?<X>[^$BK $CR $LF $NL $SP $ZW]) ( $CM | $ZWJ )* → ${X}  IDEOGRAPHIC LEVEL TONE MARK             
2024-12-09T17:57:14.3776748Z               287 :  .  .      \u2010  BA_Hyphen                       × $BA                                    HYPHEN                                  
2024-12-09T17:57:14.3777599Z               288 :  .  .      \u002f  SY                              × $SY                                    SOLIDUS                                 
2024-12-09T17:57:14.3778766Z               289 :  |  |      \ud7bf  JV                              ALL ÷ / ÷ ALL                           HANGUL JUNGSEONG I-YEO                  
2024-12-09T17:57:14.3779664Z               290 :  .  .      \u11c2  JT                              $JV | $H2 × $JV | $JT                    HANGUL JONGSEONG HIEUH                  
2024-12-09T17:57:14.3780534Z               291 :  .  .      \u000a  LF                              × ( $BK | $CR | $LF | $NL )              <control-000A>                          
2024-12-09T17:57:14.3781462Z               292 :  |  |  \U00013287  CLmEastAsian                    $LF ÷                                    EGYPTIAN HIEROGLYPH O036B               
2024-12-09T17:57:14.3782498Z               294 :  |  |      \u301d  OP_EastAsian                    ALL ÷ / ÷ ALL                           REVERSED DOUBLE PRIME QUOTATION MARK    
2024-12-09T17:57:16.7121460Z terminate called without an active exception
2024-12-09T17:57:16.8046017Z make[2]: *** [Makefile:120: check-exhaustive-local] Aborted (core dumped)
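
When filing a ticket for a failure like this, it can help to pull the "Parameters to reproduce" strings out of the CI log. The following is a minimal sketch that assumes the log lines have exactly the shape shown above; the function name `extract_failures` is hypothetical, and how the extracted parameters are passed back to the test runner is not covered here.

```python
import re

# Matches lines such as:
#   rbbitst.cpp:4044 Break found but not expected at index 600.
#   Parameters to reproduce: @"type=line engineState=[...] loop=1"
FAILURE_RE = re.compile(
    r'(?P<file>[\w.]+):(?P<line>\d+) Break found but not expected at index '
    r'(?P<index>\d+)\. Parameters to reproduce: @"(?P<params>[^"]*)"'
)


def extract_failures(log_text):
    """Collect source location, break index, and repro parameters per failure."""
    failures = []
    for match in FAILURE_RE.finditer(log_text):
        params = match.group("params")
        break_type = re.search(r"type=(\w+)", params)
        failures.append({
            "source": f'{match.group("file")}:{match.group("line")}',
            "index": int(match.group("index")),
            "type": break_type.group(1) if break_type else None,
            "params": params,
        })
    return failures
```

Run against the log above, this would yield four entries, all of type `line`, at indices 600, 281, 285, and 282.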

@markusicu
Member

> There is an icu4c exhaustive test failure in RBBITest::TestMonkey which seems to have nothing to do with any CLDR 46.1 changes.

@eggrobin please take a look...
@aheninger FYI but probably related to Robin's recent test code changes

> I will file a ticket and try to create a logKnownIssue test skip for this

ok

@pedberg-icu
Contributor Author

pedberg-icu commented Dec 9, 2024

The ticket for the RBBITest::TestMonkey failure is https://unicode-org.atlassian.net/browse/ICU-22986

I will also run the exhaustive tests on the main branch to see if the error repros there.

@markusicu @eggrobin Actually, the exhaustive tests on the main branch were run 3 days ago and show the same problem: https://github.com/unicode-org/icu/actions/runs/12209763924/job/34064998555

So this is not related to this PR; we can go ahead and approve & merge it.

@pedberg-icu pedberg-icu merged commit 3b9c0fc into unicode-org:main Dec 9, 2024
101 checks passed