PHP > 8.1 fails to detect Māori macrons #11908
Comments
U+0101 (ā) is encoded in UTF-8 as the bytes C4 81, which is a valid sequence:
$str = 'Total%20M%C4%81ori%2C31.5%2C33.3%2C31.8%2C33%2C36.4%2C33.2%2C33.2';
$rawstr = rawurldecode($str);
var_dump(
    mb_check_encoding($rawstr, 'ISO-8859-1'),
    mb_check_encoding($rawstr, 'UTF-8'),
    mb_check_encoding($rawstr, 'WINDOWS-1251')
);
Expected result: true
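All three checks can return true because a byte-level validity check cannot tell these encodings apart. The minimal sketch below, in plain PHP with no mbstring needed, decodes the two bytes of ā by hand under UTF-8 and under ISO-8859-1. `decodeUtf8TwoByte()` is a hypothetical helper written for illustration. It only handles 2-byte UTF-8 sequences.

```php
<?php
// Sketch: the bytes C4 81 from the test string above are a valid UTF-8
// encoding of U+0101 (ā), but each byte on its own is also a defined
// ISO-8859-1 character, so validity alone cannot tell the encodings apart.

// Hypothetical helper: decode one 2-byte UTF-8 sequence (110xxxxx 10yyyyyy).
function decodeUtf8TwoByte(string $pair): int
{
    if (strlen($pair) !== 2) {
        throw new InvalidArgumentException('expected exactly 2 bytes');
    }
    $b1 = ord($pair[0]);
    $b2 = ord($pair[1]);
    // Lead bytes C2..DF start a 2-byte sequence (C0/C1 would be overlong);
    // the second byte must be a continuation byte 80..BF.
    if ($b1 < 0xC2 || $b1 > 0xDF || ($b2 & 0xC0) !== 0x80) {
        throw new InvalidArgumentException('not a valid 2-byte UTF-8 sequence');
    }
    return (($b1 & 0x1F) << 6) | ($b2 & 0x3F);
}

$bytes = rawurldecode('M%C4%81ori');   // "Māori" as raw bytes
echo bin2hex($bytes), "\n";            // 4dc4816f7269

// Read as UTF-8: bytes 1 and 2 together form a single character.
printf("UTF-8:      U+%04X\n", decodeUtf8TwoByte(substr($bytes, 1, 2))); // U+0101

// Read as ISO-8859-1: each byte 0xNN is simply code point U+00NN.
foreach ([1, 2] as $i) {
    printf("ISO-8859-1: U+%04X\n", ord($bytes[$i])); // U+00C4, then U+0081
}
```

Under ISO-8859-1 the same two bytes read as "Ä" followed by the C1 control character U+0081. Under Windows-1251 they read as "ДЃ". Every interpretation is "valid," and only one of them is the intended text.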
The behavior of PHP >= 8.1 seems to be correct, because … Please see below.
Therefore, PHP >= 8.1 of …
My mistake. However, what I originally meant is that you should specify the character encoding you are actually using.
I'm sorry, I made a mistake again. Anyway, detecting (guessing) the encoding in this case is very difficult. My opinion has not changed: I would like you to specify the character encoding you are actually using.
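A deliberately naive detector shows why guessing is hard. It returns the first candidate encoding under which the input is valid, so with the bytes from this issue the answer depends only on the order of the candidate list. `isValidIn()` and `firstValid()` are hypothetical helpers. This is not how `mb_detect_encoding()` works, since mbstring's detector is more sophisticated. The sketch only shows that validity by itself cannot settle the question.

```php
<?php
// Hypothetical helper: is $bytes valid in $encoding? Only two encodings are
// supported; any other value throws UnhandledMatchError (PHP 8.0+).
function isValidIn(string $bytes, string $encoding): bool
{
    return match ($encoding) {
        // preg_match() with the /u modifier returns false on malformed UTF-8.
        'UTF-8' => preg_match('//u', $bytes) === 1,
        // Every byte 0x00-0xFF is a defined ISO-8859-1 code point.
        'ISO-8859-1' => true,
    };
}

// Hypothetical, naive "detector": first candidate that accepts the bytes wins.
function firstValid(string $bytes, array $candidates): ?string
{
    foreach ($candidates as $enc) {
        if (isValidIn($bytes, $enc)) {
            return $enc;
        }
    }
    return null;
}

$bytes = rawurldecode('Total%20M%C4%81ori');
echo firstValid($bytes, ['ISO-8859-1', 'UTF-8']), "\n"; // ISO-8859-1
echo firstValid($bytes, ['UTF-8', 'ISO-8859-1']), "\n"; // UTF-8
```

When the source encoding is already known, checking it directly, for example with `mb_check_encoding($rawstr, 'UTF-8')`, avoids the guess entirely.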
@RoSk0 Is U+0101 the only accented character which is commonly used when writing about the Maori people?
@alexdowad Thanks for asking. The following is as I understand it (of European descent, living in Aotearoa, some beginner study of Te Reo Māori and a colleague of @RoSk0). Those characters are used when writing in Te Reo Māori - not only when writing about tangata Māori (Māori people). There are ten vowels in Te Reo Māori - a,e,i,o,u and the long versions ā,ē,ī,ō,ū, sometimes also represented as double-vowels (aa, ee, ...). Incorrect vowels can have significant impact on meaning. There are also the capital forms A,E,I,O,U and Ā,Ē,Ī,Ō,Ū.
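For reference, all ten macron vowels listed above fall in Unicode's Latin Extended-A block (U+0100 to U+017F). In UTF-8, each one therefore takes two bytes, with lead byte C4 or C5. The sketch below encodes them by hand. `utf8TwoByte()` is a hypothetical helper that only covers U+0080 to U+07FF. The `ok` check also assumes the PHP file itself is saved as UTF-8.

```php
<?php
// Hypothetical helper: encode a code point in U+0080..U+07FF as
// two UTF-8 bytes (110xxxxx 10yyyyyy).
function utf8TwoByte(int $cp): string
{
    if ($cp < 0x80 || $cp > 0x7FF) {
        throw new RangeException(sprintf('U+%04X is not a 2-byte code point', $cp));
    }
    return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
}

// Long vowels in Te Reo Māori, lower and upper case, with their code points.
$macronVowels = [
    'ā' => 0x0101, 'ē' => 0x0113, 'ī' => 0x012B, 'ō' => 0x014D, 'ū' => 0x016B,
    'Ā' => 0x0100, 'Ē' => 0x0112, 'Ī' => 0x012A, 'Ō' => 0x014C, 'Ū' => 0x016A,
];

foreach ($macronVowels as $char => $cp) {
    $bytes = utf8TwoByte($cp);
    // If this file is UTF-8, the literal key and the encoded bytes match.
    printf(
        "%s U+%04X %s %s\n",
        $char,
        $cp,
        strtoupper(bin2hex($bytes)),
        $bytes === $char ? 'ok' : 'MISMATCH'
    );
}
```

ISO-8859-1 stops at U+00FF, so it cannot represent any of these characters. Reading their UTF-8 bytes as ISO-8859-1 instead yields two unrelated characters per vowel.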
I have added some macron examples to #12025. (Corrections welcome, please; I'm doing my best to help here but am not an expert.)
@alexdowad Is it intentional that this issue is still open, or did you forget to close this? 🙂
@nielsdos Uhhh... I forgot.
Description
The following code:
https://3v4l.org/jYDqY
Resulted in this output:
But I expected this output instead:
Related issues
mb_detect_encoding() detects UTF-8 emoji byte sequence as ISO-8859-1 since PHP 8.1 #7871
PHP Version
PHP 8.1.22
Operating System
No response