Incorrect character counting for non-ASCII characters #11396
Hi! I believe this is a known deviation from Black.
Related #3714 and psf/black#3445

If all tokens are counted in Unicode width, line splitting should not occur since

These two lines have the same number of characters (88 + newline):

```python
u = "this is test"  # comment 🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍

u = "this is test" + "🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍"
```

However, the formatting results are different:

```python
u = "this is test"  # comment 🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍

u = (
    "this is test"
    + "🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍"
)
```

```console
$ ruff check test.py
test.py:1:60: E501 Line too long (146 > 88)
test.py:3:56: E501 Line too long (153 > 88)
Found 2 errors.
```

It seems that the character count does not work correctly in some situations.

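A minimal Python sketch can reproduce the numbers in the `ruff check` output above by measuring display width instead of character count. It assumes a simplified rule in which characters whose Unicode East Asian Width is Wide (`W`) or Fullwidth (`F`) take two columns and everything else takes one. `display_width` and `first_overflow_column` are hypothetical helpers, not Ruff APIs, and Ruff's own Rust implementation may treat some characters (tabs, zero-width characters) differently. The two test lines are rebuilt with the formatter's two spaces before `#` and with 58 and 65 snakes respectively, which is what the reported widths imply.

```python
import unicodedata
from typing import Optional


def display_width(text: str) -> int:
    """Simplified display width (assumption): East Asian Wide/Fullwidth = 2 columns, else 1."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


def first_overflow_column(text: str, limit: int = 88) -> Optional[int]:
    """Return the 1-based column of the first character that pushes the width past `limit`."""
    width = 0
    for column, ch in enumerate(text, start=1):
        width += display_width(ch)
        if width > limit:
            return column
    return None


# The two statements from the report; 58 and 65 snakes are inferred from the widths.
comment_line = 'u = "this is test"  # comment ' + "🐍" * 58
concat_line = 'u = "this is test" + "' + "🐍" * 65 + '"'

for line in (comment_line, concat_line):
    # character count, display width, column where the limit is first exceeded
    print(len(line), display_width(line), first_overflow_column(line))
# prints:
# 88 146 60
# 88 153 56
```

Both lines are 88 characters long, but under this rule they are 146 and 153 columns wide, and 60 and 56 are exactly the columns where the running width first passes 88. That suggests both statements are measured consistently, and the different formatting results come from something else, such as the trailing-comment handling discussed in this thread.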
I believe ruff does not format (wrap) long trailing comments when they share a line with code, probably because in that case it cannot tell whether the comment refers to the variable name or to the value. Using ASCII characters in your example also does not cause the ruff formatter to rewrap the line:

Sorry, my examples were misguided! I created a more appropriate reproduction:

```python
"🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍! + !🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍"
"🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍" + "🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍"
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa! + !aaaaaaaaaaa"
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" + "aaaaaaaaaaa"
```

pyproject.toml:

```toml
[tool.ruff]
[tool.ruff.lint]
select = [
    "E",
    "F",
    "W",
]
```

Lint result:

```console
$ ruff check test.py
test.py:1:48: E501 Line too long (89 > 88)
test.py:2:48: E501 Line too long (89 > 88)
Found 2 errors.
```

Format result:

```python
"🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍! + !🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍"
(
    "🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍"
    + "🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍"
)
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa! + !aaaaaaaaaaa"
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" + "aaaaaaaaaaa"
```

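The same simplified width rule (an assumption, not Ruff's actual code) accounts for this reproduction: all four lines are 48 characters long, but each snake counts as two columns. The hypothetical sketch below rebuilds the four lines of `test.py` and flags the ones wider than 88 columns, Ruff's default `line-length`, which the `pyproject.toml` above does not override.

```python
import unicodedata

LINE_LENGTH = 88  # Ruff's default line-length (not overridden in the config above)


def display_width(text: str) -> int:
    # Simplified rule (assumption): East Asian Wide/Fullwidth characters take 2 columns.
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


# The four lines of test.py from the reproduction above.
source_lines = [
    '"' + "🐍" * 30 + "! + !" + "🐍" * 11 + '"',
    '"' + "🐍" * 30 + '" + "' + "🐍" * 11 + '"',
    '"' + "a" * 30 + "! + !" + "a" * 11 + '"',
    '"' + "a" * 30 + '" + "' + "a" * 11 + '"',
]

for lineno, line in enumerate(source_lines, start=1):
    # line number, character count, display width
    print(lineno, len(line), display_width(line))

# Lines an E501-style width check would flag, as (line number, width) pairs.
too_long = [
    (lineno, display_width(line))
    for lineno, line in enumerate(source_lines, start=1)
    if display_width(line) > LINE_LENGTH
]
print(too_long)  # [(1, 89), (2, 89)]
```

Only the two emoji lines exceed 88 columns, matching the two E501 reports. In the format result, line 2 is split because it is a concatenation that can be broken across lines, while line 1 is a single string literal and is left unchanged.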
To be clear, we do not use character count. We use character width.

Oh, I see! I misunderstood "character width" to mean the number of bytes. I now understand that this works as intended.

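The confusion here is between three different measures of a string. A small Python sketch contrasts them: `len()` counts code points, `len(text.encode("utf-8"))` counts UTF-8 bytes, and the hypothetical `display_width` helper applies a simplified East Asian Width rule as an approximation of the character width the maintainer describes (Ruff's exact rules may differ for some characters). Halfwidth katakana are included because they take three UTF-8 bytes each but only one column.

```python
import unicodedata
from typing import Tuple


def display_width(text: str) -> int:
    # Simplified rule (assumption): East Asian Wide/Fullwidth characters take 2 columns.
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


def measures(text: str) -> Tuple[int, int, int]:
    """Return (character count, display width, UTF-8 byte count)."""
    return len(text), display_width(text), len(text.encode("utf-8"))


for sample in ("abc", "日本語", "ｱｲｳ", "🐍"):
    print(repr(sample), measures(sample))
# prints:
# 'abc' (3, 3, 3)
# '日本語' (3, 6, 9)
# 'ｱｲｳ' (3, 3, 9)
# '🐍' (1, 2, 4)
```

Under this rule, each kanji in Japanese code counts as two columns toward the line-length limit, while the byte count is a separate measure that does not enter into it.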
I am a Japanese speaker and am working on a project where Japanese is used as part of the code.
Black and ruff give me different output for line length.
black:
ruff:
ruff version:
ruff 0.4.4