Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected line length check behaviour when line has tabs. #6425

Closed
Nuno-Mota opened this issue Aug 8, 2023 · 3 comments · Fixed by #6491
Closed

Unexpected line length check behaviour when line has tabs. #6425

Nuno-Mota opened this issue Aug 8, 2023 · 3 comments · Fixed by #6491
Assignees
Labels
bug Something isn't working

Comments

@Nuno-Mota
Copy link

Hi.

It seems like the counting of line length with tabs has an unexpected behaviour.

From what I can infer, the length of each line is first counted considering that tabs have a size of 1.
If the resulting length is equal to or greater than the line-length setting, then the size is recomputed taking into account the tab-size setting.
Furthermore, the line needs to have at least 1 space (as discussed in #6424).

Is this the intended behaviour? If so, is there some place that explains the reasoning behind it?

This was tested for ruff 0.0.282.

MWE:

pyproject.toml

[tool.ruff]
line-length = 6
tab-size = 4
show-source = true

test file (note that github uses a default tab size of 8)

# aaaa
# aaaaa
    # a
	# a
	# aa
	# aaa
	# aaaa
		# a
		# aa
		# aaa

if True: # noqa: E501
	[12]
	[12 ]
	[1,2]
	[1, 2]

ruff's output

ruff test3.py 
test3.py:2:7: E501 Line too long (7 > 6 characters)
  |
1 | # aaaa
2 | # aaaaa
  |       ^ E501
3 |     # a
4 |     # a
  |

test3.py:3:7: E501 Line too long (7 > 6 characters)
  |
1 | # aaaa
2 | # aaaaa
3 |     # a
  |       ^ E501
4 |     # a
5 |     # aa
  |

test3.py:6:4: E501 Line too long (9 > 6 characters)
  |
4 |     # a
5 |     # aa
6 |     # aaa
  |       ^^^ E501
7 |     # aaaa
8 |         # a
  |

test3.py:7:4: E501 Line too long (10 > 6 characters)
  |
5 |     # aa
6 |     # aaa
7 |     # aaaa
  |       ^^^^ E501
8 |         # a
9 |         # aa
  |

test3.py:9:3: E501 Line too long (12 > 6 characters)
   |
 7 |     # aaaa
 8 |         # a
 9 |         # aa
   |         ^^^^ E501
10 |         # aaa
   |

test3.py:10:3: E501 Line too long (13 > 6 characters)
   |
 8 |         # a
 9 |         # aa
10 |         # aaa
   |         ^^^^^ E501
11 | 
12 | if True: # noqa: E501
   |

test3.py:14:4: E501 Line too long (9 > 6 characters)
   |
12 | if True: # noqa: E501
13 |     [12]
14 |     [12 ]
   |       ^^^ E501
15 |     [1,2]
16 |     [1, 2]
   |

test3.py:16:4: E501 Line too long (10 > 6 characters)
   |
14 |     [12 ]
15 |     [1,2]
16 |     [1, 2]
   |       ^^^^ E501
   |

Found 8 errors.
✗ - status code 1
@charliermarsh
Copy link
Member

Do you mind clarifying what behavior here is unexpected? "...then the size is recomputed taking into account..." sounds like an implementation detail as-described, and not a behavior per se.

@Nuno-Mota
Copy link
Author

Do you mind clarifying what behavior here is unexpected?

Of course.

The issue is that tab-indented lines that are longer than the line-length setting (taking into consideration the tab-size setting), might not be flagged for E501.

"...then the size is recomputed taking into account..." sounds like an implementation detail as-described, and not a behavior per se.

That might be. This is simply the behaviour I was able to infer.
In any case it seems to lead to an erroneous line length count in certain cases.

@charliermarsh
Copy link
Member

Ah, I think that's a result of the performance optimization I proposed in https://github.com/astral-sh/ruff/pull/5811/files, which doesn't hold true for tabs (for which a single-byte character can be wider than one byte).

@charliermarsh charliermarsh added the bug Something isn't working label Aug 8, 2023
@charliermarsh charliermarsh self-assigned this Aug 11, 2023
charliermarsh added a commit that referenced this issue Aug 11, 2023
## Summary

In #5811, I suggested that we add
a heuristic to the overlong-lines check such that if the line had fewer
bytes than the character limit, we return early -- the idea being that a
single byte per character was the "worst case". I overlooked that this
isn't true for tabs -- with tabs, the "worst case" scenario is that
every byte is a tab, which can have a width greater than 1.

Closes #6425.

## Test Plan

`cargo test` with a new fixture borrowed from the issue, plus manual
testing.
durumu pushed a commit to durumu/ruff that referenced this issue Aug 12, 2023
## Summary

In astral-sh#5811, I suggested that we add
a heuristic to the overlong-lines check such that if the line had fewer
bytes than the character limit, we return early -- the idea being that a
single byte per character was the "worst case". I overlooked that this
isn't true for tabs -- with tabs, the "worst case" scenario is that
every byte is a tab, which can have a width greater than 1.

Closes astral-sh#6425.

## Test Plan

`cargo test` with a new fixture borrowed from the issue, plus manual
testing.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants