Pycodestyle-whitespace-rules #1111

nanthony007 · 2022-12-06T20:34:55Z

Add support for E2 and maybe E3 pycodestyle rules.

Add CheckCode(E201) variant and CheckKind (WhiteSpaceAfter `char`).

nanthony007 · 2022-12-06T20:36:52Z

So currently I've added the two variants to src/checks.rs and set the CheckKind::WhiteSpaceAfter to take a char expecting it to somewhere take the '(', '[', '{' characters. I also added a template for the E201 fixture.

Now I just am unsure as to how to go about implementing logic for either AST for tokens as per CONTRIBUTING.md suggestions.

charliermarsh · 2022-12-06T21:05:12Z

My approach here is to typically try and stay as closely as I can to the existing pycodestyle implementation -- you can see that check here. So, typically, I'd say take that logic, and add it to check_lines.rs (the line-based checker, rather than the token- or AST-based checker).

The hiccup is that pycodestyle is running that check over "logical" lines rather than "physical" lines. (I'm not sure whether you're familiar with the distinction (I wasn't until recently), but a logical line is Python statement, while a physical line is an actual line-of-code in the editor. We don't have support for generating logical lines in check_lines.rs right now.

I could see a few paths forward here:

Add support for generating logical lines by porting that logic from pycodestyle. (Not totally trivial but maybe a good exercise if you're interested, and useful beyond this specific rule since it would make it much easier to port arbitrary pycodestyle rules.)
In check_ast.rs#visit_stmt, you could do something like this to extract the logical line based on the statement locations (we already know where statements start and end since we created an AST):

let text = checker.locator.slice_source_code_range(&Range::from_located(stmt));

Then, use run regular expressions against the text as in pycodestyle.

charliermarsh · 2022-12-07T19:27:29Z

Another option is that I may be able to add the logical line code, in which case, adding new pycodestyle checks would be much easier.

nanthony007 · 2022-12-07T21:40:55Z

Sorry for delayed response. I was working on a solution using method 2 you described. I believe I have a working solution but have three problems.

How to "activate" the rule. As in in check_ast.rs we frequently use the pattern if self.settings.enabled.contains(&CheckCode::XXX) but how do I activate this code? If I place the logic outside that block it runs and is successful but not within.
The logic I have for logical_line inside #visit_stmt function results in the message having the correct row for the error but defaults to the column 1 always and I am trying to find a way to return a new range while keeping the standard Range::from_located(stmt) pattern.
Finally the regex pattern they use in pycodestyle that you shared (I verified it is the one they still currently use on main as well) uses a backward look-around ((?!=)) which Rust-Regex does not support. I removed this and the code seems to function properly but it may be for capturing an edge-case my fixture file does not include.

I can check this code in if you would like to look at it.

I am also not opposed to attempting to implement the logical_line functionality myself. I believe it is found here.

charliermarsh · 2022-12-07T21:53:15Z

Ok cool!

I've made a little bit of progress on the logical line stuff, but I need to benchmark it to know if it's way too slow.

Happy to take a look at what you have thus far, it may end up being better. The main thing I was unsure about with Approach 2 (after writing that comment) is that statements are nested, so taking the source code for a single statement isn't actually equivalent to a logical line, unless I'm misunderstanding.

For example, if you have...

def f():
  x = 1

And you use the approach I described above, the first string would be the entire function, and the second would be x = 1. But maybe there's a way around this...

To answer your questions:

If you've added a new check code, you should run cargo dev generate-check-code-prefix to update the autogenerated mapping. Then you can run cargo run foo.py --select E201 or whatever the code is. Does that work?
We have some helpers that might be useful for this, helpers::to_absolute might do what you want!

charliermarsh · 2022-12-07T22:13:14Z

#1130 implements the logical line building, and it seems to work in testing a variety of cases against pycodestyle. But it's way too slow, like more than doubles Ruff's runtime on the CPython benchmark. All the time is spent building the logical lines.

nanthony007 · 2022-12-08T14:41:56Z

Alright I figured I would just push my code and we can go from there. I focused on implementation. I basically pull out the logical line you suggested and then implement the regex search. You can see from returned data (first screenshot) that we are capturing the errors but always with column: 1. The print statements verify we can get the correct column using stmt.location.column() + found + 1 (second screenshot).

charliermarsh · 2022-12-11T15:15:44Z

I think that to get the right range, we'd want to do something like this:

self.add_check(Check::new(
    CheckKind::WhiteSpaceAfter(character),
    Range {
        location: Location::new(stmt.location.row(), stmt.location.column() + found),
        end_location: Location::new(stmt.location.row(), stmt.location.column() + found + 1)
    }
));

(Not exactly right, doesn't handle newlines and multiple whitespaces, but the idea would be to use Range { ... } rather than Range::from_located(...).)

charliermarsh · 2022-12-11T15:18:37Z

I think this approach is going to be somewhat limiting though, because it's not the case that each statement corresponds to one logical line.

For example, if we have a nested function:

def f():
  def g():
    pass

Then with the current approach, we'd first look at this string:

"""
def f():
  def g():
    pass
"""

And then, when we recursed, we'd look at this string:

"""
  def g():
    pass
"""

Pycodestyle also has some rules whereby it removes comments and removes string contents (which I assume get in the way of the regular expressions), so we'd need to tokenize and find a way to handle those too.

In short, I think we do need to create logical lines separately, or come up with a whole new strategy for this that's very different from Pycodestyle's strategy (e.g., iterate over the token stream, look for Lpar and Lbrace tokens, and see if they have whitespace after them).

charliermarsh · 2023-01-06T15:50:22Z

Closing for now to keep the PR list up-to-date, can always reopen.

nanthony007 added 2 commits December 6, 2022 15:33

Add checks.

0fe1ad0

Add CheckCode(E201) variant and CheckKind (WhiteSpaceAfter `char`).

Add E201 fixture

0240bbb

nanthony007 added 4 commits December 8, 2022 09:40

Add new test case with trailing whitespace

722f6f5

Implement missing CheckKind message.

747638d

Add core logic using stmt location.

4f09f81

Update generated checks for E201.

d7b219a

lsorber mentioned this pull request Dec 21, 2022

feat: replace flake8 with ruff superlinear-ai/poetry-cookiecutter#144

Merged

4 tasks

charliermarsh closed this Jan 6, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pycodestyle-whitespace-rules #1111

Pycodestyle-whitespace-rules #1111

nanthony007 commented Dec 6, 2022

nanthony007 commented Dec 6, 2022

charliermarsh commented Dec 6, 2022

charliermarsh commented Dec 7, 2022

nanthony007 commented Dec 7, 2022 •

edited

Loading

charliermarsh commented Dec 7, 2022

charliermarsh commented Dec 7, 2022

nanthony007 commented Dec 8, 2022 •

edited

Loading

charliermarsh commented Dec 11, 2022

charliermarsh commented Dec 11, 2022

charliermarsh commented Jan 6, 2023

Pycodestyle-whitespace-rules #1111

Pycodestyle-whitespace-rules #1111

Conversation

nanthony007 commented Dec 6, 2022

nanthony007 commented Dec 6, 2022

charliermarsh commented Dec 6, 2022

charliermarsh commented Dec 7, 2022

nanthony007 commented Dec 7, 2022 • edited Loading

charliermarsh commented Dec 7, 2022

charliermarsh commented Dec 7, 2022

nanthony007 commented Dec 8, 2022 • edited Loading

charliermarsh commented Dec 11, 2022

charliermarsh commented Dec 11, 2022

charliermarsh commented Jan 6, 2023

nanthony007 commented Dec 7, 2022 •

edited

Loading

nanthony007 commented Dec 8, 2022 •

edited

Loading