Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pycodestyle-whitespace-rules #1111

Closed
wants to merge 6 commits into from
Closed

Pycodestyle-whitespace-rules #1111

wants to merge 6 commits into from

Conversation

nanthony007
Copy link

Add support for E2 and maybe E3 pycodestyle rules.

Add CheckCode(E201) variant and CheckKind (WhiteSpaceAfter `char`).
@nanthony007
Copy link
Author

So currently I've added the two variants to src/checks.rs and set the CheckKind::WhiteSpaceAfter to take a char expecting it to somewhere take the '(', '[', '{' characters. I also added a template for the E201 fixture.

Now I just am unsure as to how to go about implementing logic for either AST for tokens as per CONTRIBUTING.md suggestions.

@charliermarsh
Copy link
Member

My approach here is to typically try and stay as closely as I can to the existing pycodestyle implementation -- you can see that check here. So, typically, I'd say take that logic, and add it to check_lines.rs (the line-based checker, rather than the token- or AST-based checker).

The hiccup is that pycodestyle is running that check over "logical" lines rather than "physical" lines. (I'm not sure whether you're familiar with the distinction (I wasn't until recently), but a logical line is Python statement, while a physical line is an actual line-of-code in the editor. We don't have support for generating logical lines in check_lines.rs right now.

I could see a few paths forward here:

  1. Add support for generating logical lines by porting that logic from pycodestyle. (Not totally trivial but maybe a good exercise if you're interested, and useful beyond this specific rule since it would make it much easier to port arbitrary pycodestyle rules.)
  2. In check_ast.rs#visit_stmt, you could do something like this to extract the logical line based on the statement locations (we already know where statements start and end since we created an AST):
let text = checker.locator.slice_source_code_range(&Range::from_located(stmt));

Then, use run regular expressions against the text as in pycodestyle.

@charliermarsh
Copy link
Member

Another option is that I may be able to add the logical line code, in which case, adding new pycodestyle checks would be much easier.

@nanthony007
Copy link
Author

nanthony007 commented Dec 7, 2022

Sorry for delayed response. I was working on a solution using method 2 you described. I believe I have a working solution but have three problems.

  1. How to "activate" the rule. As in in check_ast.rs we frequently use the pattern if self.settings.enabled.contains(&CheckCode::XXX) but how do I activate this code? If I place the logic outside that block it runs and is successful but not within.
  2. The logic I have for logical_line inside #visit_stmt function results in the message having the correct row for the error but defaults to the column 1 always and I am trying to find a way to return a new range while keeping the standard Range::from_located(stmt) pattern.
  3. Finally the regex pattern they use in pycodestyle that you shared (I verified it is the one they still currently use on main as well) uses a backward look-around ((?!=)) which Rust-Regex does not support. I removed this and the code seems to function properly but it may be for capturing an edge-case my fixture file does not include.

I can check this code in if you would like to look at it.

I am also not opposed to attempting to implement the logical_line functionality myself. I believe it is found here.

@charliermarsh
Copy link
Member

Ok cool!

I've made a little bit of progress on the logical line stuff, but I need to benchmark it to know if it's way too slow.

Happy to take a look at what you have thus far, it may end up being better. The main thing I was unsure about with Approach 2 (after writing that comment) is that statements are nested, so taking the source code for a single statement isn't actually equivalent to a logical line, unless I'm misunderstanding.

For example, if you have...

def f():
  x = 1

And you use the approach I described above, the first string would be the entire function, and the second would be x = 1. But maybe there's a way around this...

To answer your questions:

  1. If you've added a new check code, you should run cargo dev generate-check-code-prefix to update the autogenerated mapping. Then you can run cargo run foo.py --select E201 or whatever the code is. Does that work?
  2. We have some helpers that might be useful for this, helpers::to_absolute might do what you want!

@charliermarsh
Copy link
Member

#1130 implements the logical line building, and it seems to work in testing a variety of cases against pycodestyle. But it's way too slow, like more than doubles Ruff's runtime on the CPython benchmark. All the time is spent building the logical lines.

@nanthony007
Copy link
Author

nanthony007 commented Dec 8, 2022

Alright I figured I would just push my code and we can go from there. I focused on implementation. I basically pull out the logical line you suggested and then implement the regex search. You can see from returned data (first screenshot) that we are capturing the errors but always with column: 1. The print statements verify we can get the correct column using stmt.location.column() + found + 1 (second screenshot).

CleanShot 2022-12-08 at 09 37 12@2x

CleanShot 2022-12-08 at 09 36 50@2x

@charliermarsh
Copy link
Member

I think that to get the right range, we'd want to do something like this:

self.add_check(Check::new(
    CheckKind::WhiteSpaceAfter(character),
    Range {
        location: Location::new(stmt.location.row(), stmt.location.column() + found),
        end_location: Location::new(stmt.location.row(), stmt.location.column() + found + 1)
    }
));

(Not exactly right, doesn't handle newlines and multiple whitespaces, but the idea would be to use Range { ... } rather than Range::from_located(...).)

@charliermarsh
Copy link
Member

I think this approach is going to be somewhat limiting though, because it's not the case that each statement corresponds to one logical line.

For example, if we have a nested function:

def f():
  def g():
    pass

Then with the current approach, we'd first look at this string:

"""
def f():
  def g():
    pass
"""

And then, when we recursed, we'd look at this string:

"""
  def g():
    pass
"""

Pycodestyle also has some rules whereby it removes comments and removes string contents (which I assume get in the way of the regular expressions), so we'd need to tokenize and find a way to handle those too.

In short, I think we do need to create logical lines separately, or come up with a whole new strategy for this that's very different from Pycodestyle's strategy (e.g., iterate over the token stream, look for Lpar and Lbrace tokens, and see if they have whitespace after them).

@charliermarsh
Copy link
Member

Closing for now to keep the PR list up-to-date, can always reopen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants