Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement many-to-one mapping between codes and rules #2517

Merged
merged 11 commits into from
Feb 14, 2023

Conversation

not-my-profile
Copy link
Contributor

@not-my-profile not-my-profile commented Feb 3, 2023

Implements #2186.

@charliermarsh
Copy link
Member

This is really exciting.

I've read through some of the code and commits but not the entirety of the change (yet). From my testing, it seems like every rule has one "primary" code. So useless-object-inheritance is mapped to PLR0205, PIE792, and UP004. And no matter how you --select it, it's always reported as PIE792. Similarly, you have to use PIE792 to # noqa it. I thinks this could be confusing, to --select UP and see PIE792 violations. How do you view it? Is that an accurate description of current behavior?

@charliermarsh
Copy link
Member

I'm not sure what I would expect as a user. I could imagine a few things:

  1. If I --select UP --select PIE, I see the error twice, once under both codes. (If I --select UP, I see the error once, under UP004; if I --select PIE, I see the error once, under PIE792.)
  2. If I --select UP --select PIE, I see the error once, under the "first" (?) code, or under the preferred code as is implicitly implemented here. (If I --select UP, I see the error once, under UP004; if I --select PIE, I see the error once, under PIE792.)

I don't know how I'd expect noqa violations to work: should any matching code ignore the violation despite the selector that was used to enable it? Or should I be required to add a noqa for every matching code individually?

@not-my-profile
Copy link
Contributor Author

From my testing, it seems like every rule has one "primary" code. I thinks this could be confusing, to --select UP and see PIE792 violations. How do you view it? Is that an accurate description of current behavior?

Yes that's how it currently works. I'd expect us to switch to human-friendly rule names #1773, so no matter which code you used to select in the future ruff will always only report the human-friendly rule name. Until we make that switch the behavior will be a bit confusing, but I don't see a good way around that.

see the error twice

I am positive that we absolutely do not want to report any error multiple times under any circumstance.

Note that it's totally possible that a rule is enabled via multiple codes, I don't think we want to report multiple codes for a single violation, so we have to pick one ... until we have come up with rule naming guidelines and renamed our rules accordingly.

should any matching code ignore the violation despite the selector that was used to enable it?

Yes. A rule can be identified via multiple codes. Which code you used to enable a rule doesn't matter.

@charliermarsh
Copy link
Member

I worry that running --select UP and seeing a PIE792 violation is a confusing enough experience that I'd want to really consider whether we enable these at all prior to migrating to human-friendly rule names.

@charliermarsh
Copy link
Member

I'm hesitant to merge this as-is due to some of the confusing behaviors that users will experience around aliasing (e.g., the --select UP-to-PIE scenario described above).

I don't have great answers for them yet, but I know that if we ship this as-is, it will be confusing for users, and we'll get a lot of feedback and questions stemming from that confusion.

If we want to merge this while giving ourselves time to figure out the best solution for aliasing, what we could do is merge this change but not yet implement any of the actual aliases (apart from, perhaps, deprecating rules, like BLE001).

@not-my-profile
Copy link
Contributor Author

Right that makes sense. I think the course of action is:

  1. Rename our rules to match the naming convention ... this does entail the merging of several rules (see #2714).
    (We do this first since after the next step renaming rules will be a breaking change.)
  2. Allow rules to be selected by their name and report violations by their name.
  3. Enable the many-to-one mapping.

I already rebased this PR yesterday, which was quite work-intensive since #2583 had landed in between. Further rebasing shouldn't take that much effort, so I'd be alright with leaving this PR open ... but yeah it would be nice to merge this just so that I don't have to deal with merging other changes that may crop up.

I could add an assert statement to the map_codes proc macro to assert that at most one code is mapped to one rule ... then we could merge this without any UX changes1 and simply remove the assert statements once we get to the above mentioned 3rd step.

Footnotes

  1. Aside from the C, C9, T, T1, T2 prefix deprecations in the 7th commit.

@sbrugman
Copy link
Contributor

(Heads-up: I will rename the pathlib rules in #2348 to os-path - the thing we detect)

@not-my-profile
Copy link
Contributor Author

not-my-profile commented Feb 14, 2023

I think that landing this PR, will take some coordination that this will be merged before any other registry.rs changes, since this PR changes the format of the frequently edited registry.rs file.

We want to remove the variants denoting whole Linters
from the RuleCodePrefix enum, so we have to introduce
a new RuleSelector::Linter variant.
Post this commit series several codes can be mapped to a single rule,
this commit therefore renames Rule::code to Rule::noqa_code,
which is the code that --add-noqa will add to ignore a rule.
@charliermarsh
Copy link
Member

👍 I’m happy to merge this next assuming it doesn’t change the UX right now — is that the case? I have to re-read the code but I’ll review tomorrow after the JetBrains webinar, and hopefully can merge tomorrow afternoon or evening.

@not-my-profile
Copy link
Contributor Author

Ok great. Yes after I have dropped the last commit demonstrating the mapping and added the aforementioned assert statement there shouldn't be any UX changes.

Rule::noqa_code previously return a single &'static str,
which was possible because we had one enum listing all
rule code prefixes. This commit series will however split up
the RuleCodePrefix enum into several enums ... so we'll end up
with two &'static str ... this commit wraps the return type
of Rule::noqa_code into a newtype so that we can easily change
it to return two &'static str in the 6th commit of this series.
Same reasoning as for the previous commit ... one &'static str
becomes two &'static str because we split the RuleCodePrefix enum.
Note that the .unwrap() we have to add now, will actually
be removed in the 6th commit.
Currently the define_rule_mapping! macro generates both the Rule enum as
well as the RuleCodePrefix enum and the mapping between the two.  After
this commit series the macro will only generate the Rule enum and the
RuleCodePrefix enum and the mapping will be generated by a new map_codes
proc macro, so we rename the macro now to fit its new purpose.
# This commit was generated by running the following Python code:
# (followed by `sed -Ei 's/(mod registry;)/\1mod codes;/' crates/ruff/src/lib.rs`
# and `cargo fmt`).

import json
import re
import subprocess

def parse_registry():
    file = open('crates/ruff/src/registry.rs')

    rules = []

    while next(file) != 'ruff_macros::register_rules!(\n':
        continue

    while (line := next(file)) != ');\n':
        line = line.strip().rstrip(',')
        if line.startswith('//') or line.startswith('#['):
            rules.append(line)
            continue
        code, path = line.split(' => ')
        name = path.rsplit('::')[-1]
        rules.append((code, name))

    while (line := next(file)) != 'pub enum Linter {\n':
        continue

    prefixes = []
    prefix2linter = []

    while (line := next(file).strip()) != '}':
        if line.startswith('//'):
            continue
        if line.startswith('#[prefix = '):
            prefixes.append(line.split()[-1].strip('"]'))
        else:
            for prefix in prefixes:
                prefix2linter.append((prefix, line.rstrip(',')))
            prefixes.clear()

    prefix2linter.sort(key = lambda t: len(t[0]), reverse=True)

    return rules, prefix2linter

rules, prefix2linter = parse_registry()

def parse_code(code):
    prefix = re.match('[A-Z]+', code).group()
    if prefix in ('E', 'W'):
        return 'Pycodestyle', code

    for prefix, linter in prefix2linter:
        if code.startswith(prefix):
            return linter, code[len(prefix) :]

    assert False

text = '''
use crate::registry::{Linter, Rule};

pub fn code_to_rule(linter: Linter, code: &str) -> Option<Rule> {
    #[allow(clippy::enum_glob_use)]
    use Linter::*;

    Some(match (linter, code) {
'''

for entry in rules:
    if isinstance(entry, str):
        if entry.startswith('//'):
            text += '\n' + entry
        else:
            text += entry
    else:
        namespace, code = parse_code(entry[0])
        text += f'({namespace}, "{code}") => Rule::{entry[1]},'
    text += '\n'

text += '''
       _ => return  None,
    })
}
'''

with open('crates/ruff/src/codes.rs', 'w') as f:
    f.write(text)
@charliermarsh
Copy link
Member

(Returning to this now.)

@not-my-profile
Copy link
Contributor Author

Let me know if you have questions.

@charliermarsh
Copy link
Member

Cool this looks good to me -- merging...

@charliermarsh charliermarsh merged commit 3179fc1 into astral-sh:main Feb 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants