[ruff] Implement unnecessary-regular-expression (RUF055) #14659
@dosisod This seems like it would be a good 'refurb' rule for your linter.
| code | total | + violation | - violation | + fix | - fix |
|---|---|---|---|---|---|
| RUF055 | 50 | 50 | 0 | 0 | 0 |
This overall looks great. You made this look simple.
The only thing I notice we're missing is raw-string support (or, at least, tests for it). Raw strings are the recommended way to write regex patterns in Python because they avoid the need for double escaping.
crates/ruff_linter/src/rules/ruff/rules/unnecessary_regular_expression.rs
// For now, reject any regex metacharacters. Compare to the complete list
// from https://docs.python.org/3/howto/regex.html#matching-characters
let has_metacharacters = string_lit.value.chars().any(|c| {
    matches!(
        c,
        '.' | '^' | '$' | '*' | '+' | '?' | '{' | '}' | '[' | ']' | '\\' | '|' | '(' | ')'
    )
});
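To see why the rule bails out on these characters, here is a minimal Python sketch of the same check. The has_metacharacters helper and the REGEX_METACHARACTERS constant are hypothetical names that mirror the Rust snippet above; they are not part of ruff or of the re module.

```python
import re

# Same character set as the Rust `matches!` arm above (hypothetical Python mirror).
REGEX_METACHARACTERS = frozenset(".^$*+?{}[]\\|()")


def has_metacharacters(pattern: str) -> bool:
    """Return True if the pattern contains any regex metacharacter."""
    return any(ch in REGEX_METACHARACTERS for ch in pattern)


# A plain pattern behaves the same as a substring test...
plain_agrees = bool(re.search("abc", "xabcx")) == ("abc" in "xabcx")

# ...but "." is a wildcard in a regex and a literal dot in a str operation.
regex_hit = bool(re.search("a.c", "abc"))  # regex: "." matches "b"
substring_hit = "a.c" in "abc"             # str: no literal "a.c" present
```

With a metacharacter present, the regex call and the string operation can disagree, which is why such patterns are skipped rather than rewritten.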
I like how you intentionally excluded metacharacters. So consider this an extension, and I think it's totally fine to do this as a follow-up PR (or not at all).
It would be nice if the rule only skipped replacement for characters that behave differently in regular expressions and regular strings. For example, \n matches \n in both a regex and a string.
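A short Python illustration of that point, and of why raw strings need care. A non-raw "\n" is a single newline character for both the regex engine and the str operations, whereas a raw r"\n" is two characters that only the regex engine interprets as a newline. The variable names are illustrative only.

```python
import re

text = "first\nsecond"

# Non-raw "\n": Python turns it into one newline character before `re` sees it,
# so the regex search and the substring test agree.
nonraw_regex = bool(re.search("\n", text))
nonraw_substring = "\n" in text

# Raw r"\n": backslash + "n". The regex engine treats that as a newline escape,
# but as a plain string it is two literal characters that are absent from `text`.
raw_regex = bool(re.search(r"\n", text))
raw_substring = r"\n" in text
```

So a simple escape in a non-raw literal could be rewritten safely, while the same escape in a raw literal would have to be translated first.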
Thanks for the excellent PR writeup -- it made reviewing this really easy! This looks great overall.
The limitations around Match seem necessary, but some of the other restrictions can probably be loosened. For example, the sub replacement doesn't have to be a string literal, but it does need to be a string or at the very least not a function. Similarly, the patterns themselves could be plain str variables, but we need to inspect them for regex metacharacters. I didn't find a way to do that for non-literal strings, but if I missed it, that would be an easy improvement.
We don't have an out-of-the-box way of doing this for strings right now, so I wouldn't try to tackle it in this PR. But if you're interested, a followup might be to add an is_str() function to ruff_python_semantic::analyze::typing that looks similar to this is_list function:
ruff/crates/ruff_python_semantic/src/analyze/typing.rs
Lines 739 to 745 in d9cbf2f
/// Test whether the given binding can be considered a list.
///
/// For this, we check what value might be associated with it through it's initialization and
/// what annotation it has (we consider `list` and `typing.List`)
pub fn is_list(binding: &Binding, semantic: &SemanticModel) -> bool {
    check_type::<ListChecker>(binding, semantic)
}
And then you could use that in this rule for stronger type inference.
Co-authored-by: Micha Reiser <[email protected]>
Co-authored-by: Alex Waygood <[email protected]>
Thank you both for the great reviews! I think I've incorporated all of the suggestions, with the exception of handling simple escapes like [...] Similarly, I'm quite interested in the [...]
This is great, thanks!
Co-authored-by: Simon Brugman <[email protected]>
32 32 |
33 33 | # this should be replaced with "abc" == s
34    |-if re.fullmatch("abc", s):
   34 |+if "abc" == s:
As a minor note, I think this should be s == "abc". It's a minor stylistic difference (in which I think x == VALUE is more idiomatic), but it can cause actual differences due to type checking implementations, e.g. microsoft/pyright#9093
Summary
This is a limited implementation of the rule idea from #12283 to replace some uses of the re module with str method calls. A few of the examples given there:

For this initial implementation, I've restricted the rule to string literals in the pattern argument to each of the re functions and further restricted these string literals to exclude any re metacharacters. Each of the re functions takes additional kwargs that change their behavior, so the rule doesn't apply when these are present either. re.sub can also take a function as the replacement argument (unlike str.replace, which expects another str), so the rule is also restricted to cases where that argument is also a string literal. Finally, match, search, and fullmatch return Match objects unlike the proposed fixes, so the rule only applies when these are used in a boolean test for their truth values. For example,

would trigger the rule, but the plain re.match("abc", s) call above would not because the returned Match could be used. I think this is probably a fairly common use case, so the rule can still be useful even with these restrictions.

The limitations around Match seem necessary, but some of the other restrictions can probably be loosened. For example, the sub replacement doesn't have to be a string literal, but it does need to be a string or at the very least not a function. Similarly, the patterns themselves could be plain str variables, but we need to inspect them for regex metacharacters. I didn't find a way to do that for non-literal strings, but if I missed it, that would be an easy improvement.

I think these checks can also be directly extended to the regex package. I saw unraw-re-pattern (RUF039), for example, handles both re and regex, but I only handled re for now.

Test Plan
cargo test with new RUF055.py snapshot test.

Possible related rule
Right before submitting this, I tried running RUF055.py with python, and it crashed with a ValueError: cannot use LOCALE flag with a str pattern. That would be an easy thing to check with very similar code to what I have here.
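The crash is easy to reproduce on its own. A minimal sketch, assuming CPython 3.6 or later, where re.LOCALE is only accepted together with bytes patterns; the variable names are illustrative.

```python
import re

# str pattern + LOCALE flag: rejected when the pattern is compiled.
try:
    re.match("abc", "abc", flags=re.LOCALE)
except ValueError as exc:
    error_message = str(exc)
else:
    error_message = None

# bytes pattern + LOCALE flag: accepted.
bytes_match = re.match(b"abc", b"abc", flags=re.LOCALE)
```

A lint rule for this would only need to see a str literal pattern combined with the LOCALE flag, which is close to the literal-pattern inspection already done here.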