[red-knot] Typed diagnostic id #14869

MichaReiser · 2024-12-09T13:13:20Z

Summary

This PR introduces a structured DiagnosticId instead of using a plain &'static str. It is the first of three in a stack that implements a basic rules infrastructure for Red Knot.

DiagnosticId is an enum over all known diagnostic codes. A closed enum reduces the risk of accidentally introducing two identical diagnostic codes. It also opens the possibility of generating reference documentation from the enum in the future (not part of this PR).

The enum isn't fully closed because it uses a &'static str for lint names. This is because we want the flexibility to define lints in different crates, and all names are only known in red_knot_linter or above. Still, lower-level crates must already reference the lint names to emit diagnostics. We could define all lint-names in DiagnosticId but I decided against it because:

We probably want to share the DiagnosticId type between Ruff and Red Knot to avoid extra complexity in the diagnostic crate, and both tools use different lint names.
Lints require a lot of extra metadata beyond just the name. That's why I think defining them close to their implementation is important.

In the long term, we may also want to support plugins, which would make it impossible to know all lint names at compile time. The next PR in the stack introduces extra syntax for defining lints.

A closed enum does have a few disadvantages:

rustc can't help us detect unused diagnostic codes because the enum is public
Adding a new diagnostic in the workspace crate now requires changes to at least two crates: It requires changing the workspace crate to add the diagnostic and the ruff_db crate to define the diagnostic ID. I consider this an acceptable trade. We may want to move DiagnosticId to its own crate or into a shared red_knot_diagnostic crate.

Preventing duplicate diagnostic identifiers

One goal of this PR is to make it harder to introduce ambiguous diagnostic IDs, which is achieved by defining a closed enum. However, the enum isn't fully "closed" because it doesn't explicitly list the IDs for all lint rules. That leaves the possibility that a lint rule and a diagnostic ID share the same name.

I made the names unambiguous in this PR by separating them into different namespaces by using lint/<rule> for lint rule codes. I don't mind the lint prefix in a Ruff next context, but it is a bit weird for a standalone type checker. I'd like to not overfocus on this for now because I see a few different options:

We remove the lint prefix and add a unit test in a top-level crate that iterates over all known lint rules and diagnostic IDs to ensure the names are non-overlapping.
We only render [lint] as the error code and add a note to the diagnostic mentioning the lint rule. This is similar to clippy and has the advantage that the header line remains short (lint/some-long-rule-name is very long ;))
Any other form of adjusting the diagnostic rendering to make the distinction clear

I think we can defer this decision for now because the DiagnosticId contains all the relevant information to change the rendering accordingly.

Why `Lint` and not `LintRule`

I see three kinds of diagnostics in Red Knot:

Non-suppressable: Reveal type, IO errors, configuration errors, etc. (any DiagnosticId)
Lints: code-related diagnostics that are suppressable.
Lint rules: The same as lints, but they can be enabled or disabled in the configuration. The majority of lints in Red Knot and the Ruff linter.

Our current implementation doesn't distinguish between lints and Lint rules because we aren't aware of a suppressible code-related lint that can't be configured in the configuration. The only lint that comes to my mind is maybe division-by-zero if we're 99.99% sure that it is always right. However, I want to keep the door open to making this distinction in the future if it proves useful.

Another reason why I chose lint over lint rule (or just rule) is that I want to leave room for a future lint rule and lint phase concept:

lint is the what: a specific code smell, pattern, or violation
the lint rule is the how: I could see a future LintRule trait in red_knot_python_linter that provides the necessary hooks to run as part of the linter. A lint rule produces diagnostics for exactly one lint. A lint rule differs from all lints in red_knot_python_semantic because they don't run as "rules" in the Ruff sense. Instead, they're a side-product of type inference.
the lint phase is a different form of how: A lint phase can produce many different lints in a single pass. This is a somewhat common pattern in Ruff where running one analysis collects the necessary information for finding many different lints
diagnostic is the presentation: Unlike a lint, the diagnostic isn't the what, but how a specific lint gets presented. I expect that many lints can use one generic LintDiagnostic, but a few lints might need more flexibility and implement their custom diagnostic rendering (at least custom Diagnostic implementation).

Test Plan

cargo test

github-actions · 2024-12-09T13:21:12Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

carljm

Code looks good!

My main question about this PR is the use of the term "lint" to encompass core type checker diagnostics. This was discussed extensively in the CLI proposal, without a clear resolution. I don't think this usage is wrong (Python type checkers really are linters, since they don't integrate with runtime), but I suspect that even if we use this terminology internally, we will need to be careful to avoid it in the UI, because I think existing mypy/pyright users will likely find it confusing to have type errors referred to as "lints".

But you said you don't want to over-focus on this now, and we can change it, so not a blocking issue.

crates/ruff_db/src/diagnostic.rs

MichaReiser · 2024-12-10T07:32:16Z

My main question about this PR is the use of the term "lint" to encompass core type checker diagnostics. This was discussed extensively in the CLI proposal, without a clear resolution. I don't think this usage is wrong (Python type checkers really are linters, since they don't integrate with runtime), but I suspect that even if we use this terminology internally, we will need to be careful to avoid it in the UI, because I think existing mypy/pyright users will likely find it confusing to have type errors referred to as "lints".

I agree. We should use the term rule as the user-facing term and I also intend to use it in code when the context only allows rules but not other lints. For example, the configuration only allows rule names, not arbitrary lints. One possible design is to define a RuleName struct that rejects non-rule lints during deserialization. I only want to avoid using the term Rule in contexts where something isn't guaranteed to be backed by a rule or isn't configurable.

crates/ruff_db/src/diagnostic.rs

sharkdp

Thank you!

## Summary This is the second PR out of three that adds support for enabling/disabling lint rules in Red Knot. You may want to take a look at the [first PR](#14869) in this stack to familiarize yourself with the used terminology. This PR adds a new syntax to define a lint: ```rust declare_lint! { /// ## What it does /// Checks for references to names that are not defined. /// /// ## Why is this bad? /// Using an undefined variable will raise a `NameError` at runtime. /// /// ## Example /// /// ```python /// print(x) # NameError: name 'x' is not defined /// ``` pub(crate) static UNRESOLVED_REFERENCE = { summary: "detects references to names that are not defined", status: LintStatus::preview("1.0.0"), default_level: Level::Warn, } } ``` A lint has a name and metadata about its status (preview, stable, removed, deprecated), the default diagnostic level (unless the configuration changes), and documentation. I use a macro here to derive the kebab-case name and extract the documentation automatically. This PR doesn't yet add any mechanism to discover all known lints. This will be added in the next and last PR in this stack. ## Documentation I documented some rules but then decided that it's probably not my best use of time if I document all of them now (it also means that I play catch-up with all of you forever). That's why I left some rules undocumented (marked with TODO) ## Where is the best place to define all lints? I'm not sure. I think what I have in this PR is fine but I also don't love it because most lints are in a single place but not all of them. If you have ideas, let me know. ## Why is the message not part of the lint, unlike Ruff's `Violation` I understand that the main motivation for defining `message` on `Violation` in Ruff is to remove the need to repeat the same message over and over again. I'm not sure if this is an actual problem. Most rules only emit a diagnostic in a single place and they commonly use different messages if they emit diagnostics in different code paths, requiring extra fields on the `Violation` struct. That's why I'm not convinced that there's an actual need for it and there are alternatives that can reduce the repetition when creating a diagnostic: * Create a helper function. We already do this in red knot with the `add_xy` methods * Create a custom `Diagnostic` implementation that tailors the entire diagnostic and pre-codes e.g. the message Avoiding an extra field on the `Violation` also removes the need to allocate intermediate strings as it is commonly the place in Ruff. Instead, Red Knot can use a borrowed string with `format_args` ## Test Plan `cargo test`

* main: [red-knot] Add infrastructure to declare lints (#14873) [red-knot] Typed diagnostic id (#14869)

BurntSushi

Makes sense to me at this point in my journey. :-) Thank you for tagging me!

BurntSushi · 2024-12-10T18:34:05Z

crates/ruff_db/src/diagnostic.rs

+        }
+    }
+
+    pub fn as_str(&self) -> Result<&str, DiagnosticAsStrError> {


Would it make more sense to just return a DiagnosticAsStr type that implements Display? So, basically, a DiagnosticAsStrError like you have, but for all the variants. And in this case, you'd probably rename it to display() or some such.

Another pattern I like for things like this is to bite the bullet on the alloc, but do it at construction time. Then you can have an as_str() method that unconditionally returns &str.

I introduced it to avoid repeating the variant to name matching code for the matches method. I think the easiest would be to just call to_string() in matches which I'm open to do when we have other use cases for it.

Another solution is to have a macro that generates a name method for it.

Overall, I don't have a very good understanding yet of how this type will be used, and I suggest we change it as we see fit.

MichaReiser added red-knot Multi-file analysis & type inference diagnostics Related to reporting of diagnostics. labels Dec 9, 2024

MichaReiser requested a review from BurntSushi December 9, 2024 13:19

MichaReiser force-pushed the micha/diagnostic-id branch 2 times, most recently from 9d47d4c to 3792746 Compare December 9, 2024 14:06

MichaReiser marked this pull request as ready for review December 9, 2024 14:14

MichaReiser requested review from carljm, AlexWaygood and sharkdp as code owners December 9, 2024 14:14

MichaReiser mentioned this pull request Dec 9, 2024

[red-knot] Add infrastructure to declare lints #14873

Merged

carljm approved these changes Dec 10, 2024

View reviewed changes

crates/ruff_db/src/diagnostic.rs Outdated Show resolved Hide resolved

sharkdp reviewed Dec 10, 2024

View reviewed changes

crates/ruff_db/src/diagnostic.rs Outdated Show resolved Hide resolved

sharkdp reviewed Dec 10, 2024

View reviewed changes

crates/ruff_db/src/diagnostic.rs Outdated Show resolved Hide resolved

sharkdp approved these changes Dec 10, 2024

View reviewed changes

MichaReiser added 2 commits December 10, 2024 11:34

Introduce DiagnosticId

49c8152

Use lint:<code> over lint/<code>, avoid repetition of lint-ids

b012b24

MichaReiser force-pushed the micha/diagnostic-id branch from 5150616 to b012b24 Compare December 10, 2024 10:34

MichaReiser merged commit 5f54807 into main Dec 10, 2024
21 checks passed

MichaReiser deleted the micha/diagnostic-id branch December 10, 2024 15:58

dcreager added a commit that referenced this pull request Dec 10, 2024

Merge branch 'main' into dcreager/type-of-any

113af14

* main: [red-knot] Add infrastructure to declare lints (#14873) [red-knot] Typed diagnostic id (#14869)

BurntSushi reviewed Dec 10, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[red-knot] Typed diagnostic id #14869

[red-knot] Typed diagnostic id #14869

MichaReiser commented Dec 9, 2024 •

edited

Loading

github-actions bot commented Dec 9, 2024 •

edited

Loading

carljm left a comment

MichaReiser commented Dec 10, 2024

sharkdp left a comment

BurntSushi left a comment

BurntSushi Dec 10, 2024

BurntSushi Dec 10, 2024

MichaReiser Dec 10, 2024

[red-knot] Typed diagnostic id #14869

[red-knot] Typed diagnostic id #14869

Conversation

MichaReiser commented Dec 9, 2024 • edited Loading

Summary

Preventing duplicate diagnostic identifiers

Why Lint and not LintRule

Test Plan

github-actions bot commented Dec 9, 2024 • edited Loading

ruff-ecosystem results

Linter (stable)

Linter (preview)

carljm left a comment

Choose a reason for hiding this comment

MichaReiser commented Dec 10, 2024

sharkdp left a comment

Choose a reason for hiding this comment

BurntSushi left a comment

Choose a reason for hiding this comment

BurntSushi Dec 10, 2024

Choose a reason for hiding this comment

BurntSushi Dec 10, 2024

Choose a reason for hiding this comment

MichaReiser Dec 10, 2024

Choose a reason for hiding this comment

MichaReiser commented Dec 9, 2024 •

edited

Loading

Why `Lint` and not `LintRule`

github-actions bot commented Dec 9, 2024 •

edited

Loading

`ruff-ecosystem` results