-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Only Python cells should be validated in Jupyter notebooks #12281
Comments
I thought we filtered based on this already but can't find it. \cc @dhruvmanila |
I'm not sure if we filter based on the file's metadata. I thought we filter based on the cell metdata. |
If I'm understanding right it looks like it is validated here:
However the metadata format I see in my polyglot or Python notebooks looks to be in a slightly different structure, so it's just returning the default |
@stewartadam Can you provide a full Notebook? I can look at where that metadata is coming from. Is it coming from the cell or from the Notebook itself? We verify that it's a Python notebook by looking at the notebook-level metadata which is what the function that you've linked does: ruff/crates/ruff_notebook/src/notebook.rs Lines 391 to 398 in 880c31d
|
Here are two notebooks: https://gist.github.com/stewartadam/528689d9bb917715a4c16a6ff9282de3 It looks like the extensions have a habit of making a mess of the metadata though. Creating a new notebook defaults to Python and after switching it to .NET Interactive, it gets the .NET Interactive metadata layered on top of the original Python language metadata. Similarly, saving a copy of the .NET notebook as python.ipynb and then changing the kernel+cells to Python left all the .NET metadata in place (look at the first revision in the Gist). I suspect we'll want to introspect a mix of the notebook language, cell type, and kernel. |
This is confusing. The "metadata": {
"vscode": {
"languageId": "polyglot-notebook"
}
}, While, the "language_info": {
"name": "python"
}, And, the "metadata": {
"vscode": {
"languageId": "javascript"
}
}, But, the Which extensions are all involved here? Is it just https://marketplace.visualstudio.com/items?itemName=ms-dotnettools.dotnet-interactive-vscode or are there any other extension? I'll probably need to look at the metadata schema these extensions have and encode it accordingly. I'm not sure how much effort it is to support this, it's not a priority right now with other important things going on. |
Related, the open AI notebooks fail parsing because they contain "code" cells where only the |
Yeah, this seems reasonable. I'll want to check for any official documentation on this field and might require to look at the source code if it doesn't exists. |
## Summary This PR adds support for VS Code specific cell metadata to consider when collecting valid code cells. For context, Ruff only runs on valid code cells. These are the code cells that doesn't contain cell magics. Previously, Ruff only used the notebook's metadata to determine whether it's a Python notebook. But, in VS Code, a notebook's preferred language might be Python but it could still contain code cells for other languages. This can be determined with the `metadata.vscode.languageId` field. ### References: * https://code.visualstudio.com/docs/languages/identifiers * https://github.com/microsoft/vscode/blob/e6c009a3d4ee60f352212b978934f52c4689fbd9/extensions/ipynb/src/serializers.ts#L104-L107 * https://github.com/microsoft/vscode/blob/e6c009a3d4ee60f352212b978934f52c4689fbd9/extensions/ipynb/src/serializers.ts#L117-L122 This brings us one step closer to fixing #12281. ## Test Plan Add test cases for `is_valid_python_code_cell` and an integration test case which showcase running it end to end. The test notebook contains a JavaScript code cell and a Python code cell.
@stewartadam So, #12864 should actually also fix the Polyglot notebooks case as well, at least it does for the linked notebooks: ❯ ./target/debug/ruff check ~/Downloads/notebooks/dotnet-polyglot.ipynb --no-cache
All checks passed!
❯ ./target/debug/ruff check ~/Downloads/notebooks/python.ipynb --no-cache
All checks passed!
If this metadata is of the form Do you have an example Polyglot notebook which resembles the one in the PR description? I could test it out. |
Looking at the metadata you've provided, I think it might be useful to use the |
This great to hear! Agreed kernelspec.language as a fallback given the metadata captured from the notebooks generaetd with VSCode extension. |
I'll chime in and say that magic commands for Databricks environments should have those cells disabled: https://docs.databricks.com/en/notebooks/notebooks-code.html#mix-languages eg, cells that start with the pattern |
This seems like a special case for Databricks environment specifically, I'm not sure how to detect that. My main hesitation is that a single percent sign magic command like ruff/crates/ruff_notebook/src/cell.rs Lines 243 to 254 in c6b311c
Ruff will only ignore the line where a line magic is defined but will ignore the entire cell for cell magics except for the above cell magic commands. |
That's fair enough. Databricks is an enormous platform used by a ton of big shops, so it seems like a decent carve-out, but it is nevertheless at least somewhat niche. The language support is limited, so potentially a match fail on |
Description
When using creating a multi-language notebook (e.g. using the Polyglot Notebooks extension), Ruff continues to validate all cells producing many linting errors on non-Python code.
Context
In our case we were creating a C# notebook with .NET Interactive kernel and Ruff produced many linting errors. As a workaround, we have a top-level code cell with the contents
# ruff: noqa
to disable this, but it would be great if Ruff examined the cell language and only validated Python code.#10504 mentions notebooks language is validated in the metadata, however Ruff is still linting the C# code even with this metadata present in the notebook:
One thing to note with polyglot notebooks is each cell does have a language associated to it - it would be great if Ruff could parse this per-cell metadata and only validate Python code cells.
Additional Details
ruff 0.4.10
configuration (pyproejct.toml):
Example output of linting errors produced on C# code:
The text was updated successfully, but these errors were encountered: