Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ruff inserts "id": null to notebook cells and breaks GitHub notebook viewer #6834

Closed
harupy opened this issue Aug 24, 2023 · 14 comments · Fixed by #6851
Closed

Ruff inserts "id": null to notebook cells and breaks GitHub notebook viewer #6834

harupy opened this issue Aug 24, 2023 · 14 comments · Fixed by #6851
Labels
bug Something isn't working

Comments

@harupy
Copy link
Contributor

harupy commented Aug 24, 2023

Summary

https://github.com/harupy/mlflow/blob/3f1650db853d2c61ac0cdb0035be91b10f131859/examples/h2o/random_forest.ipynb

image

harupy/mlflow@3f1650d: a commit that replicates what ruff does:

Without "id": null, the notebook renders fine:
https://github.com/harupy/mlflow/blob/dab36dc4fd6ac6e442bb78c016e7813ced8c0a21/examples/h2o/random_forest.ipynb

image

How to reproduce

# Notebook without id
> cat a.ipynb
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "import os\n",
    "\n",
    "math.pi"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python (ruff)",
   "language": "python",
   "name": "ruff"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}

# Run ruff
> cargo run -p ruff_cli -- check --fix --no-cache a.ipynb
    Finished dev [unoptimized + debuginfo] target(s) in 0.07s
     Running `target/debug/ruff check --fix --no-cache a.ipynb`
Found 1 error (1 fixed, 0 remaining).

# Check
> cat a.ipynb
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "\n",
    "math.pi"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python (ruff)",
   "language": "python",
   "name": "ruff"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
@harupy harupy changed the title Ruff inserts id: null to notebook cells which breaks the GitHub notebook viewer Ruff inserts "id": null to notebook cells which breaks the GitHub notebook viewer Aug 24, 2023
@harupy
Copy link
Contributor Author

harupy commented Aug 24, 2023

Can we skip serializing id if it's None?

diff --git a/crates/ruff/src/jupyter/schema.rs b/crates/ruff/src/jupyter/schema.rs
index b6f9ed3c4..17b03e474 100644
--- a/crates/ruff/src/jupyter/schema.rs
+++ b/crates/ruff/src/jupyter/schema.rs
@@ -120,6 +120,7 @@ pub struct RawCell {
     /// Technically, id isn't required (it's not even present) in schema v4.0 through v4.4, but
     /// it's required in v4.5. Main issue is that pycharm creates notebooks without an id
     /// <https://youtrack.jetbrains.com/issue/PY-59438/Jupyter-notebooks-created-with-PyCharm-are-missing-the-id-field-in-cells-in-the-.ipynb-json>
+    #[serde(skip_serializing_if = "Option::is_none")]
     pub id: Option<String>,
     /// Cell-level metadata.
     pub metadata: Value,
@@ -135,6 +136,7 @@ pub struct MarkdownCell {
     /// Technically, id isn't required (it's not even present) in schema v4.0 through v4.4, but
     /// it's required in v4.5. Main issue is that pycharm creates notebooks without an id
     /// <https://youtrack.jetbrains.com/issue/PY-59438/Jupyter-notebooks-created-with-PyCharm-are-missing-the-id-field-in-cells-in-the-.ipynb-json>
+    #[serde(skip_serializing_if = "Option::is_none")]
     pub id: Option<String>,
     /// Cell-level metadata.
     pub metadata: Value,
@@ -150,6 +152,7 @@ pub struct CodeCell {
     /// Technically, id isn't required (it's not even present) in schema v4.0 through v4.4, but
     /// it's required in v4.5. Main issue is that pycharm creates notebooks without an id
     /// <https://youtrack.jetbrains.com/issue/PY-59438/Jupyter-notebooks-created-with-PyCharm-are-missing-the-id-field-in-cells-in-the-.ipynb-json>
+    #[serde(skip_serializing_if = "Option::is_none")]
     pub id: Option<String>,
     /// Cell-level metadata.
     pub metadata: Value,

@harupy harupy changed the title Ruff inserts "id": null to notebook cells which breaks the GitHub notebook viewer Ruff inserts "id": null to notebook cells which breaks GitHub notebook viewer Aug 24, 2023
@harupy harupy changed the title Ruff inserts "id": null to notebook cells which breaks GitHub notebook viewer Ruff inserts "id": null to notebook cells and breaks GitHub notebook viewer Aug 24, 2023
@harupy
Copy link
Contributor Author

harupy commented Aug 24, 2023

Found this issue while I was working on mlflow/mlflow#9445

@charliermarsh
Copy link
Member

@konstin - Do you know?

@harupy
Copy link
Contributor Author

harupy commented Aug 24, 2023

@harupy
Copy link
Contributor Author

harupy commented Aug 24, 2023

nbconvert fails to convert a notebook with id=null.

> jupyter nbconvert --to html a.ipynb
[NbConvertApp] Converting notebook a.ipynb to html
[NbConvertApp] ERROR | Notebook JSON is invalid: None is not of type 'string'

Failed validating 'type' in code_cell['properties']['id']:

On instance['cells'][0]['id']:
None
[NbConvertApp] ERROR | Notebook is invalid after preprocessor <nbconvert.preprocessors.tagremove.TagRemovePreprocessor object at 0x7f4bb237bf40>
Traceback (most recent call last):
  File "/home/haru/miniconda3/envs/ruff/bin/jupyter-nbconvert", line 8, in <module>
    sys.exit(main())
  File "/home/haru/miniconda3/envs/ruff/lib/python3.10/site-packages/jupyter_core/application.py", line 285, in launch_instance
    return super().launch_instance(argv=argv, **kwargs)
  File "/home/haru/miniconda3/envs/ruff/lib/python3.10/site-packages/traitlets/config/application.py", line 1043, in launch_instance
    app.start()
  File "/home/haru/miniconda3/envs/ruff/lib/python3.10/site-packages/nbconvert/nbconvertapp.py", line 410, in start
    self.convert_notebooks()
  File "/home/haru/miniconda3/envs/ruff/lib/python3.10/site-packages/nbconvert/nbconvertapp.py", line 585, in convert_notebooks
    self.convert_single_notebook(notebook_filename)
  File "/home/haru/miniconda3/envs/ruff/lib/python3.10/site-packages/nbconvert/nbconvertapp.py", line 551, in convert_single_notebook
    output, resources = self.export_single_notebook(
  File "/home/haru/miniconda3/envs/ruff/lib/python3.10/site-packages/nbconvert/nbconvertapp.py", line 477, in export_single_notebook
    output, resources = self.exporter.from_filename(
  File "/home/haru/miniconda3/envs/ruff/lib/python3.10/site-packages/nbconvert/exporters/templateexporter.py", line 389, in from_filename
    return super().from_filename(filename, resources, **kw)  # type:ignore
  File "/home/haru/miniconda3/envs/ruff/lib/python3.10/site-packages/nbconvert/exporters/exporter.py", line 201, in from_filename
    return self.from_file(f, resources=resources, **kw)
  File "/home/haru/miniconda3/envs/ruff/lib/python3.10/site-packages/nbconvert/exporters/templateexporter.py", line 395, in from_file
    return super().from_file(file_stream, resources, **kw)  # type:ignore
  File "/home/haru/miniconda3/envs/ruff/lib/python3.10/site-packages/nbconvert/exporters/exporter.py", line 220, in from_file
    return self.from_notebook_node(
  File "/home/haru/miniconda3/envs/ruff/lib/python3.10/site-packages/nbconvert/exporters/html.py", line 259, in from_notebook_node
    html, resources = super().from_notebook_node(nb, resources, **kw)
  File "/home/haru/miniconda3/envs/ruff/lib/python3.10/site-packages/nbconvert/exporters/templateexporter.py", line 411, in from_notebook_node
    nb_copy, resources = super().from_notebook_node(nb, resources, **kw)
  File "/home/haru/miniconda3/envs/ruff/lib/python3.10/site-packages/nbconvert/exporters/exporter.py", line 154, in from_notebook_node
    nb_copy, resources = self._preprocess(nb_copy, resources)
  File "/home/haru/miniconda3/envs/ruff/lib/python3.10/site-packages/nbconvert/exporters/exporter.py", line 354, in _preprocess
    self._validate_preprocessor(nbc, preprocessor)
  File "/home/haru/miniconda3/envs/ruff/lib/python3.10/site-packages/nbconvert/exporters/exporter.py", line 321, in _validate_preprocessor
    nbformat.validate(nbc, relax_add_props=True)
  File "/home/haru/miniconda3/envs/ruff/lib/python3.10/site-packages/nbformat/validator.py", line 502, in validate
    raise error
nbformat.validator.NotebookValidationError: None is not of type 'string'

Failed validating 'type' in code_cell['properties']['id']:

On instance['cells'][0]['id']:
None

@MichaReiser MichaReiser added the bug Something isn't working label Aug 24, 2023
@harupy
Copy link
Contributor Author

harupy commented Aug 24, 2023

Can I fix this (if this is really a bug)?

@MichaReiser
Copy link
Member

Sure. I would be interested in @konstin's input because he has experience roundtripping JSON.

@konstin
Copy link
Member

konstin commented Aug 24, 2023

That's an interesting example because according to the spec (html, RFC, schema), id is required field in every cell in a jupyter notebook 4.5+, i.e. the notebook posted above is technically invalid. From JEP 62: required field):

"The id field in cells would always be required for any future nbformat versions (4.5+). In contrast to an optional field, the required field avoids applications having to conditionally check if an id is present or not."

The notebook still works because tools want to be backwards compatible with the <4.5 format and generally don't validate that their input is valid wrt to the specified version. I believe we shouldn't emit invalid notebooks and instead generate UUIDs (or apply one of the other suggested strategies) if we see a notebook version ≥4.5. We should still use #[serde(skip_serializing_if = "Option::is_none")] for <4.5 notebooks which does not allow id. From JEP 62: Questions:

  1. How should loaders handle notebook loading errors?

On notebook load, if an older format update and fill in ids. If an invalid id format for a 4.5+ file, then raise a validation error like we do for other schema errors. We could auto-correct for bad ids if that's deemed appropriate.

This case i think would be only applicable if we want to increase the nbformat version on write (i think we don't want to do that):

  1. So if nbformat >= 4.5 loads in a pre 4.5 notebook, then a cell ID would be generated and added to each cell?

Yes.

@harupy
Copy link
Contributor Author

harupy commented Aug 24, 2023

@konstin Thanks for the comment. This notebook was created 4 years ago. Maybe that's the reason why cell ids are missing.

@harupy
Copy link
Contributor Author

harupy commented Aug 24, 2023

Can ruff just fix code, and not touch cell ids?

@konstin
Copy link
Member

konstin commented Aug 24, 2023

This notebook was created 4 years ago. Maybe that's the reason why cell ids are missing.

It's definitely really easy to miss as a tool author! (i mean, we also missed that)

Can ruff just fix code, and not touch cell ids?

Not really, i'm afraid. If ruff were to write a notebook with version 4.5 and no cell ids, ruff would produce an invalid notebook with respect to the schema, which we shouldn't do. Downgrading the version on writing is also not an option because the notebook might use other 4.5+ features. The JEP is pretty clear that you just create some ids (with a preference towards UUIDs), so i'd do that.

@harupy
Copy link
Contributor Author

harupy commented Aug 24, 2023

The notebook was created with nbformat 4.2 in Python 2:

 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.15"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2

https://raw.githubusercontent.com/harupy/mlflow/3f1650db853d2c61ac0cdb0035be91b10f131859/examples/h2o/random_forest.ipynb

@konstin
Copy link
Member

konstin commented Aug 24, 2023

Sorry, i forgot to check the actual notebook and only looked at the cat a.ipynb. For the case of random_forest.ipynb, yes, we should definitely use #[serde(skip_serializing_if = "Option::is_none")] and omit adding id fields.

@harupy
Copy link
Contributor Author

harupy commented Aug 24, 2023

Filed #6851

konstin pushed a commit that referenced this issue Aug 24, 2023
<!--
Thank you for contributing to Ruff! To help us out with reviewing,
please consider the following:

- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->

## Summary

<!-- What's the purpose of the change? What does it do, and why? -->

Fix #6834 

## Test Plan

<!-- How was it tested? -->

Need tests?

---------

Co-authored-by: Dhruv Manilawala <[email protected]>
konstin added a commit that referenced this issue Aug 24, 2023
**Summary** See #6834 (comment)

**Test Plan** Added a new notebook
konstin added a commit that referenced this issue Aug 24, 2023
**Summary** See #6834 (comment)

**Test Plan** Added a new notebook
konstin added a commit that referenced this issue Aug 24, 2023
**Summary** See #6834 (comment)

**Test Plan** Added a new notebook
konstin added a commit that referenced this issue Aug 24, 2023
**Summary** See #6834 (comment)

**Test Plan** Added a new notebook
konstin added a commit that referenced this issue Aug 24, 2023
**Summary** See #6834 (comment)

**Test Plan** Added a new notebook
konstin added a commit that referenced this issue Aug 24, 2023
**Summary** See #6834 (comment)

**Test Plan** Added a new notebook
konstin added a commit that referenced this issue Aug 25, 2023
**Summary** See #6834 (comment)

**Test Plan** Added a new notebook
konstin added a commit that referenced this issue Aug 25, 2023
**Summary** See
#6834 (comment)

**Test Plan** Added a new notebook
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants