Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strip metadata that is causing notebook validation to fail #245

Merged
merged 5 commits into from
Jan 24, 2022

Conversation

nicholaswold
Copy link
Contributor

This adds a kwarg to validate: strip_invalid_metadata, which when set to True will introspect into schema failures during validation and will attempt to remove top-level metadata keys for the cell or notebook that are failing validation.

Specifically what this mechanism does:

  1. Converts ValidationErrors into an ErrorTree so we can traverse the schema
  2. Removes all top-level metadata keys that failed validation
  3. Iterates through all validation errors at the cell level
  4. Identifies what the intended cell_type was for that cell
  5. Removes all top-level metadata that had a validation error specific to the intended cell_type

Potential issues that might block merging this as-is:

  1. There is some slight fragility around determining the intended cell type schema schema_index = schemas_by_index.index(f"#/definitions/{intended_cell_type}_cell"). If the definitions changed their convention this would break.
  2. I validate the notebook again after an attempt at healing as a safety net for complex validation errors. We could be more efficient here but I'm not sure if that's the right call. Removing invalid metadata might cause additional validation errors if that key was required, and so on.
  3. This feature will just straight-up not work with fastjsonschema because it relies on introspection into jsonschema.ValidationErrors. We coerce fastjsonschema errors into jsonschema.ValidationErrors but those errors lack some of the context I think is necessary to traverse correctly.

I think long-term, this general strategy could be merged into #236.

nbformat/validator.py Outdated Show resolved Hide resolved
nbformat/validator.py Show resolved Hide resolved
@nicholaswold nicholaswold requested a review from ivanov January 24, 2022 18:35
@ivanov ivanov merged commit 59e927d into jupyter:master Jan 24, 2022
# didn't cause another complex validation issue in the schema.
# Also to ensure that higher-level errors produced by individual metadata validation
# failures are removed.
errors = validator.iter_errors(nbdict)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of this is a security issue, validation is part of the security/signing, you can't mutate something you are trying to validate, it introduces risk of getting dangerous notebook matching a valid signature. Validate shoudl not mutate. Ever.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants