Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add jupyter notebooks to ecosystem checks #9293

Merged
merged 4 commits into from
Jan 4, 2024
Merged

Conversation

zanieb
Copy link
Member

@zanieb zanieb commented Dec 27, 2023

Implements #8873 via #9286

@zanieb zanieb added the ci Related to internal CI tooling label Dec 27, 2023
Copy link
Contributor

github-actions bot commented Dec 27, 2023

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@zanieb
Copy link
Member Author

zanieb commented Dec 27, 2023

Are these real issues with our formatter? cc @dhruvmanila

Base automatically changed from zb/toml-updates-ecosystem to main December 28, 2023 16:44
zanieb added a commit that referenced this pull request Dec 28, 2023
Adds the ability to override `ruff.toml` or `pyproject.toml` settings
per-project during ecosystem checks.

Exploring this as a fix for the `setuptools` project error. 

Also useful for including Jupyter Notebooks in the ecosystem checks, see
#9293

Note the remaining `sphinx` project error is resolved in #9294
@zanieb zanieb force-pushed the zb/jupyter-ecosystem branch from 048068b to b0ec884 Compare December 28, 2023 16:45
@konstin
Copy link
Member

konstin commented Dec 28, 2023

I checked some of the notebooks and they did contain syntax errors.

The error messages should have a cell number though.

@zanieb
Copy link
Member Author

zanieb commented Dec 29, 2023

Yes I can confirm the huggingface notebooks are all over the place and have invalid syntax. We should improve the formatter error messages in those cases, but that's a separate concern (#9311). I've fixed the errors in zanieb/huggingface-notebooks@68cd6fa but contributing upstream seems like a pain since the notebooks are synced from other repositories and the content is full of problems.

I opened a fix for the openai-cookbook error at openai/openai-cookbook#964

@zanieb
Copy link
Member Author

zanieb commented Dec 29, 2023

Hm very interesting, do we mutate cell ids?

  "cells": [
   {
    "cell_type": "markdown",
-   "id": "3f738710-b5ef-46c4-b874-f13221ee9f76",
+   "id": "754e1af3-4775-4211-a476-d0f698c81cd4",
    "metadata": {
     "colab_type": "text",
     "id": "view-in-github"

@zanieb
Copy link
Member Author

zanieb commented Dec 30, 2023

We create IDs if they do not exist at

if id.is_none() {
// https://github.com/jupyter/enhancement-proposals/blob/master/62-cell-id/cell-id.md#questions
*id = Some(Uuid::new_v4().to_string());
}

Presumably this just differs on each run of the Ruff formatter. Hm. I guess we could seed for determinism?

@zanieb zanieb force-pushed the zb/jupyter-ecosystem branch from bbbc437 to da96325 Compare January 2, 2024 21:26
@zanieb
Copy link
Member Author

zanieb commented Jan 2, 2024

Rebasing onto #9359

@zanieb zanieb changed the base branch from main to zb/notebook-id-fmt January 2, 2024 22:06
@zanieb zanieb force-pushed the zb/notebook-id-fmt branch from 411f988 to f4edaa6 Compare January 3, 2024 02:28
@zanieb zanieb force-pushed the zb/jupyter-ecosystem branch from da96325 to 652b3f5 Compare January 3, 2024 02:29
@zanieb zanieb force-pushed the zb/jupyter-ecosystem branch from 652b3f5 to 2c1a8c1 Compare January 3, 2024 17:10
Base automatically changed from zb/notebook-id-fmt to main January 4, 2024 15:19
zanieb added a commit that referenced this pull request Jan 4, 2024
When formatting notebooks, we populate the `id` field for cells that do
not have one. Previously, we generated a UUID v4 which resulted in
non-deterministic formatting. Here, we generate the UUID from a seeded
random number generator instead of using true randomness. For example,
here are the first five ids it would generate:

```
7fb27b94-1602-401d-9154-2211134fc71a
acae54e3-7e7d-407b-bb7b-55eff062a284
9a63283c-baf0-4dbc-ab1f-6479b197f3a8
8dd0d809-2fe7-4a7c-9628-1538738b07e2
72eea511-9410-473a-a328-ad9291626812
```

We also add a check that an id is not present in another cell to prevent
accidental introduction of duplicate ids.

The specification is lax, and we could just use incrementing integers
e.g. `0`, `1`, ... but I have a minor preference for retaining the UUID
format. Some discussion
[here](#9359 (comment))
— I'm happy to go either way though.

Discovered via #9293
@zanieb zanieb force-pushed the zb/jupyter-ecosystem branch from 2c1a8c1 to 060c443 Compare January 4, 2024 15:19
@zanieb zanieb marked this pull request as ready for review January 4, 2024 15:20
@zanieb
Copy link
Member Author

zanieb commented Jan 4, 2024

See fix for ibis #9392

@zanieb zanieb force-pushed the zb/jupyter-ecosystem branch from 060c443 to 37a7152 Compare January 4, 2024 19:29
@@ -9,6 +9,8 @@
Repository,
)

JUPYTER_NOTEBOOK_SELECT = "A,E703,F704,B015,B018,D100"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why these?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dhruvmanila chose them — they're rules that have different behavior for notebooks or something? Idk if we really need to select a subset though?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'd probably just expand them, but nbd either way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like extend select? I'll just wait for Dhruv to be available — not crtical.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they're rules that have different behavior for notebooks or something?

Yeah, I chose them for this exact reason but it's reasonable to just expand them and use extend-select if possible.

@zanieb zanieb merged commit 4b8b3a1 into main Jan 4, 2024
17 checks passed
@zanieb zanieb deleted the zb/jupyter-ecosystem branch January 4, 2024 21:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci Related to internal CI tooling
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants