Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-44397: Run all notebooks in repo directory #347

Merged
merged 3 commits into from
May 21, 2024

Conversation

fajpunk
Copy link
Member

@fajpunk fajpunk commented May 19, 2024

  • NotebookRunner business now runs all notebooks in a repo by default
  • Except in the mobu_exclude or its descendants
  • Add exclude_dirs option to NotebookRunner business to list
    additional directories in which notebooks will NOT be run
  • Phalanx mobu docs updated here.

Before this goes out, we'll need to update the tutorial notebooks and the system-test notebook repos to put all of the subdir notebooks in the mobu_exclude dir.

@fajpunk fajpunk force-pushed the tickets/DM-44397/notebook-directories branch 2 times, most recently from 589e8ee to c870edb Compare May 19, 2024 20:18
@fajpunk fajpunk force-pushed the tickets/DM-44397/notebook-directories branch 2 times, most recently from 86843a2 to 8b63d26 Compare May 20, 2024 01:08
fajpunk added a commit to lsst-sqre/phalanx that referenced this pull request May 20, 2024
lsst-sqre/mobu#347 changes the Mobu notebook
runner business to run all notebooks in all subdirs by default.
This PR prevents any changes in behavior to existing flocks.
@fajpunk fajpunk force-pushed the tickets/DM-44397/notebook-directories branch 2 times, most recently from f240142 to 047d465 Compare May 20, 2024 15:38
@fajpunk fajpunk requested review from rra and athornton and removed request for athornton May 20, 2024 15:45
Copy link
Member

@rra rra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks good, thank you! Just a couple of notes and suggestions.

Comment on lines 16 to 18
NOTEBOOK_ALWAYS_EXCLUDE_DIRS = {"mobu_exclude"}
"""NotebookRunner always excludes notebooks in these dirs."""

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a little magical and hard-coded to me. I think I would lean towards not excluding any directories by default and making people add an excluded directory if they want one. But maybe I'm missing a use case for always having a default excluded directory?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed it seems magical. There are a couple of issues:

  1. One of the things we want is for folks to be able to manage these excluded notebooks without having to make phalanx PRs. IIUC, all of the flock config is defined in phalanx, so folks would have to make phalanx PRs as they add and remove excluded dirs (or make one phalanx MR to get their own single exclude dir like in this magic default)
  2. A magic default exclude dir gets us out of having to tightly coordinate this mobu release with notebook repo changes. With this, we can modify the notebook repos any time before the mobu release to any environment without having to coordinate adding the exclude_dirs flock option at the same time as we release to each environment.
    • To address this, we could have this behavior be opt-in by adding a run_in_subdirs option that defaults to False. Then we can deploy the mobu change without affecting current behavior. We could then change the default at some point in the future when all of the flocks have their excluded dirs specified appropriately.

One bigger change we could make to address both of these issues is to have this behavior specified in a config file or something in the notebook repos themselves. Then:

  • We could first add that config file to the repos before deploying the mobu changes and current behavior would be kept everywhere
  • No phalanx changes would be needed to manage these exclude dirs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ditched the magic dir. Seems not worth the magic and confusion. I add the subdirs in the currently existing flocks in the phalanx PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I think I misunderstood the way the release process will work. The mobu version and the flock config will get synced together for each environment; there won't be a state where mobu is updated but the flock config isn't.

Comment on lines 98 to 100
parents = set(notebook.parents)
excluded_parents = parents & self._exclude_paths
return len(excluded_parents) != 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
parents = set(notebook.parents)
excluded_parents = parents & self._exclude_paths
return len(excluded_parents) != 0
return bool(set(notebook.parents) & self._exclude_paths)

Comment on lines 79 to 163
"notebook": "test-notebook.ipynb",
"notebook": ANY,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we've now lost this test, in that nothing is now testing that the notebook data is set correctly to the executing notebook. It's some duplication, but it might be a bit cleaner to add a separate test for the recursive notebook running that uses a separate directory.

@@ -59,7 +59,7 @@ async def test_run(
"options": {
"spawn_settle_time": 0,
"execution_idle_time": 0,
"max_executions": 1,
"max_executions": 10,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be a stronger test if max_executions exactly matched the number of notebooks that should be run, since then you can also test for off-by-one errors (make sure the first notebook isn't run twice, for instance).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, my change makes that test weaker, but I also wanted to ensure that no notebooks from excluded dirs would be run. I changed max_execution back to the exact number of notebooks, but I also moved the "cycle done" message from the beginning of the next cycle, to the end of execute_code, and I'm asserting that's logged.

That gets us the best of both worlds, right?

Comment on lines 139 to 147
# Set up git client
git = Git(repo=repo_path)
await git.init("--initial-branch=main")
await git.config("user.email", "[email protected]")
await git.config("user.name", "Git User")
for path in repo_path.iterdir():
if not path.name.startswith("."):
await git.add(str(path))
await git.commit("-m", "Initial commit")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there are now a couple copies of this code, it's probably time to move it into a helper function.

"options": {
"spawn_settle_time": 0,
"execution_idle_time": 0,
"max_executions": 10,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same note here about max_executions.

@fajpunk fajpunk force-pushed the tickets/DM-44397/notebook-directories branch from fac6558 to 8e3f112 Compare May 20, 2024 18:24
Copy link
Member

@rra rra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. Just a couple of minor nits on the change log entry.


### Backwards-incompatible changes

- NotebookRunner business now runs all notebooks in a repo by default except in the `mobu_exclude` or its descendants
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You dropped the mobu_exclude part. We've also been putting periods at the end of change log entries (sort of an arbitrary style thing).

Comment on lines 8 to 10
### Other changes

- Updated python dependencies
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't note this in the change log because we do it with every release, so it just adds noise. Instead, we document at the start of CHANGELOG.md that we do this with every release.

Remove magic notebook exclude dir
@fajpunk fajpunk force-pushed the tickets/DM-44397/notebook-directories branch from f006b15 to 26a6535 Compare May 20, 2024 21:31
@fajpunk fajpunk merged commit dce6bc0 into main May 21, 2024
3 checks passed
@fajpunk fajpunk deleted the tickets/DM-44397/notebook-directories branch May 21, 2024 17:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants