-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-44397: Run all notebooks in repo directory #347
Conversation
589e8ee
to
c870edb
Compare
86843a2
to
8b63d26
Compare
lsst-sqre/mobu#347 changes the Mobu notebook runner business to run all notebooks in all subdirs by default. This PR prevents any changes in behavior to existing flocks.
f240142
to
047d465
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, looks good, thank you! Just a couple of notes and suggestions.
src/mobu/constants.py
Outdated
NOTEBOOK_ALWAYS_EXCLUDE_DIRS = {"mobu_exclude"} | ||
"""NotebookRunner always excludes notebooks in these dirs.""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels a little magical and hard-coded to me. I think I would lean towards not excluding any directories by default and making people add an excluded directory if they want one. But maybe I'm missing a use case for always having a default excluded directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed it seems magical. There are a couple of issues:
- One of the things we want is for folks to be able to manage these excluded notebooks without having to make phalanx PRs. IIUC, all of the flock config is defined in phalanx, so folks would have to make phalanx PRs as they add and remove excluded dirs (or make one phalanx MR to get their own single exclude dir like in this magic default)
- A magic default exclude dir gets us out of having to tightly coordinate this mobu release with notebook repo changes. With this, we can modify the notebook repos any time before the mobu release to any environment without having to coordinate adding the
exclude_dirs
flock option at the same time as we release to each environment.- To address this, we could have this behavior be opt-in by adding a
run_in_subdirs
option that defaults to False. Then we can deploy the mobu change without affecting current behavior. We could then change the default at some point in the future when all of the flocks have their excluded dirs specified appropriately.
- To address this, we could have this behavior be opt-in by adding a
One bigger change we could make to address both of these issues is to have this behavior specified in a config file or something in the notebook repos themselves. Then:
- We could first add that config file to the repos before deploying the mobu changes and current behavior would be kept everywhere
- No phalanx changes would be needed to manage these exclude dirs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ditched the magic dir. Seems not worth the magic and confusion. I add the subdirs in the currently existing flocks in the phalanx PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I think I misunderstood the way the release process will work. The mobu version and the flock config will get synced together for each environment; there won't be a state where mobu is updated but the flock config isn't.
parents = set(notebook.parents) | ||
excluded_parents = parents & self._exclude_paths | ||
return len(excluded_parents) != 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parents = set(notebook.parents) | |
excluded_parents = parents & self._exclude_paths | |
return len(excluded_parents) != 0 | |
return bool(set(notebook.parents) & self._exclude_paths) |
"notebook": "test-notebook.ipynb", | ||
"notebook": ANY, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we've now lost this test, in that nothing is now testing that the notebook
data is set correctly to the executing notebook. It's some duplication, but it might be a bit cleaner to add a separate test for the recursive notebook running that uses a separate directory.
@@ -59,7 +59,7 @@ async def test_run( | |||
"options": { | |||
"spawn_settle_time": 0, | |||
"execution_idle_time": 0, | |||
"max_executions": 1, | |||
"max_executions": 10, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be a stronger test if max_executions
exactly matched the number of notebooks that should be run, since then you can also test for off-by-one errors (make sure the first notebook isn't run twice, for instance).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, my change makes that test weaker, but I also wanted to ensure that no notebooks from excluded dirs would be run. I changed max_execution
back to the exact number of notebooks, but I also moved the "cycle done" message from the beginning of the next cycle, to the end of execute_code
, and I'm asserting that's logged.
That gets us the best of both worlds, right?
# Set up git client | ||
git = Git(repo=repo_path) | ||
await git.init("--initial-branch=main") | ||
await git.config("user.email", "[email protected]") | ||
await git.config("user.name", "Git User") | ||
for path in repo_path.iterdir(): | ||
if not path.name.startswith("."): | ||
await git.add(str(path)) | ||
await git.commit("-m", "Initial commit") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there are now a couple copies of this code, it's probably time to move it into a helper function.
"options": { | ||
"spawn_settle_time": 0, | ||
"execution_idle_time": 0, | ||
"max_executions": 10, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same note here about max_executions
.
fac6558
to
8e3f112
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. Just a couple of minor nits on the change log entry.
|
||
### Backwards-incompatible changes | ||
|
||
- NotebookRunner business now runs all notebooks in a repo by default except in the `mobu_exclude` or its descendants |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You dropped the mobu_exclude
part. We've also been putting periods at the end of change log entries (sort of an arbitrary style thing).
### Other changes | ||
|
||
- Updated python dependencies |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't note this in the change log because we do it with every release, so it just adds noise. Instead, we document at the start of CHANGELOG.md that we do this with every release.
Remove magic notebook exclude dir
f006b15
to
26a6535
Compare
mobu_exclude
or its descendantsexclude_dirs
option to NotebookRunner business to listadditional directories in which notebooks will NOT be run
Before this goes out, we'll need to update the tutorial notebooks and the system-test notebook repos to put all of the subdir notebooks in the
mobu_exclude
dir.