Enforce 1 session = 1 run
#1329
Signed-off-by: Merel Theisen <[email protected]>
run_id
after it was agreed that 1 session = 1 run
1 session = 1 run
docs/source/extend_kedro/hooks.md
Outdated
@@ -411,7 +406,7 @@ class DataValidationHooks:
         expectation_suite,
     )
     expectation_context.run_validation_operator(
-        "action_list_operator", assets_to_validate=[batch], run_id=run_id
+        "action_list_operator", assets_to_validate=[batch]
I was wondering whether GE would still work without the `run_id` argument here. It appears to be an optional argument but I think it might be quite useful for users to keep track of validations done over multiple runs.
Are we sure that we want to remove `run_id` from `after_catalog_created` rather than replacing it with `session_id`? 🤔 Or is there just not an easy way to access `session_id` at the point we would need it?
@AntonyMilneQB I think both of your points are valid and so perhaps instead of removing the
I love that we're getting rid of all the `run_id` passed around everywhere, code feels so much more breathable now!!
I'm ambivalent about the error on re-run, I have no strong opinion for or against it. On the one hand it does the job, on the other hand I worry that it's gonna end up throwaway code a bit further down the line. I have an open question about what should happen in case the pipeline run fails and maybe that'll clarify things in my head.
@@ -379,7 +398,8 @@ def run(  # pylint: disable=too-many-arguments,too-many-locals
     )

     try:
-        run_result = runner.run(filtered_pipeline, catalog, hook_manager, run_id)
+        run_result = runner.run(filtered_pipeline, catalog, hook_manager)
+        self._run_called = True
So this is just in case the run is successful. If there's a problem with the pipeline halfway through, we can call `session.run()` again. Is that the intended course? Makes sense for debugging, if a little confusing for the user (but I've run it multiple times to failure, it allowed me to do that then!).
Yes, so my thinking was that if you're in an interactive workflow to experiment/debug your pipeline, it would be overhead to have to create a new `session` whenever the pipeline didn't run successfully. At the same time this is a niche case, just for those people who would actually do a `run` inside a notebook/IPython session.
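The behaviour discussed in this thread can be sketched roughly as follows. This is a hypothetical stand-in, not Kedro's actual `KedroSession`: the class name `OneRunSession`, the exception `SessionAlreadyRunError`, and the `pipeline_fn` callable are assumptions made for illustration. Only the idea of a `_run_called` flag that is set after the run returns successfully comes from the diff above.

```python
class SessionAlreadyRunError(RuntimeError):
    """Hypothetical error for a second run attempt after a successful run."""


class OneRunSession:
    """Illustrative stand-in for a session that allows one successful run."""

    def __init__(self, session_id):
        self.session_id = session_id
        self._run_called = False

    def run(self, pipeline_fn):
        if self._run_called:
            raise SessionAlreadyRunError(
                "A run has already been completed in this session; "
                "create a new session to run again."
            )
        # If pipeline_fn raises, the flag below is never set, so the same
        # session can be retried (the notebook/IPython debugging case).
        result = pipeline_fn()
        self._run_called = True
        return result


def failing_pipeline():
    raise ValueError("node failed halfway through")


session = OneRunSession(session_id="example-session")

# Failed runs can be repeated on the same session.
for _ in range(2):
    try:
        session.run(failing_pipeline)
    except ValueError:
        pass

# The first successful run goes through...
first_result = session.run(lambda: {"output": 42})

# ...and any further attempt on the same session is rejected.
try:
    session.run(lambda: {"output": 42})
    second_run_blocked = False
except SessionAlreadyRunError:
    second_run_blocked = True
```

Setting the flag only after the run returns is what keeps a failed run retryable; setting it before the call would enforce "1 session = 1 attempt" instead.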
Co-authored-by: Lorena Bălan <[email protected]>
I wonder whether it's worth keeping the concept of a Also a lot of hooks here were receiving the
I am against keeping the concept of the
However, I do hear this point and Antony was saying something similar. I think I got too excited removing the
Nice work! I'm actually now not so sure about the hook specs after I realised we already have `save_version` there, sorry 😬 No strong feelings though - up to you.
@@ -290,7 +291,7 @@ def _get_catalog(
     feed_dict=feed_dict,
     save_version=save_version,
Can `save_version` ever be different from `session_id` now actually? 🤔 On second thoughts maybe we don't need to keep this in the hook spec after all if it's just a duplicate of `save_version`?
At the moment they will always be the same, but this goes back to the question whether we should allow users to have a custom `save_version`, which requires user research. With the mindset of "don't add things that could potentially be used in the future", I'll remove it for now.
Co-authored-by: Antony Milne <[email protected]>
I was a huge fan of getting rid of passing the id around everywhere. :( I seriously doubt it was a heavily used feature by any user, I'm worried we're leaving around code that's only hypothetical and probably 90% dead. Having said that, it's not a hill I'm willing to die on anytime soon, so happy to let it go for now.
We should add something in the release notes though that we are replacing `run_id` with something else in the hook specs, and that users should update their code accordingly (1 short migration guide note).
I hear you, I would've preferred to remove it as well, but I think there's an argument around being able to find back the run through the id when using external monitoring/orchestration systems. Anyway, I think this needs user research before we can make a final decision on removing it.
Yes I'll add this!
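A migration note along these lines could be illustrated with a sketch like the one below. It is not Kedro's documented API: the hook class `RunLoggingHooks` is hypothetical, the `@hook_impl` decorator and the other arguments a real `before_pipeline_run` hook receives are left out so that the snippet runs without Kedro installed, and the fallback to `run_id` is only a suggestion for code that needs to work on both sides of the change. The identifier strings are made-up example values.

```python
class RunLoggingHooks:
    """Hypothetical hook that records which session a pipeline run belongs to."""

    def __init__(self):
        self.seen_ids = []

    def before_pipeline_run(self, run_params):
        # Before this change: run_params["run_id"].
        # After this change the key is assumed to be "session_id"; fall back
        # to "run_id" so the hook keeps working on older versions.
        identifier = run_params.get("session_id", run_params.get("run_id"))
        self.seen_ids.append(identifier)
        return identifier


hooks = RunLoggingHooks()

# New-style run_params (assumed key name: "session_id").
new_style = hooks.before_pipeline_run({"session_id": "example-session-id"})

# Old-style run_params, still handled through the fallback.
old_style = hooks.before_pipeline_run({"run_id": "example-run-id"})
```

Hooks that forward the identifier to an external monitoring or orchestration system would only need the key lookup changed; the value itself plays the same role as before.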
Description
After the discussion about #1273, it has been decided that a session can only ever have one run. Currently, we've got the concept of a `run_id` in the codebase, which is mainly a remnant from the `journal` and doesn't serve a purpose. I will write up a summary of the whole discussion, but baseline is that a session can only have 1 run, and so the `session_id` and `run_id` will always be the same.

Development notes
- `run_id` where it didn't serve a purpose.
- `run_params` pipeline hook specs to refer to `session_id` instead of `run_id`.

Checklist
- `RELEASE.md` file