Enforce `1 session = 1 run` #1329

merelcht · 2022-03-08T14:49:27Z

Description

Following the discussion in #1273, it has been decided that a session can only ever have one run. Currently the codebase has the concept of a run_id, which is mainly a remnant of the journal and doesn't serve a purpose. I will write up a summary of the whole discussion, but the bottom line is that a session can only have one run, so the session_id and run_id will always be the same.

Development notes

Removed run_id where it didn't serve a purpose.
Changed the run_params pipeline hook specs to refer to session_id instead of run_id.
Enforce/inform users that they shouldn't run multiple runs during 1 session: 55fb343
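
For hook authors, the second note above means the run identifier is now read from `run_params["session_id"]`. A minimal sketch, assuming `run_params` is the plain dictionary passed to pipeline hooks; the class name and the example ID value are hypothetical, and in a real project the method would carry Kedro's `@hook_impl` decorator, which is left out here so the snippet runs without Kedro installed:

```python
class SessionIdLoggingHooks:
    """Hypothetical hook class that records the ID of the run it observes."""

    def __init__(self):
        self.seen_ids = []

    # In a Kedro project this method would be decorated with @hook_impl.
    def before_pipeline_run(self, run_params, pipeline=None, catalog=None):
        # After this PR the key is "session_id"; "run_id" is no longer provided.
        self.seen_ids.append(run_params["session_id"])


hooks = SessionIdLoggingHooks()
# Made-up ID value, for illustration only.
hooks.before_pipeline_run({"session_id": "2022-03-08T14.49.27.000Z"})
```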

Checklist

Read the contributing guidelines
Opened this PR as a 'Draft Pull Request' if it is work-in-progress
Updated the documentation to reflect the code changes
Added a description of this change in the RELEASE.md file
Added tests to cover my changes

Signed-off-by: Merel Theisen <[email protected]>


antonymilne · 2022-03-10T13:45:43Z

docs/source/extend_kedro/hooks.md

@@ -411,7 +406,7 @@ class DataValidationHooks:
                expectation_suite,
            )
            expectation_context.run_validation_operator(
-                "action_list_operator", assets_to_validate=[batch], run_id=run_id
+                "action_list_operator", assets_to_validate=[batch]


I was wondering whether GE would still work without the run_id argument here. It appears to be an optional argument but I think it might be quite useful for users to keep track of validations done over multiple runs.

antonymilne

Are we sure that we want to remove run_id from after_catalog_created rather than replacing it with session_id? 🤔 Or is there just not an easy way to access session_id at the point we would need it?

merelcht · 2022-03-10T14:27:23Z

@AntonyMilneQB I think both of your points are valid and so perhaps instead of removing the run_id it makes more sense to just replace it with session_id.

lorenabalan

I love that we're getting rid of all the run_id passed around everywhere, code feels so much more breathable now!!
I'm ambivalent about the error on re-run, I have no strong opinion for or against it. On the one hand it does the job, on the other hand I worry that it's gonna end up throwaway code a bit further down the line. I have an open question about what should happen in case the pipeline run fails and maybe that'll clarify things in my head.

tests/framework/session/test_session_extension_hooks.py

lorenabalan · 2022-03-10T15:05:13Z

kedro/framework/session/session.py

@@ -379,7 +398,8 @@ def run(  # pylint: disable=too-many-arguments,too-many-locals
        )

        try:
-            run_result = runner.run(filtered_pipeline, catalog, hook_manager, run_id)
+            run_result = runner.run(filtered_pipeline, catalog, hook_manager)
+            self._run_called = True


So the flag is only set if the run is successful. If there's a problem with the pipeline halfway through, we can call session.run() again. Is that the intended course? It makes sense for debugging, if a little confusing for the user (but I've run it multiple times to failure, and it allowed me to do that!).

Yes, so my thinking was that if you're in an interactive workflow to experiment/debug your pipeline it would be overhead if you need to recreate a new session if the pipeline didn't run successfully. At the same time this is a niche case just for those people that would actually do a run inside a notebook/ipython session.
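
The behaviour discussed in this thread can be sketched with a toy class. This is an illustration only, not the Kedro implementation: `OneRunSession` and `SessionError` are hypothetical names, and only the `_run_called` flag comes from the diff above. The flag is set after the run returns, so a failed run leaves the session reusable while a second run after a success is rejected:

```python
class SessionError(Exception):
    """Stand-in for the error raised on a second run (hypothetical name)."""


class OneRunSession:
    """Toy session that allows exactly one successful run."""

    def __init__(self):
        self._run_called = False

    def run(self, pipeline):
        if self._run_called:
            raise SessionError("Only one run may be executed per session.")
        result = pipeline()  # if this raises, the flag below is never set
        self._run_called = True
        return result


def broken_pipeline():
    raise ValueError("node failed")


session = OneRunSession()
attempts = []
# A failing run, then a successful run, then one run too many.
for pipeline in (broken_pipeline, lambda: "ok", lambda: "again"):
    try:
        attempts.append(session.run(pipeline))
    except (ValueError, SessionError) as exc:
        attempts.append(type(exc).__name__)
```

Setting the flag after the call rather than before it is what keeps the interactive debugging workflow described above free of session re-creation.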

kedro/framework/session/session.py

tests/framework/session/test_session.py

Co-authored-by: Lorena Bălan <[email protected]>

Signed-off-by: Merel Theisen <[email protected]>

idanov · 2022-03-10T16:36:14Z

I wonder whether it's worth keeping the concept of a run_id, but only setting it to the session_id, in order to have a much less disruptive change in APIs and also concepts. Would that make it compatible with Kedro Viz without the need for a new release?

Also a lot of hooks here were receiving the run_id and now no longer will receive it, but the users might still need to know the run_id / session_id at that particular hook. Have you checked with users if they need the run_id (now session_id) in those hooks or that's no longer needed?

kedro/framework/session/session.py

merelcht · 2022-03-10T16:47:15Z

> I wonder whether it's worth keeping the concept of a run_id, but only setting it to the session_id, in order to have a much less disruptive change in APIs and also concepts. Would that make it compatible with Kedro Viz without the need for a new release?

I am against keeping the concept of the run_id, because I feel that it would just cause confusion and we'd end up having discussions in the future again about what the difference is between run_id and session_id

> Also a lot of hooks here were receiving the run_id and now no longer will receive it, but the users might still need to know the run_id / session_id at that particular hook. Have you checked with users if they need the run_id (now session_id) in those hooks or that's no longer needed?

However, I do hear this point and Antony was saying something similar. I think I got too excited removing the run_id altogether from the hook specs, because in our codebase it doesn't really serve a purpose but you're right in that it might be used by users for reasons unknown to me. I will change this so that the specs will now have the session_id.

Signed-off-by: Merel Theisen <[email protected]>

antonymilne

Nice work! I'm actually now not so sure about the hook specs after I realised we already have save_version there, sorry 😬 No strong feelings though - up to you.

kedro/framework/session/session.py

antonymilne · 2022-03-11T10:21:48Z

kedro/framework/context/context.py

@@ -290,7 +291,7 @@ def _get_catalog(
            feed_dict=feed_dict,
            save_version=save_version,


Can save_version ever be different from session_id now actually? 🤔 On second thoughts, maybe we don't need to keep this in the hook spec after all if it's just a duplicate of save_version?

At the moment they will always be the same, but this goes back to the question whether we should allow users to have a custom save_version, which requires user research. With the mindset of "don't add things that could potentially be used in the future", I'll remove it for now.

Co-authored-by: Antony Milne <[email protected]>

Signed-off-by: Merel Theisen <[email protected]>

lorenabalan

I was a huge fan of getting rid of passing the id around everywhere. :( I seriously doubt it was a heavily used feature by any user, I'm worried we're leaving around code that's only hypothetical and probably 90% dead. Having said that, it's not a hill I'm willing to die on anytime soon, so happy to let it go for now.
We should add something in the release notes though that we are replacing run_id with sth else in the hook specs, and that users should update their code accordingly (1 short migration guide note).

merelcht · 2022-03-14T16:16:16Z

> I was a huge fan of getting rid of passing the id around everywhere. :( I seriously doubt it was a heavily used feature by any user, I'm worried we're leaving around code that's only hypothetical and probably 90% dead. Having said that, it's not a hill I'm willing to die on anytime soon, so happy to let it go for now.

I hear you, I would've preferred to remove it as well, but I think there's an argument around being able to find back the run through the id when using external monitoring/orchestration systems. Anyway, I think this needs user research before we can make a final decision on removing it.

> We should add something in the release notes though that we are replacing run_id with sth else in the hook specs, and that users should update their code accordingly (1 short migration guide note).

Yes I'll add this!
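
As an illustration of what such a migration note might suggest, hook code that has to work on both sides of this change could look up the new key first and fall back to the old one. This is only a sketch: `get_run_identifier` is a hypothetical helper, not part of Kedro, and it assumes `run_params` is a dictionary that carries `session_id` after this PR and `run_id` before it:

```python
def get_run_identifier(run_params):
    """Return the run identifier from either the new or the old key."""
    if "session_id" in run_params:
        return run_params["session_id"]
    # Older releases passed the identifier as "run_id" instead.
    return run_params.get("run_id")


# Made-up values, for illustration only.
new_style = get_run_identifier({"session_id": "abc"})
old_style = get_run_identifier({"run_id": "xyz"})
missing = get_run_identifier({})
```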

Signed-off-by: Merel Theisen <[email protected]>

merelcht added 3 commits March 7, 2022 16:46

Remove run_id from the code

7a6e6fa

Signed-off-by: Merel Theisen <[email protected]>

Remove run_id from docs, but do return session_id in pipeline hooks

a4b2148

Signed-off-by: Merel Theisen <[email protected]>

Fix test

697ee3f

Signed-off-by: Merel Theisen <[email protected]>

merelcht requested review from yetudada and idanov as code owners March 8, 2022 14:49

Merge branch 'develop' into session-is-run

1af6cce

merelcht self-assigned this Mar 8, 2022

merelcht linked an issue Mar 8, 2022 that may be closed by this pull request

session == run and so this should be enforced within the code #1313

Closed

merelcht added this to the 0.18.0 milestone Mar 8, 2022

merelcht mentioned this pull request Mar 8, 2022

Add runner class name to pipeline hooks + introduce after_command_run CLI hook spec #1309

Merged

5 tasks

merelcht and others added 3 commits March 9, 2022 11:14

Merge branch 'develop' into session-is-run

fce25c6

Raise exception when more than 1 run executed within the same session

55fb343

Signed-off-by: Merel Theisen <[email protected]>

Merge branch 'session-is-run' of github.com:quantumblacklabs/kedro in…

b30faa3

…to session-is-run

merelcht changed the title ~~Remove the run_id after it was agreed that 1 session = 1 run~~ Enforce 1 session = 1 run Mar 9, 2022

merelcht requested review from lorenabalan, antonymilne, AhdraMeraliQB and SajidAlamQB March 9, 2022 12:05

merelcht linked an issue Mar 9, 2022 that may be closed by this pull request

Technical design decision record for KedroSession #1335

Closed

merelcht mentioned this pull request Mar 9, 2022

Technical design decision record for KedroSession #1335

Closed

antonymilne reviewed Mar 10, 2022

lorenabalan reviewed Mar 10, 2022

merelcht and others added 2 commits March 10, 2022 17:05

Apply suggestions from code review

4952dfb

Co-authored-by: Lorena Bălan <[email protected]>

Add test for re-running a broken pipeline within the same session

4142059

Signed-off-by: Merel Theisen <[email protected]>

idanov reviewed Mar 10, 2022

kedro/framework/session/session.py (outdated)

Replace run_id by session_id instead of removing it

f71a697

Signed-off-by: Merel Theisen <[email protected]>

merelcht requested review from idanov, antonymilne and lorenabalan March 10, 2022 18:02

merelcht and others added 2 commits March 10, 2022 18:02

Apply review suggestion

ae9db52

Signed-off-by: Merel Theisen <[email protected]>

Merge branch 'develop' into session-is-run

e42f9ac

antonymilne approved these changes Mar 11, 2022

merelcht and others added 2 commits March 11, 2022 12:16

Update kedro/framework/session/session.py

86bddc9

Co-authored-by: Antony Milne <[email protected]>

Remove sesion_id from after_catalog_created hook spec

7b87918

Signed-off-by: Merel Theisen <[email protected]>

lorenabalan approved these changes Mar 14, 2022

merelcht added 2 commits March 14, 2022 16:29

Update release notes

00168c8

Signed-off-by: Merel Theisen <[email protected]>

Fix lint

7abd4cd

Signed-off-by: Merel Theisen <[email protected]>

merelcht merged commit 5f3a5bb into develop Mar 14, 2022

merelcht deleted the session-is-run branch March 14, 2022 17:17

merelcht mentioned this pull request Mar 14, 2022

session == run and so this should be enforced within the code #1313

Closed

noklam mentioned this pull request Aug 1, 2023

Provide a lightweight solution to speed up session reload or create new session #2879

Open

Galileo-Galilei mentioned this pull request Jan 21, 2024

Universal Kedro Deployment (Part 4) - Embedding kedro pipelines in third-party applications #3540

Open
