[KED-2629] Users can't provide custom run_id or save_version to KedroSession #1273

antonymilne · 2022-02-21T17:32:10Z

(transfer from Jira, created by @lorenabalan)

This may be acceptable behaviour, but we should document it better in that case.

Working on this PR (see discussion in the thread too), I discovered that in the new session model users can't set their own custom run_id or use a different way to generate a save version (for example, a different format). They can override KedroContext._get_save_version() or KedroContext._get_run_id(), but that's not what gets stored in the session store: the store only ever contains values equal to session_id.
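To make the mismatch concrete, here is a minimal sketch that models it without Kedro. `ToyContext`, `CustomContext` and `ToySession` are hypothetical stand-ins, not Kedro's real classes, and the dict-based store is an assumption. The only behaviour modelled is the one described above: the session records its own session_id, whatever the context's `_get_save_version()` returns.

```python
from datetime import datetime, timezone


class ToyContext:
    """Hypothetical stand-in for KedroContext."""

    def __init__(self, session_id: str):
        self._session_id = session_id

    def _get_save_version(self) -> str:
        # Default: reuse the session_id as the save version.
        return self._session_id


class CustomContext(ToyContext):
    """A hypothetical user override that formats save versions differently."""

    def _get_save_version(self) -> str:
        return "custom-" + self._session_id


class ToySession:
    """Hypothetical stand-in for KedroSession with a dict-based store."""

    def __init__(self, context_class=ToyContext):
        session_id = datetime.now(tz=timezone.utc).strftime("%Y-%m-%dT%H.%M.%S.%fZ")
        self.store = {"session_id": session_id}
        self.context = context_class(session_id)

    def run(self) -> dict:
        # The store only knows about session_id, not the context's save version.
        return {
            "save_version_used_by_context": self.context._get_save_version(),
            "save_version_in_store": self.store["session_id"],
        }


result = ToySession(CustomContext).run()
print(result["save_version_used_by_context"] == result["save_version_in_store"])
```

With the custom context the two values differ, which is exactly the gap described: the override takes effect in the context but never reaches the session store.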

When a session is created, a session_id (a timestamp) is generated and written to the store. During a session run, that session_id is loaded from the store and used as both save_version and run_id, so it is the same timestamp every time, i.e.

1 session ↔️ 1 run ↔️ 1 run_id=session_id=save_version
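The one-to-one-to-one relationship can be sketched as follows. The timestamp format is assumed to mirror the one Kedro uses for dataset versions (`%Y-%m-%dT%H.%M.%S.%fZ`, truncated to milliseconds); the helper names `make_session_id` and `run_ids` are hypothetical.

```python
from datetime import datetime, timezone

VERSION_FORMAT = "%Y-%m-%dT%H.%M.%S.%fZ"


def make_session_id(now: datetime) -> str:
    """Hypothetical helper: a UTC timestamp truncated to milliseconds."""
    ts = now.astimezone(timezone.utc).strftime(VERSION_FORMAT)
    # Drop the last three microsecond digits but keep the trailing "Z".
    return ts[:-4] + ts[-1:]


def run_ids(store: dict) -> dict:
    """Every ID used during a run is derived from the stored session_id."""
    session_id = store["session_id"]
    return {"session_id": session_id, "run_id": session_id, "save_version": session_id}


store = {
    "session_id": make_session_id(
        datetime(2022, 2, 21, 17, 32, 10, 123456, tzinfo=timezone.utc)
    )
}
print(run_ids(store))  # all three values are '2022-02-21T17.32.10.123Z'
```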

If that's the case, then, as Antony pointed out, what is the optimal behaviour here?

from kedro.framework.session import KedroSession

with KedroSession.create(...) as session:
    session.run(pipeline_name="pipeline1")
    session.run(pipeline_name="pipeline2")

Note that this isn't as strange a thing to do as it might first seem: in Jupyter, a user could well call session.run multiple times.
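One way to make the behaviour explicit, and the direction later taken in #1329 (Enforce 1 session = 1 run), is to refuse a second `run()` on the same session. This is a hypothetical sketch: `OneRunSession` and `SessionAlreadyRunError` are illustrative names, not Kedro's API.

```python
class SessionAlreadyRunError(RuntimeError):
    """Hypothetical error raised when a session is asked to run a second time."""


class OneRunSession:
    """Hypothetical session that allows exactly one run per session_id."""

    def __init__(self, session_id: str):
        self.session_id = session_id
        self._run_called = False

    def run(self, pipeline_name: str) -> str:
        if self._run_called:
            raise SessionAlreadyRunError(
                f"Session {self.session_id} has already run; create a new session."
            )
        self._run_called = True
        # save_version is the session_id, so a second run would reuse it.
        return f"{pipeline_name} saved with version {self.session_id}"


session = OneRunSession("2022-02-21T17.32.10.123Z")
print(session.run("pipeline1"))
try:
    session.run("pipeline2")
except SessionAlreadyRunError as err:
    print(err)
```

Failing loudly on the second call avoids two runs silently sharing one save_version; the Jupyter user is told to create a new session instead.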

This ticket covers both the design and the implementation of the solution. Set up a team discussion to go through the design suggestions.

Following the team's discussion of this issue, these tasks need to be done:

After 0.18.0 has been released:

Fix the assumption in Kedro Experiment Tracking that the user is in a CLI workflow


antonymilne · 2022-02-21T21:35:34Z

This comment in Kedro-Viz makes me think that @limdauto anticipated that run_id would not necessarily be the same as the timestamp?

Also, based on this Discord conversation with @shaunc about possible integration with DVC, my gut feeling is that run_id = session_id = save_version may be too restrictive and that we should allow some (all?) of these to be controlled independently.

shaunc · 2022-02-22T05:48:03Z

I'm integrating Kedro with DVC experiment tracking in kedro-dvc (integration plans), and I'm hoping that the Kedro interface will support different experiment-tracking and session-management plugins.

It's great that you support SESSION_STORE_CLASS, though I'm hoping for more detail on its interface! :)

Regarding IDs: I'd propose that the session store "rows" (metadata) include various fields whose names and meanings are also configurable. For instance:

SESSION_ID_FIELD: a field guaranteed to be unique across the session store
SESSION_TIMESTAMP_FIELD and/or SESSION_ORDER_BY_FIELD: the first holds a timestamp; the second sets the display order in kedro-viz, identifies the most recent session, and could perhaps be used to delete old sessions during garbage collection.

DVC uses Git commit hashes as names, and a timestamp isn't necessarily unique in distributed runs. However, for convenience, the default configuration could point all of these fields to the timestamp field you already use.
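A minimal sketch of this proposal, assuming session-store rows are plain dicts. `SESSION_ID_FIELD`, `SESSION_TIMESTAMP_FIELD` and `SESSION_ORDER_BY_FIELD` are the names suggested above, not existing Kedro settings, and `SessionFieldConfig` and `SessionRows` are hypothetical. By default all three roles point to the existing timestamp-based `session_id`; a DVC-style setup could point the ID field at a commit hash instead.

```python
from dataclasses import dataclass, field


@dataclass
class SessionFieldConfig:
    # Defaults: every role is served by the timestamp-based session_id.
    SESSION_ID_FIELD: str = "session_id"
    SESSION_TIMESTAMP_FIELD: str = "session_id"
    SESSION_ORDER_BY_FIELD: str = "session_id"


@dataclass
class SessionRows:
    """Hypothetical session store whose field roles are configurable."""

    config: SessionFieldConfig
    rows: list = field(default_factory=list)

    def add(self, row: dict) -> None:
        # Enforce uniqueness only on the configured ID field.
        key = self.config.SESSION_ID_FIELD
        if any(r[key] == row[key] for r in self.rows):
            raise ValueError(f"{key}={row[key]!r} is not unique")
        self.rows.append(row)

    def label(self, row: dict) -> str:
        # Human-readable label combining the ID and timestamp fields.
        return f"{row[self.config.SESSION_ID_FIELD]} @ {row[self.config.SESSION_TIMESTAMP_FIELD]}"

    def most_recent(self) -> dict:
        return max(self.rows, key=lambda r: r[self.config.SESSION_ORDER_BY_FIELD])

    def garbage_collect(self, keep: int) -> None:
        # Keep only the `keep` newest rows according to the order-by field.
        ordered = sorted(self.rows, key=lambda r: r[self.config.SESSION_ORDER_BY_FIELD])
        self.rows = ordered[-keep:] if keep > 0 else []


dvc_config = SessionFieldConfig(
    SESSION_ID_FIELD="commit",
    SESSION_TIMESTAMP_FIELD="started_at",
    SESSION_ORDER_BY_FIELD="started_at",
)
rows = SessionRows(dvc_config)
rows.add({"commit": "a1b2c3", "started_at": "2022-02-22T05.48.03.000Z"})
# Same timestamp (e.g. a distributed run), still unique by commit hash.
rows.add({"commit": "d4e5f6", "started_at": "2022-02-22T05.48.03.000Z"})
rows.add({"commit": "0f9e8d", "started_at": "2022-02-22T06.00.00.000Z"})
print(rows.most_recent()["commit"])  # 0f9e8d
rows.garbage_collect(keep=1)
print([r["commit"] for r in rows.rows])  # ['0f9e8d']
```

Separating the uniqueness field from the ordering field is what lets two sessions share a timestamp without colliding.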

The other thing I'd like to see made abstract is RunsRepository. Is this also going to migrate to Kedro core? May I suggest that both it and the default session metadata store be moved to, say, a kedro-session plugin that is included by default by the core but, being a plugin, could be replaced by someone who wants to use kedro-dvc instead? :)
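A sketch of what an abstract `RunsRepository` could look like, assuming it only needs to add runs and list them. The method names, `InMemoryRunsRepository` and the `load_runs_repository` helper are hypothetical and are not Kedro-Viz's current API. A plugin such as kedro-dvc could pass its own subclass in place of the default.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Type


class RunsRepository(ABC):
    """Hypothetical abstract interface for storing experiment-tracking runs."""

    @abstractmethod
    def add_run(self, run_id: str, metadata: Dict) -> None: ...

    @abstractmethod
    def get_runs(self, limit: Optional[int] = None) -> List[Dict]: ...


class InMemoryRunsRepository(RunsRepository):
    """Hypothetical default implementation that a plugin could supersede."""

    def __init__(self) -> None:
        self._runs: Dict[str, Dict] = {}

    def add_run(self, run_id: str, metadata: Dict) -> None:
        self._runs[run_id] = {"id": run_id, **metadata}

    def get_runs(self, limit: Optional[int] = None) -> List[Dict]:
        # Newest first, assuming run IDs sort chronologically (true for timestamps).
        runs = sorted(self._runs.values(), key=lambda r: r["id"], reverse=True)
        return runs if limit is None else runs[:limit]


def load_runs_repository(plugin_class: Optional[Type[RunsRepository]] = None) -> RunsRepository:
    """Use a plugin-provided class if one is given, else the default."""
    return (plugin_class or InMemoryRunsRepository)()


repo = load_runs_repository()
repo.add_run("2022-02-21T17.32.10.123Z", {"pipeline": "__default__"})
repo.add_run("2022-02-22T05.48.03.000Z", {"pipeline": "__default__"})
print([r["id"] for r in repo.get_runs(limit=1)])  # ['2022-02-22T05.48.03.000Z']
```

The chronological sort only holds for timestamp IDs; a repository keyed by Git commit hashes would need to order by a separate field, which is the same point made about configurable session-store fields.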

merelcht · 2022-03-09T14:16:14Z

Based on the discussion the Kedro team had on this topic on Monday, it was decided that a session can only ever have one run, so run_id is no longer needed. For the time being, session_id and save_version will remain the same, but users may later be allowed to set a custom save_version that differs from session_id. We'll need user research to determine the best way to allow this customisation.

See more details in #1335
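The decision can be summarised in a small sketch: one run per session, no separate run_id, and a save_version that defaults to the session_id. The optional `save_version` argument models the customisation that might be allowed later, pending user research; `SessionIds` and `resolve_ids` are hypothetical names, not part of Kedro's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SessionIds:
    session_id: str
    save_version: str


def resolve_ids(session_id: str, save_version: Optional[str] = None) -> SessionIds:
    """Hypothetical: save_version falls back to session_id unless overridden."""
    return SessionIds(session_id=session_id, save_version=save_version or session_id)


print(resolve_ids("2022-03-09T14.16.14.000Z"))
print(resolve_ids("2022-03-09T14.16.14.000Z", save_version="experiment-42"))
```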

Signed-off-by: Laurens Vijnck <[email protected]>

merelcht · 2022-04-25T13:46:52Z

All tasks have been completed.

lvijnck pushed a commit to lvijnck/kedro that referenced this issue Feb 27, 2022

[KED-2630] Strip out versioning notes from docs (kedro-org#1273)

67db8f6

merelcht self-assigned this Feb 28, 2022

merelcht added this to the 0.18.0 milestone Feb 28, 2022

This was referenced Mar 7, 2022

session == run and so this should be enforced within the code #1313

Closed

Document how users can continue a run/run a partial pipeline #1314

Closed

antonymilne mentioned this issue Mar 8, 2022

[BE] Clean up experiment tracking viz code after session == run decision kedro-org/kedro-viz#764

Closed

1 task

This was referenced Mar 8, 2022

Enforce 1 session = 1 run #1329

Merged

Technical design decision record for KedroSession #1335

Closed

merelcht added the Component: Framework Issue/PR that addresses core framework functionality label Mar 15, 2022

lvijnck pushed a commit to lvijnck/kedro that referenced this issue Apr 7, 2022

[KED-2630] Strip out versioning notes from docs (kedro-org#1273)

84a9bd9

Signed-off-by: Laurens Vijnck <[email protected]>

merelcht closed this as completed Apr 25, 2022

antonymilne mentioned this issue Jun 29, 2022

Should we allow users provide custom session_id #1551

Closed

Galileo-Galilei mentioned this issue Jan 21, 2024

Universal Kedro Deployment (Part 4) - Embedding kedro pipelines in third-party applications #3540

Open
