[KED-2629] Users can't provide custom run_id or save_version to KedroSession #1273
This comment in Kedro-Viz makes me think that @limdauto was anticipating that run_id should not necessarily be the same as the timestamp? Also, from this Discord conversation with @shaunc about a possible integration with DVC, my gut feeling is that run_id = session_id = save_version may be too restrictive, and we should allow some (all?) of these to be controlled independently.
I'm integrating Kedro with DVC experiment tracking in kedro-dvc (integration plans) -- I'm hoping that the Kedro interface will support different experiment tracking and session management plugins. It's great that you support VS ids -- I'd propose that the session store "rows" (metadata) include various fields, whose names and meanings can also be configurable. For instance:
DVC uses Git commit hashes for names, and a timestamp isn't necessarily unique in distributed runs. However, for convenience you could have a default config in which all of these fields point to the timestamp field you already use. The other thing I'd like to see made abstract is
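To make the proposal from @shaunc more concrete, here is a minimal sketch, assuming a session-store record with three independently configurable ID fields whose defaults all fall back to a single timestamp. SessionRecord, make_record and timestamp_id are hypothetical names for illustration, not part of Kedro's API, and the timestamp format is an arbitrary choice.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, Optional


def timestamp_id() -> str:
    """Return a UTC timestamp string (format chosen arbitrarily for this sketch)."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H.%M.%S.%fZ")


@dataclass
class SessionRecord:
    """Hypothetical session-store "row" with independently settable ID fields."""

    session_id: str
    run_id: str
    save_version: str


def make_record(
    id_factories: Optional[Dict[str, Callable[[], str]]] = None,
) -> SessionRecord:
    """Build a record; any field without a factory falls back to the session timestamp."""
    factories = id_factories or {}
    session_id = factories.get("session_id", timestamp_id)()
    # By default run_id and save_version reuse the session timestamp,
    # mirroring today's behaviour; a plugin (e.g. a DVC integration) may override either.
    run_id = factories.get("run_id", lambda: session_id)()
    save_version = factories.get("save_version", lambda: session_id)()
    return SessionRecord(session_id, run_id, save_version)
```

With no overrides, all three fields share one timestamp; a DVC-style plugin could instead supply a commit hash (a placeholder string below) for run_id only, e.g. make_record({"run_id": lambda: "3f2c9e1"}).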
Based on the discussion the Kedro team had on this topic on Monday, it was decided that a session can only ever have one run, and so the … See more details in #1335.
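The "one run per session" decision could be enforced with a simple guard flag. The sketch below is a toy illustration under that assumption; OneRunSession and SessionError are hypothetical names, not Kedro's actual classes.

```python
class SessionError(Exception):
    """Raised when a session is asked to run more than once."""


class OneRunSession:
    """Hypothetical session that permits exactly one run (session == run)."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self._run_called = False  # flips to True after the first run

    def run(self) -> str:
        if self._run_called:
            raise SessionError(
                f"Session {self.session_id} has already been run; "
                "create a new session for another run."
            )
        self._run_called = True
        # With one run per session, reusing session_id as run_id is unambiguous.
        return self.session_id
```

Because a second call raises, a Jupyter user who wants another run has to create a new session, so each run gets its own ID.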
(transfer from Jira, created by @lorenabalan)
This may be acceptable behaviour, but we should document it better in that case.
Working on this PR (see discussion in the thread too), I discovered that in the new session model, users can't set their own custom run_id, or use a different way to generate a save version (for example, a different format). They can modify KedroContext._get_save_version() or KedroContext._get_run_id(), but that's not what will be stored in the session store; instead, it will contain only the values equivalent to session_id.

When a session is created, a session_id is generated (timestamp) and written to the store. During a session run, that session_id is loaded from the store and used for both save_version and run_id, and it'll be the same timestamp every time, i.e. 1 session ↔️ 1 run ↔️ 1 run_id = session_id = save_version.
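The flow described above can be modelled with a small toy class, assuming a plain dict stands in for the session store. ToySession is hypothetical and greatly simplified, not Kedro's KedroSession; it only shows why repeated runs in one session would all share the same run_id and save_version.

```python
from datetime import datetime, timezone
from typing import Dict


class ToySession:
    """Hypothetical, simplified model of the session behaviour described above."""

    def __init__(self, store: Dict[str, str]) -> None:
        self._store = store
        # On creation, a timestamp session_id is generated and written to the store.
        self._store["session_id"] = datetime.now(timezone.utc).strftime(
            "%Y-%m-%dT%H.%M.%S.%fZ"
        )

    def run(self) -> Dict[str, str]:
        # During a run, session_id is loaded back from the store and reused for
        # both run_id and save_version, regardless of any custom context logic.
        session_id = self._store["session_id"]
        return {"run_id": session_id, "save_version": session_id}
```

Calling run() twice on the same ToySession returns identical IDs, so a second run would reuse the first run's save_version.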
If that's the case, as Antony pointed out, what's the optimal behaviour here:
Note this isn't as strange a thing to do as it might initially seem: in Jupyter, a user could well call session.run multiple times.

This ticket includes both the design and the implementation of the solution. Set up a team discussion to go through the design suggestion.
Following the team's discussion about this issue, these tasks need to be done:
session == run, and so this should be enforced within the code (#1313).
After 0.18.0 has been released: