Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose the loaded_version and saved_version for AbstractVersionedDataSet #1951

Closed
wants to merge 3 commits into from

Conversation

noklam
Copy link
Contributor

@noklam noklam commented Oct 18, 2022

Signed-off-by: Nok Chan [email protected]

Description

Related #1580

How current load version is determined when version=None?

Currently, this load_version information is buried deep down in the framework, and it is determined only when a dataset is loaded at runtime.
The details of how "latest" version is in a method called resolve_load_version, which further calls _fetch_latest_load_version

kedro/kedro/io/core.py

Lines 558 to 564 in b2e59fa

def _get_load_path(self) -> PurePosixPath:
if not self._version:
# When versioning is disabled, load from original filepath
return self._filepath
load_version = self.resolve_load_version()
return self._get_versioned_path(load_version) # type: ignore

How version is determined?

  def resolve_load_version(self) -> Optional[str]:
      """Compute the version the dataset should be loaded with."""
      if not self._version:  # Not version at all - not sure when will this path executed?
          return None 
      if self._version.load:  # when load-version is specified in CLI or config
          return self._version.load
      return self._fetch_latest_load_version().  # When version=None

Development notes

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes

Alternatives

1st attempt - Add new attributes in AbstractVersionedDataSet

It may not be a bad idea to add new attributes at all - we can update the load_version save_version or the version_cache, but that way we also lost the initial input which makes things complicated.

8fe0b1a
image

More Context

Currently every call to AbstractVersionedDataSet.load() resolves the load version once again. This may result in inconsistent versioned being passed to different nodes (if new version is created mid-run).

Signed-off-by: Nok Chan <[email protected]>
@noklam noklam requested a review from idanov as a code owner October 18, 2022 14:15
@noklam noklam removed the request for review from idanov October 19, 2022 12:32
@noklam noklam marked this pull request as draft October 19, 2022 12:33
@noklam noklam changed the title Refactor Load version logic - expose the latest load information Expose the loaded_version and saved_version for AbstractVersionedDataSet Oct 20, 2022
@noklam
Copy link
Contributor Author

noklam commented Feb 8, 2023

Closed in favor of #1654 as the pre-requisite and we should come back to think about how to expose these information.

@noklam noklam closed this Feb 8, 2023
@merelcht merelcht deleted the feat/expose-latest-load-version branch May 4, 2023 13:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant