[Versioning]: Explore Kedro + DVC for versioning #4239
Comments
If I am not wrong, what we are asking for here is support in
Ref: https://dvc.org/doc/use-cases/versioning-data-and-models/tutorial#automating-capturing
Kedro + DVC integration
Repo link: https://github.com/ankatiyar/space-dvc
I tried to integrate DVC into my Spaceflights Kedro project to check the extent of its versioning capabilities. The steps I followed are:
Versioning data with
This is cool! Would it make sense to automate some of these commands with pre/post hooks? Also, what do you think about some commands to generate/sync the
It's a good idea to automate some of the major steps and hide the DVC command calls from users, so they can work with Kedro as usual. We were thinking of doing it via plugins, so that the DVC dependency is not mandatory and one can switch between different versioning solutions (DVC, Iceberg, etc.). In our case, plugins can simply extend the basic hooks, as you suggest.
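To make the hook idea concrete, here is a minimal sketch of what such a plugin hook could look like. It is only an illustration under stated assumptions: `DVCVersioningHooks`, `dvc_commands`, and the `runner` parameter are hypothetical names, not part of Kedro or DVC, and the only real interfaces relied on are the `dvc add` and `dvc push` CLI commands and Kedro's `after_pipeline_run` hook name.

```python
import subprocess
from typing import Callable, Iterable, List, Optional

Command = List[str]


def dvc_commands(tracked_paths: Iterable[str], push: bool = False) -> List[Command]:
    """Build the DVC CLI calls that snapshot the given paths (hypothetical helper)."""
    commands = [["dvc", "add", path] for path in tracked_paths]
    if push:
        # Upload the cached data to the configured DVC remote.
        commands.append(["dvc", "push"])
    return commands


class DVCVersioningHooks:
    """Hypothetical hook class: runs DVC commands after a pipeline run."""

    def __init__(
        self,
        tracked_paths: Iterable[str],
        push: bool = False,
        runner: Optional[Callable[[Command], object]] = None,
    ) -> None:
        self.tracked_paths = list(tracked_paths)
        self.push = push
        # By default, shell out to the DVC CLI and fail loudly on errors.
        self.runner = runner or (lambda cmd: subprocess.run(cmd, check=True))

    # In a real Kedro plugin this method would be decorated with
    # `@hook_impl` (from `kedro.framework.hooks`) and the class registered
    # in the project's HOOKS setting.
    def after_pipeline_run(self, run_params) -> None:
        for cmd in dvc_commands(self.tracked_paths, self.push):
            self.runner(cmd)
```

Injecting the `runner` keeps the DVC dependency at the edge of the plugin: a different versioning backend could reuse the same hook with its own command builder, and the hook can be exercised without DVC installed.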
To answer some of the questions mentioned on the ticket description:
DVC does solve most of the versioning related issues arising from the research. With the current state of things, it's possible to:
Challenges:
Pretty easy and their documentation is great!
Explained in the comment above.
Not really 🤔
DVC doesn't care about the data formats, so all data formats.
Fairly simple, however, like I mentioned, we would likely be constrained by the remote storages supported by DVC.
Dependency tree for DVC
I guess we can close this now that the docs have been written, right? @astrojuanlu
Yes!
Description
At the current stage, by versioning we mean mapping a single version number to the corresponding versions of parameters, I/O data, and code, so that one can retrieve the full project state, including data, at any point in time.
The goal is to check if we can use DVC to map a single version number to code, parameters, and I/O data within Kedro and how it aligns with Kedro’s workflow.
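One way to picture the "single version number" is as a Git tag: the tagged commit holds the code, the parameter files, and the small `.dvc` pointer files that identify the data. The sketch below only builds the command sequences for taking and restoring such a snapshot; `snapshot_commands` and `restore_commands` are hypothetical helper names, and the flow assumes the data paths are not already tracked by Git and that the data is available in the local DVC cache (otherwise a `dvc pull` would be needed after checkout).

```python
from typing import List

Command = List[str]


def snapshot_commands(version: str, data_paths: List[str]) -> List[Command]:
    """Commands that tie code, parameters, and data to one Git tag."""
    return [
        # Track the data with DVC; this writes .dvc pointer files.
        ["dvc", "add", *data_paths],
        # Commit code, parameters, and the pointer files together.
        ["git", "add", "--all"],
        ["git", "commit", "-m", f"Snapshot {version}"],
        # The tag is the single version number for the whole project state.
        ["git", "tag", version],
    ]


def restore_commands(version: str) -> List[Command]:
    """Commands that bring the workspace back to a tagged state."""
    return [
        # Restore code, parameters, and pointer files.
        ["git", "checkout", version],
        # Make the data files match the restored pointer files.
        ["dvc", "checkout"],
    ]
```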
As a result, we expect a working example of a Kedro project that uses DVC for versioning, and some assumptions on:
Context
#4199
Relates to: #2691 (comment)
Market research