[DataCatalog]: Make catalog a standalone package #3941

Open
ElenaKhaustova opened this issue Jun 6, 2024 · 0 comments
Labels
Type: User Research Synthesis ✍️ Issues to document results from user research

Description

Several teams already use the catalog as a standalone component, which demonstrates demand for this functionality. Potential use cases include collaboration between teams, sharing catalogs without the framework, and integration with other frameworks. This existing adoption highlights the catalog's potential value outside the framework's context.

We propose to explore the possibility of making DataCatalog a standalone component (moving it outside of the framework).

Relates to #3659, #3932

Context

  • Data analysts already use DataCatalog as a standalone component; they don’t even know about Kedro pipelines: "Actually the data analyst population I described at the beginning use 100% as a standalone component. We just have the Omega config loader at the beginning just to help creating the catalog. But then the users use it as a standard component and we do not introduce any other component. We have more users using the catalog as a standalone component than users using it with Kedro."
  • CSTs use DataCatalog as a standalone component with Metaflow pipelines: "We were using Metaflow instead of the other software which is similar to Kedro from Netflix. They don't have catalog kind of API. So we used Kedro catalog with Metaflow back then with Hydra config loader."
  • User Demand and Manageability: There is sufficient user interest to justify making DataCatalog standalone. Managing it in a monorepo is feasible given current CI/CD setups.
  • Marketing and Adoption: Positioning DataCatalog as a separate component could serve as an entry point for users to discover and adopt the broader Kedro framework, enhancing user engagement.
  • Comparison with Similar Tools: Comparison with tools such as Anaconda’s Intake catalog, which hasn’t seen widespread adoption, suggests that a standalone DataCatalog is not a top priority, but that there is a niche to fill.
  • Collaboration and Standardization: The standalone DataCatalog could improve team collaboration by standardizing how data resources are accessed and discussed, moving away from inefficient practices like sharing file paths directly.
  • Feature Expansion Potential: Adding functionality to read from and write to YAML could make DataCatalog a comprehensive, independent tool, increasing its utility and appeal.
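
To make the collaboration and config round-trip points above concrete, here is a minimal sketch of the idea using only the Python standard library. `MiniCatalog`, its methods, and the `customers` entry are hypothetical names invented for illustration; this is not Kedro's `DataCatalog` API, and JSON stands in for YAML so the example has no third-party dependencies. It only shows the pattern: consumers ask for a dataset by name, and the catalog definition itself can be serialized and shared instead of raw file paths.

```python
import csv
import json
import tempfile
from pathlib import Path


class MiniCatalog:
    """Hypothetical stand-in for a standalone catalog (NOT Kedro's DataCatalog)."""

    def __init__(self, entries):
        # entries maps a dataset name to {"type": ..., "filepath": ...}
        self._entries = {name: dict(spec) for name, spec in entries.items()}

    def save(self, name, rows):
        # rows is assumed to be a non-empty list of flat dicts
        spec = self._entries[name]
        path = Path(spec["filepath"])
        if spec["type"] == "csv":
            with path.open("w", newline="") as f:
                writer = csv.DictWriter(f, fieldnames=list(rows[0]))
                writer.writeheader()
                writer.writerows(rows)
        elif spec["type"] == "json":
            path.write_text(json.dumps(rows))
        else:
            raise ValueError(f"unsupported dataset type: {spec['type']}")

    def load(self, name):
        spec = self._entries[name]
        path = Path(spec["filepath"])
        if spec["type"] == "csv":
            with path.open(newline="") as f:
                return list(csv.DictReader(f))
        if spec["type"] == "json":
            return json.loads(path.read_text())
        raise ValueError(f"unsupported dataset type: {spec['type']}")

    def to_config(self):
        # Plain dict that can be serialized and handed to another team
        return {name: dict(spec) for name, spec in self._entries.items()}


# One team defines the catalog and writes data through it.
workdir = Path(tempfile.mkdtemp())
catalog = MiniCatalog(
    {"customers": {"type": "csv", "filepath": str(workdir / "customers.csv")}}
)
catalog.save("customers", [{"id": "1", "name": "Ada"}])

# The catalog definition is shared as text (JSON here, YAML in the proposal).
shared = json.dumps(catalog.to_config())

# Another team rebuilds the catalog and loads by name, never touching the path.
same_catalog = MiniCatalog(json.loads(shared))
rows = same_catalog.load("customers")
```

In this sketch the only thing exchanged between teams is the serialized catalog definition and the dataset name, which is the standardization benefit the list describes.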