Optional tags for DataSets and accessing through the catalog #324
Comments
I personally like the idea of having the datasets tagged as well. I don't necessarily want to maintain tags in two places, though, and have to keep them in sync. Alternatively, today, without any changes, you can ask for pipeline nodes that have a certain set of tags.
* Reduced text to add in description of why Kedro exists
* Changed pipeline image
* Moved content to FAQ
Closing this as duplicate of #400
I also think this would be very useful.
Description
It would be nice to be able to add optional tags to datasets in the catalog, and list/access them via tags, similar to what is currently implemented in `node`.
Context
The DataSet/Catalog system and the Python interface are awesome. As the number of datasets defined in the catalog grows, it would be helpful to be able to list/access specific subsets based on common tags.
Possible Implementation
I'm imagining adding an optional `tags` attribute to each dataset entry in the catalog yml, along with the ability to view datasets in the data catalog from Python based on those tags. For example, `catalog.list` might accept a set of tags and return the names of the matching datasets.
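As a rough illustration of the proposal above, the sketch below uses a small stand-in class rather than Kedro's actual `DataCatalog`. Everything in it is an assumption made for illustration: the `TaggedCatalog` name, the dataset names and tags, the in-memory `data` key that stands in for real files, and the choice that `list(tags=...)` returns only datasets carrying every requested tag.

```python
from typing import Any, Dict, Iterable, List, Optional

# Hypothetical parsed catalog entries. In catalog.yml each entry would carry an
# optional `tags` list next to its usual keys; "data" stands in for the file a
# real dataset would read.
CATALOG_CONFIG: Dict[str, Dict[str, Any]] = {
    "cars": {"tags": ["transportation", "raw"], "data": [{"id": 1, "wheels": 4}]},
    "bikes": {"tags": ["transportation"], "data": [{"id": 2, "wheels": 2}]},
    "weather": {"tags": ["raw"], "data": [{"day": "mon", "temp_c": 12}]},
    "notes": {"data": ["untagged entry"]},
}


class TaggedCatalog:
    """Illustrative stand-in for a data catalog; not Kedro's DataCatalog."""

    def __init__(self, config: Dict[str, Dict[str, Any]]) -> None:
        self._config = config

    def list(self, tags: Optional[Iterable[str]] = None) -> List[str]:
        """Return dataset names, optionally only those carrying every tag."""
        if tags is None:
            return sorted(self._config)
        wanted = set(tags)
        return sorted(
            name
            for name, entry in self._config.items()
            if wanted.issubset(entry.get("tags", []))
        )

    def load(self, name: str) -> Any:
        """Return the in-memory data for a dataset (a real catalog reads files)."""
        return self._config[name]["data"]


catalog = TaggedCatalog(CATALOG_CONFIG)

# Load every transportation dataset without naming each one in code.
transport = {
    name: catalog.load(name)
    for name in catalog.list(tags=["transportation"])
}
```

Matching on all requested tags is only one option; an any-tag match would be an equally plausible reading of the proposal.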
If I wanted to do something specifically with just transportation data, I could easily loop through this list and load these data sources with `catalog.load`, without having to list them explicitly in my Python code. This helps keep the catalog as the one source of truth.
The following simple (but hacky) modifications are working so far. When a
`DataSet` is created, there could be an instance variable `_tags`, and the `list` method in `DataCatalog` could then use it to filter by tag. This also needs a modification to `AbstractDataSet`, which is not ideal but seems to work.
Possible Alternatives
(Optional) Describe any alternative solutions or features you've considered.