Clean up catalog.datasets and catalog._data_sets
#2999
Labels
Issue: Feature Request
New feature or improvement to existing feature
Description
In #2998, an example was added showing how to access metadata in the data catalog. This is useful for users who need to extend Kedro, e.g. kedro-viz or plugin developers. The example provided there uses an internal __dict__, which is not ideal. On the other hand, we also want to improve the usability of the class in standalone or interactive mode.
Context
There are 3 ways to access a dataset:
catalog.datasets
catalog._data_sets
catalog._get_dataset(name)
They are used inconsistently in the codebase and all behave slightly differently.
catalog.datasets is the only "public" one; it lets the user interact with the "read-only" FrozenDataset.
However, there is a caveat to using catalog.datasets. For example, the name of a transcoded dataset gets converted to an internal, less readable string. With this hook, if you have a dataset named X_train@spark, the dataset_name will be X_train__spark, which is unexpected when you try to access the dictionary. On the other hand, catalog.datasets is not subscriptable, so you cannot access it by index or name.
This is due to the addition of the catalog.dataset_name syntax: a Python attribute cannot have @ or # in its name, and it should be a valid Python identifier (the syntax was mostly designed for interactive mode). The conversion happens here: kedro/kedro/io/data_catalog.py, lines 89 to 99 in 0293dc1
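To make the caveat concrete, here is a minimal standalone sketch of the behaviour described above. It does not import Kedro; FrozenDatasetsSketch and to_attribute_name are hypothetical names, and the rule of replacing every run of non-word characters with "__" is an assumption inferred from the X_train@spark example, not a copy of the referenced lines.

```python
import re

# Assumption: every run of characters that is not a letter, digit or
# underscore is replaced by a double underscore.
NON_WORD = re.compile(r"\W+")


def to_attribute_name(dataset_name: str) -> str:
    """Turn a dataset name into something usable as a Python attribute."""
    return NON_WORD.sub("__", dataset_name)


class FrozenDatasetsSketch:
    """Hypothetical stand-in for the object behind catalog.datasets."""

    def __init__(self, datasets: dict):
        # Datasets are exposed only as attributes, under converted names.
        self.__dict__.update(
            {to_attribute_name(name): ds for name, ds in datasets.items()}
        )


frozen = FrozenDatasetsSketch(
    {"X_train@spark": "spark dataset", "cars": "csv dataset"}
)

# Attribute access works, but only with the converted name.
print(frozen.X_train__spark)

# The original name is gone from the underlying dictionary.
print(sorted(vars(frozen)))

# There is no __getitem__, so lookup by the original name fails.
try:
    frozen["X_train@spark"]
except TypeError as exc:
    print(f"TypeError: {exc}")
```

The sketch shows both problems at once: the key seen when iterating over the internal dictionary no longer matches the name in the catalog, and there is no way to ask for a dataset by its original name.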
All of this leads to the conclusion that we should try to simplify the interface (both public and internal). DataCatalog and its attributes are not readable, which makes it really hard to work in interactive mode because you cannot see what's inside an attribute without looking at the source code.
The main challenge here is to make things consistent without a breaking change (or, if we do need a breaking change, we should make it now, before 0.19). We should try to keep catalog.datasets since it is the public interface people rely on. One thing I am not sure about is whether it is also tightly coupled to kedro-viz. We should check this when it is implemented.
Possible Implementation
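As a rough illustration of the dictionary-like direction, the sketch below builds a read-only view on top of collections.UserDict. DatasetsView is a hypothetical name, not an existing Kedro class, and the "__" conversion for attribute names is the same assumption used for X_train@spark in the Context section.

```python
import re
from collections import UserDict


class DatasetsView(UserDict):
    """Hypothetical read-only view keyed by the original dataset names."""

    def __init__(self, datasets):
        super().__init__(datasets)  # populates self.data via __setitem__
        self._frozen = True         # writes are rejected from here on

    def __setitem__(self, name, dataset):
        if getattr(self, "_frozen", False):
            raise TypeError("DatasetsView is read-only")
        super().__setitem__(name, dataset)

    def __delitem__(self, name):
        raise TypeError("DatasetsView is read-only")

    def __getattr__(self, attr):
        # Only called when normal lookup fails. Private names and "data"
        # are rejected early to avoid recursion before __init__ finishes.
        if attr.startswith("_") or attr == "data":
            raise AttributeError(attr)
        for name, dataset in self.data.items():
            if re.sub(r"\W+", "__", name) == attr:
                return dataset
        raise AttributeError(attr)

    def __dir__(self):
        # Expose converted names so tab completion can find them.
        converted = [re.sub(r"\W+", "__", name) for name in self.data]
        return [*super().__dir__(), *converted]


view = DatasetsView({"X_train@spark": "spark dataset", "cars": "csv dataset"})

print(view)                    # readable repr with the original names
print(view["X_train@spark"])   # subscriptable by original name
print(view.X_train__spark)     # attribute access still available
```

With this shape the original names stay visible in the repr and in iteration, while attribute access and auto-completion remain available for interactive use.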
We can improve usability with auto-completion, using dataclasses, or make these objects inherit from something like UserDict (we did a similar thing for pipelines).
Possible Alternatives