Add meaningful representation when printing a DataCatalog #3299
Conversation
Signed-off-by: Yolan Honoré-Rougé <[email protected]>
@astrojuanlu Any thoughts on the points to discuss above? I find it quite satisfying right now, but I want to double-check before creating tests and making the CI pass.
Nice work! I think I like the Black representation because it will mostly be viewed in a terminal with limited width.
Are we doing anything to mask credentials?
Thanks @Galileo-Galilei! My main points are
Thank you for taking a stab at this already! I will also be cautious to introduce
@Galileo-Galilei Are you still interested in finishing this PR?
Nope, sorry, I am closing it. I have very little time at the moment, and I have not come up with something totally satisfying for now, so it still requires a bit of thinking.
Description
This PR focuses on solving (very partially) #1721. I focus only on making catalog printing more meaningful, instead of the current `<kedro.io.data_catalog.DataCatalog at 0x5x231>`.
The "best" representation is still not clear, and this PR aims at publicly sharing trials and errors to make it more meaningful. Some requirements I'd like to meet for the "best representation" of the printed objects:
In scope: making `print(catalog)` informative.
Out of scope:
The `AbstractDataset.__repr__` method is used to generate the `DataCatalog.__repr__`. However, fixing potential issues in each dataset's `_describe` is out of scope.
Development notes
AbstractDataset.__repr__
`__str__` method and fully replaced it by `__repr__`. When `__str__` is not user-defined, `__repr__` is used instead, which is exactly what we want: consistency between the two methods.
`_to_str` as `_prettify_dict_to_str` (which looks slightly more informative, but not very nice either, TBH)
`_to_str` method: it used to customize the string representation of dicts. We now rely on the default `__str__` method of `dict`. The main change is that we keep quotes around strings, which is necessary to ensure the output can be copy-pasted and remain valid Python code.
`_build_str_representation`, which builds the string on one line for ease of integration in the catalog. The `__repr__` function only calls this method and formats the result with black, which often renders it on several lines.
DataCatalog.__repr__
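To make the one-line-per-dataset idea in this section concrete, here is a minimal sketch. It is not the code from this PR: `SketchDataset` and `SketchCatalog` are hypothetical stand-ins rather than kedro classes, and the `name: repr` line layout and the alphabetical ordering are assumptions chosen only for illustration.

```python
class SketchDataset:
    """Hypothetical stand-in for a dataset; not kedro's AbstractDataset."""

    def __init__(self, **kwargs):
        self._kwargs = kwargs

    def _describe(self):
        # Assumed hook that returns the arguments worth showing.
        return dict(self._kwargs)

    def __repr__(self):
        # One-line form: ClassName(key='value', ...), with quotes kept on strings.
        args = ", ".join(f"{key}={value!r}" for key, value in self._describe().items())
        return f"{type(self).__name__}({args})"


class SketchCatalog:
    """Hypothetical stand-in for a catalog; not kedro's DataCatalog."""

    def __init__(self, datasets):
        self._datasets = dict(datasets)

    def __repr__(self):
        # One line per dataset, each beginning with the dataset name.
        return "\n".join(
            f"{name}: {dataset!r}" for name, dataset in sorted(self._datasets.items())
        )


catalog = SketchCatalog(
    {
        "cars": SketchDataset(filepath="data/cars.csv"),
        "boats": SketchDataset(filepath="data/boats.csv", save_args={"index": False}),
    }
)
catalog_text = repr(catalog)
print(catalog_text)
```

Because each dataset renders on a single line, the catalog can simply join those lines. A multi-line formatter such as black could be applied per dataset instead, at the cost of a longer printout.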
`__repr__` method which prints one line per dataset, beginning with the dataset name.
For comparison, here is how it would have looked if we used black to render the string:
The `_describe()` method of many datasets is incorrect (but this is out of scope); see the `protocol` argument above in `CSVDataset`. Is it OK to release it with an incorrect representation?
📝 TODO:
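The `AbstractDataset.__repr__` notes above lean on two pieces of standard Python behavior: `str()` falls back to `__repr__` when a class does not define `__str__`, and the default string form of a `dict` keeps quotes around strings, so it can be parsed back as a Python literal. The sketch below only demonstrates those two facts; `WithReprOnly` is a hypothetical class, not part of kedro.

```python
import ast


class WithReprOnly:
    """Hypothetical class that defines __repr__ but no __str__."""

    def __init__(self, filepath):
        self.filepath = filepath

    def __repr__(self):
        return f"WithReprOnly(filepath={self.filepath!r})"


obj = WithReprOnly("data/cars.csv")

# object.__str__ delegates to __repr__ when __str__ is not overridden,
# so both calls return the same text.
same_output = str(obj) == repr(obj)

# The default dict string form keeps quotes around string keys and values...
load_args = {"sep": ",", "index_col": 0}
default_text = str(load_args)

# ...which means the text can be read back as a Python literal.
round_trip = ast.literal_eval(default_text)

print(same_output)
print(default_text)
print(round_trip == load_args)
```

A custom dict-to-string helper that drops the quotes would break this round trip, which is the reason given above for relying on the default `dict` string form.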
Developer Certificate of Origin
We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a `Signed-off-by` line in the commit message. See our wiki for guidance.
If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.
Checklist
`RELEASE.md` file