This issue was moved to a discussion.
Should we change the output of session.run? #1802
Comments
This is a very interesting question. I think it's right to focus just on the I think we'd need @idanov or maybe even @tsanikgr to explain exactly why The reason I only say kind of above is that it seems more questionable to me that we only return those outputs that are Also, technically it looks to me like the code that finds
Adding this related SO question - How to run a kedro pipeline interactively like a function - this issue only focuses on the
Notes for Tech Design
Notes from Technical Design session: There was agreement that the "free outputs" output from session isn't very clear. It was suggested to simply return all output from nodes that is not consumed, even if it's defined in the catalog. However, this could lead to very large amounts of data being returned. Instead we'll change it to return all free outputs and additionally any The second point about adding an optional argument for
Supplement to the above comments to address @AntonyMilneQB's question:
The answer to that is there is a
I just gave it a go to see what it would take to make the initial idea work, partly because I want to test how the
Adding this as inspiration on whether we should have some kind of argument or debug mode that can specifically return output easily without editing configuration. At the moment, the proper way to inspect is
The complication is mainly due to the The question is how we can improve the user experience. It's hard to reason about what is a "free output" and what is not.
Background
What's the output of session.run()? Currently, this is not as clear as you might think, and it isn't documented anywhere. The logic is defined in runner.py, and it can be counter-intuitive in some cases. Is there a good reason why we want to do this?
kedro/kedro/runner/runner.py
Lines 78 to 91 in f491420
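To make the "free outputs" idea easier to reason about, here is a minimal sketch in plain Python. It is not the code from runner.py and does not use the Kedro API. It assumes, based on the discussion in this thread, that the returned datasets are the ones produced by some node, consumed by no node, and not registered in the catalog. The `Node` tuple, the function names, and the dataset names are all hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical stand-in for a node: (input dataset names, output dataset names).
Node = Tuple[List[str], List[str]]


def pipeline_outputs(nodes: List[Node]) -> Set[str]:
    """Datasets produced by some node and consumed by no node."""
    produced = {name for _, outs in nodes for name in outs}
    consumed = {name for ins, _ in nodes for name in ins}
    return produced - consumed


def free_outputs(nodes: List[Node], catalog_names: Set[str]) -> Set[str]:
    """Pipeline outputs that are NOT registered in the catalog (assumed rule)."""
    return pipeline_outputs(nodes) - catalog_names


# Toy pipeline: raw -> clean -> features -> (model, metrics)
nodes = [
    (["raw"], ["clean"]),
    (["clean"], ["features"]),
    (["features"], ["model", "metrics"]),
]
# Only "raw" and "model" have catalog entries in this example.
catalog_names = {"raw", "model"}

print(sorted(free_outputs(nodes, catalog_names)))  # ['metrics']
```

Under this assumed rule, "model" is a final output of the pipeline but is not returned, simply because it has a catalog entry. That is the kind of result that can feel counter-intuitive to a caller who expects a function-like return value.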
kedro has improved a lot in terms of how to run the pipeline with packaging & KedroSession as a standalone application; #1423 documents different ways to do it. Personally, I think it is still not easy enough to integrate with kedro for someone who is inexperienced with kedro. #1423 mentions how a pipeline can be called programmatically. Even though the pipeline itself is a function call, it doesn't behave like a function, i.e. you can't really define an input as an argument easily (it has to be a Catalog entry), and the output of the pipeline is also very restricted.
Motivation
Kedro works really well within the kedro world, but it also means that kedro works very differently from the rest of the Python world.
This issue mainly focuses on the output side; this will improve the experience of integrating the kedro pipeline as an upstream. In an over-simplified world, this should be straightforward to do. Currently I think we make a strong assumption that people work with a "Kedro Project", but if we are moving towards a kedro package, i.e. using from kedro_package import main, it should behave just like a Python function. I think this is a reasonable expectation.
Questions
session.run?
Things to consider
Related Issue: