Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate CLI inconsistencies #2014

Closed
merelcht opened this issue Nov 9, 2022 · 4 comments
Closed

Investigate CLI inconsistencies #2014

merelcht opened this issue Nov 9, 2022 · 4 comments
Assignees
Labels
Component: CLI Issue/PR that addresses the CLI for Kedro Design: Research

Comments

@merelcht
Copy link
Member

merelcht commented Nov 9, 2022

Description

Our CLI has inconsistencies in how to call flags. For example: for --node you can add the option multiple=True in click. But when calling --from-nodes this isn't possible. So for these flags you either call --node node_1 --node node_2 or --from-nodes node_1, node_2. Why not --from-nodes node_1 --from-nodes node_2? At a glance this reads a bit weird though 🤔

Bottomline, we should make the CLI more consistent and investigate what things we need to improve to get there.

Context

#1828

To do

  • Find all inconsistencies in our CLI
  • Come up with a plan on how to get rid of these inconsistencies
  • Discuss with the team
@merelcht merelcht added Issue: Feature Request New feature or improvement to existing feature Component: CLI Issue/PR that addresses the CLI for Kedro and removed Issue: Feature Request New feature or improvement to existing feature labels Nov 9, 2022
@merelcht merelcht moved this to To Do in Kedro Framework Jan 9, 2023
@SajidAlamQB SajidAlamQB moved this from To Do to In Progress in Kedro Framework Jan 9, 2023
@SajidAlamQB SajidAlamQB moved this from In Progress to To Do in Kedro Framework Jan 12, 2023
@yetudada
Copy link
Contributor

yetudada commented Jan 12, 2023

@jmholzer jmholzer self-assigned this Jan 12, 2023
@jmholzer jmholzer moved this from To Do to In Progress in Kedro Framework Jan 12, 2023
@jmholzer jmholzer removed their assignment Jan 13, 2023
@jmholzer jmholzer moved this from In Progress to To Do in Kedro Framework Jan 13, 2023
@AhdraMeraliQB AhdraMeraliQB self-assigned this Jan 16, 2023
@AhdraMeraliQB AhdraMeraliQB moved this from To Do to In Progress in Kedro Framework Jan 16, 2023
@AhdraMeraliQB AhdraMeraliQB added the Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation label Jan 16, 2023
@AhdraMeraliQB
Copy link
Contributor

AhdraMeraliQB commented Jan 19, 2023

TL;DR At the bottom

Investigation

Looking at Kedro's cli commands, only run had enough flag options to introduce inconsistencies in the implementation.

The primary inconsistency being that in the issue description: some command options allow the flag to be used several times in a command (where multiple=True), but not all. Where there is no multiple=True the last use of the flag will simply overwrite any former uses.

The command options where multiple=True are kedro run --node, kedro run --tag, and kedro run --load-version

kedro run behaviour

Exploring how the differences in implementation revealed the following points of note:

  • Options with multiple=True were still split on commas. That is to say kedro run --node node1,node2 and kedro run --node node1 --node2 would both run as expected.
  • Options with multiple=True and no defined callback function do not strip whitespaces, introducing unwanted behaviour. Running kedro run --node="node1, node2" would fail because the node ' node2' would not be found.
  • Using multiple=False for flags like --runner and --async did not prevent me from or complain about running commands with multiple flags, ergo the argument does not provide any validation.

History

The divergence in implementation traces back to the implementation of the --from-nodes and --to-nodes flags. Prior to these flags all options were implemented with either multiple=True or multiple=False. This convention was broken for the new flags as it was decided the command kedro run --from-nodes node1 --from-nodes node2 was too verbose.

It is worth noting that the investigation showed multiple=True does not force users to use multiple flags. The command will still work as expected if values are provided when comma-separated, e.g. kedro run --from-nodes node1,node2.

During this investigation it was also noted that the commands we recommend in our resources are not consistent in formatting. In docs we suggest kedro run --flag value1,value2, whereas in the viz metadata panel we suggest kedro run --flag="value1,value2".

UX Concerns

@amandakys also noted the following concerns:

  1. Is it worth separating out the --to and --from options?
    This could increase the extensibility of to/from keywords , allowing --from --anything and --to --anything. An example being kedro run --from --node node1,node2. This would also enable The reuse of the --node option, making the CLI commands more modular. It was noted that we were unsure whether this was possible using click.

  2. How can we increase the accessibility of the param names that we supply to these commands? e.g. adding a kedro list --nodes
    At the moment we have kedro registry describe <pipeline-name> to list nodes in a pipeline.
    Is there value in something like kedro list --node

  3. Do we need to use plural words to indicate the possibility of multiple inputs? Or can this be assumed?
    A convention that could be applied is: _Options that support multiple comma delimited values should be plural and not require multiple=True

Example commands using this principle:

  • kedro run --pipeline <pipeline-name> (does not support multiple pipeline values)
  • kedro run --nodes <node-name>,<node-name>(does support multiple node values)
    By this logic, the commands --to-nodes and --from-nodes should remain plural if they support comma delimited values.
  • kedro run --to-nodes <node-name>,<node-name> (supports comma delimiting)
  • kedro run --to-node <node-name> --to-node <node-name>(does not support)
  1. Can multiple=True be replaced with support for multiple param inputs (which i think already exists?)

Following the above mentioned convention about plurality and number of inputs, there is no reason why a plural option cannot also support multiple=True, but this would provide duplicate functionality. Imo multiple=True and comma delimiting are mutually exclusive options.

  • kedro run --to-nodes <node-name>, <node-name> --to-nodes <node-name>

Tech Design 18/01/23 and Further Discussions

In Tech design we addressed several questions:

1. How do we resolve the inconsistency regarding multiple=True?

The value addition of implementing it is not much, as it allows for flags to be given multiple times but seeing that it makes commands more verbose, it is unlikely to be preferred in practice. However, @deepyaman noted that it aligns with convention to continue to support multiple flags.

@AntonyMilneQB and @idanov both noted that there isn't necessarily an inconsistency - multiple=True is implemented for command options in singular form, whereas those in plural form imply several values would be supplied to the one flag, and as such, do not allow for multiple flags. This discrepancy between using singular and plural form was also noted by @amandakys in question three above.

The consensus (as of tech design) was to use singular commands across the board by deprecating the plural forms in 0.19.0 and introducing singular equivalents that behave the same but also make use of multiple=True. These new singular commands should continue to split on commas as well.

Later, however, @idanov noted that changing these commands to singular form would then introduce inconsistencies with the api calls. See: https://kedro.readthedocs.io/en/stable/nodes_and_pipelines/slice_a_pipeline.html .

Furthermore, @merelcht flagged that using multiple=True in for only some flags leads to inconsistencies in the syntax used to supply values in a config file. See: #1792 and https://kedro.readthedocs.io/en/stable/kedro_project_setup/configuration.html#configure-kedro-run-arguments

To pass values to --node and --tag one must use a list. However, using a list for --from-nodes does not work, and instead one must use comma separated values. This would be very confusing for users without an understanding of the cli commands.

Further discussion is necessary before a final decision can be made.

2. What format for our runs should we be using in our resources?

As noted above, we do not have a consistent style for our command suggestions. The consensus was to follow convention and use kedro run --flag=value1,value2.

3. Should we strip whitespaces from values passed to --node and --tag?

The investigation found that when using the format kedro run --flag="value1, value2" would fail for --node and --tag but not for the options implemented without multiple=True. The reason is likely because the callback functions handle the spaces, whereas the default implementation does not.

As node names cannot have spaces, it was decided that this behaviour should be corrected. However, our current implementation for tags allows spaces. We could make assumptions and trim a singular space character after commas but I believe this gives raise to a larger question: Should we be allowing users to creates tags with spaces?. This will need to be addressed in further discussions.

The TL;DR

Ultimately, the issue of the command inconsistencies impacts more than first suspected.

In tech design we suggested replacing existing plural commands (e.g --from-nodes) with singular commands that allow for multiple flags. However, we cannot introduce singular commands without introducing inconsistencies with the api calls.
Not being consistent with where we have multiple=True leads to confusion in supplying values in a config file. A resolution here would likely mean either all flags or no flags allow multiple=True, with no in between.
These points will be discussed further.

Some other questions to discuss include:

  1. Should we allow tags to have spaces?
  2. Specifying multiple=False adds no value and was inherited as convention. Should we remove it from --runner and --async, and if not, should it be added to the other flags (--pipeline, --config, --conf-souce) that do not allow multiple flags?

We did, however, come to a consensus on command formatting in our resources 🥳.

@AhdraMeraliQB
Copy link
Contributor

AhdraMeraliQB commented Jan 23, 2023

A special thanks to @amandakys, @AntonyMilneQB, @idanov and @merelcht for their contributions to this investigation!

Following a discussion on 23/01/2023, we have agreed on the following changes to be made:

  1. We will deprecate the flags --node, --tag and --load-version in 0.19.0. These will be replaced with the flags --nodes, --tags, and --load-versions. These new, pluralised flags will not allow the use of multiple=True.
  1. The shorter forms -n, -t, and -lv will continue to behave as they currently do, and will switch over to represent the new flags only once the singular flags are deprecated (in 0.19.0). This change is included as part of #2245.

  2. The docs will be changed to recommend the plural flags instead of the singular flags. Note that this change will include changing the suggestion for config.yml as the list syntax will no longer work for specifying nodes and tags (see #1792)

  1. Defining multiple=False is redundant. We will remove this argument from the flags --runner and --async.
  1. We will add validation to user-defined tags in 0.19.0 so that tags are subject to the same constraints as node names.
  1. There are still avenues to consider regarding the consistencies with the pipeline api; a new investigation examining those is warranted.

@AhdraMeraliQB
Copy link
Contributor

Closed in favour of follow-on actions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: CLI Issue/PR that addresses the CLI for Kedro Design: Research
Projects
Archived in project
Development

No branches or pull requests

5 participants