[Proposal]: Flyte System Tags and metadata #3320
Signed-off-by: Ketan Umare <[email protected]>
very useful feature!
#### Approach 2: Certain label keys are treated special
- “group” will group everything
- “experiment” will also group everything with higher priority.
- “name” will override the execution id with the name?
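Approach 2 could work roughly as in the sketch below. It is an illustration under assumptions, not Flyte code: executions are modelled as a plain dict of label maps, the function names are hypothetical, `experiment` is given priority over `group` as the list above says, and `name` is treated as a display alias rather than a replacement for the execution ID, as clarified later in this thread.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Special keys from Approach 2, highest priority first.
GROUPING_KEYS = ("experiment", "group")


def grouping_key(labels: Dict[str, str]) -> Optional[str]:
    """Return the highest-priority grouping label as 'key:value', or None."""
    for key in GROUPING_KEYS:
        if key in labels:
            return f"{key}:{labels[key]}"
    return None


def display_name(execution_id: str, labels: Dict[str, str]) -> str:
    """'name' is only an alias for display; the execution ID itself never changes."""
    return labels.get("name", execution_id)


def group_executions(executions: Dict[str, Dict[str, str]]) -> Dict[Optional[str], List[str]]:
    """Bucket execution IDs by their grouping key (None means ungrouped)."""
    groups: Dict[Optional[str], List[str]] = defaultdict(list)
    for execution_id, labels in executions.items():
        groups[grouping_key(labels)].append(execution_id)
    return dict(groups)


executions = {
    "f1a2b3": {"group": "baseline"},
    "c4d5e6": {"group": "baseline", "experiment": "lr-sweep"},
    "a7b8c9": {"name": "my-run"},
}
print(group_executions(executions))
print(display_name("a7b8c9", executions["a7b8c9"]))
```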
Are users allowed to change the labels? If they are, then overriding might be an issue, since you might have already fired async events to external systems. So I think it might be useful to keep the executionID as an identifier that can't be modified once the execution has been created.
Alternatively, this could be an alias to the execution ID rather than overriding?
executionID cannot be changed - it is immutable and unique per project/domain.
name is just an alias. I will update the doc to reflect this
But I do like the idea of immutable labels as well: once added, you cannot change them.
So we'll support both mutable and immutable labels?
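One way to hold both ideas from this exchange (an immutable execution ID with `name` as a mere alias, plus optionally write-once labels) is sketched below. All class, field and method names are hypothetical and do not correspond to Flyte's actual types; the "immutable keys" set is an assumption about how write-once labels might be declared.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class ExecutionKey:
    """Immutable identifier, unique per project/domain: fields cannot be reassigned."""
    project: str
    domain: str
    execution_id: str


@dataclass
class ExecutionMetadata:
    key: ExecutionKey
    alias: str = ""  # human-friendly name; never replaces key.execution_id
    labels: Dict[str, str] = field(default_factory=dict)
    immutable_keys: FrozenSet[str] = frozenset()  # hypothetical write-once labels

    def display_name(self) -> str:
        return self.alias or self.key.execution_id

    def set_label(self, name: str, value: str) -> None:
        current = self.labels.get(name)
        if name in self.immutable_keys and current is not None and current != value:
            raise ValueError(f"label {name!r} is immutable once set")
        self.labels[name] = value

    def delete_label(self, name: str) -> None:
        if name in self.immutable_keys and name in self.labels:
            raise ValueError(f"label {name!r} is immutable once set")
        self.labels.pop(name, None)
```

With this split, renaming (changing `alias`) or retagging never touches the key that external systems may already have received in async events.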
Nice! Mostly sounds good to me.
Just that I would go with the name "tags" at the Flyte (CLI) level.
Querying could still be done on k8s labels as well...
A workflow or task can be executed using

```bash
pyflyte run --remote --labels k:v --labels k1:v1 test.py wf --input1=10
```
I would probably call the arg here `--tag` so as not to confuse it with Kubernetes labels, and then probably assign the tags to k8s annotations.
Great point, but the RPC field is sadly already called label, and these will become k8s labels.
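If the `--labels k:v` values become k8s labels, the CLI has to split each pair and should reject values Kubernetes would refuse. Below is a minimal sketch using the standard-library `argparse`; the helper name `parse_labels` and the prog name are hypothetical, the value check is a simplified form of the Kubernetes label-value rule (at most 63 characters, alphanumeric at both ends, `-`, `_` and `.` allowed in between), and keys are not validated here.

```python
import argparse
import re
from typing import Dict, List, Optional

# Simplified Kubernetes label-value rule: empty, or alphanumeric at both ends with
# '-', '_' or '.' allowed in between. Length (max 63) is checked separately.
K8S_LABEL_VALUE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")


def parse_labels(pairs: Optional[List[str]]) -> Dict[str, str]:
    """Turn repeated 'key:value' strings into a dict (a later duplicate key wins)."""
    labels: Dict[str, str] = {}
    for pair in pairs or []:
        key, sep, value = pair.partition(":")
        if not sep or not key:
            raise ValueError(f"expected key:value, got {pair!r}")
        if len(value) > 63 or not K8S_LABEL_VALUE.fullmatch(value):
            raise ValueError(f"{value!r} is not a valid Kubernetes label value")
        labels[key] = value
    return labels


parser = argparse.ArgumentParser(prog="run-sketch")
parser.add_argument("--labels", action="append", metavar="KEY:VALUE")
args = parser.parse_args(["--labels", "k:v", "--labels", "k1:v1"])
print(parse_labels(args.labels))
```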
## 7 Potential Impact and Dependencies
We this this is one of the most requested features in Flyte and will solve
Suggested change:
~~We this this is one of the most requested features in Flyte and will solve~~
This is one of the most requested features in Flyte and will solve
available on each execution. Users can filter executions simply
by clicking on a label, and then all executions are filtered by that label.
#### Approach 2: Certain label keys are treated special
If this approach is chosen, I wonder whether it would be nicer for the user to do `pyflyte run --remote --group foo --experiment bar ...` instead of `pyflyte run --remote --labels group:foo --labels experiment:bar ...`.
This doesn't mean that under the hood the labels mechanism couldn't be used.
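The idea of dedicated flags that still feed the labels mechanism could look like the minimal `argparse` sketch below. The flag names come from this comment; the helper names, the prog name, and the precedence rule (a dedicated flag overrides a `--labels` entry with the same key) are assumptions, not pyflyte behavior.

```python
import argparse
from typing import Dict, List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="run-sketch")
    parser.add_argument("--group")
    parser.add_argument("--experiment")
    parser.add_argument("--labels", action="append", metavar="KEY:VALUE")
    return parser


def labels_from_args(argv: List[str]) -> Dict[str, str]:
    """Merge generic --labels with the --group/--experiment sugar into one label dict."""
    args = build_parser().parse_args(argv)
    labels: Dict[str, str] = {}
    for pair in args.labels or []:
        key, _, value = pair.partition(":")
        labels[key] = value
    # The dedicated flags are just sugar over the same mechanism; they win on conflict.
    for key in ("group", "experiment"):
        flag_value = getattr(args, key)
        if flag_value is not None:
            labels[key] = flag_value
    return labels


print(labels_from_args(["--group", "foo", "--experiment", "bar"]))
```

Both spellings end up as the same label dict, which is what would let the sugar flags coexist with plain `--labels`.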
I'm somewhat opposed to this as it could be confusing to users as to what is a label vs what is a keyword cli argument 🤔
Or do we wanna create `group` and `experiment` as CLI arguments and introduce them as a concept? 🤔
I personally prefer option 1: treat all labels the same way. Users might not want to follow the categories we deem sensible. Experiment tracking servers like Mlflow or Wandb, which also have such a tagging mechanism, simply allow users to assign arbitrary tags. I would argue that ML engineers are used to this and we should provide the same UX without imposing special naming conventions.
Only exception: execution name.
I find it really helpful to have the pod names include customizable identifiers. We have a registration script, similar to `pyflyte run`, which has an `--execution_name` arg. The user-provided value is appended with a random UUID, as is already done for the execution IDs today; the result is checked against the execution name regex again and then passed to `FlyteRemote.execute(execution_name=...)` (already supported, see here). So I wouldn't implement the execution name as a pod label but as the pod's `metadata.name`.

This comment is another argument for not treating execution names as labels but as `metadata.name` instead, since I agree that tags need to be mutable.
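The naming scheme described above (a user-provided value plus a random UUID fragment, re-validated before it is handed to `FlyteRemote.execute(execution_name=...)`) could be sketched as follows. The script from the comment is not shown in the thread, so this is an independent illustration: the regex and the 63-character limit are hypothetical stand-ins rather than Flyte's actual execution-name rules, and the lowercase/underscore normalization is an added convenience.

```python
import re
import uuid

# Hypothetical stand-in for Flyte's execution-name rules: a lowercase letter first,
# then lowercase letters, digits or hyphens.
EXECUTION_NAME_RE = re.compile(r"[a-z][a-z0-9-]*")
MAX_EXECUTION_NAME_LEN = 63  # assumption, not taken from Flyte


def make_execution_name(user_prefix: str, suffix_len: int = 8) -> str:
    """Append a random UUID fragment to a user-chosen prefix and validate the result."""
    prefix = user_prefix.strip().lower().replace("_", "-")
    name = f"{prefix}-{uuid.uuid4().hex[:suffix_len]}"
    if len(name) > MAX_EXECUTION_NAME_LEN or not EXECUTION_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid execution name: {name!r}")
    return name


print(make_execution_name("lr_sweep"))
```

The random suffix keeps names unique even when the same prefix is reused, while the readable prefix still shows up wherever the execution name does.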
@bstadlbauer @fg91 @elibixby @flixr @goyalankit Some questions for you
- Do you prefer key-value pair tags or tags that only have a key?
- Should we add tags as Kubernetes labels?

Currently, the execution spec (with labels) is serialized to bytes and stored in the execution table, so it's impossible to add / delete / update tags. If we use the k8s client to filter FlyteWorkflow CRs by labels, we cannot find an execution after its CR is deleted.
I have a PR that adds a tags table. It allows us to easily add / update / delete tags, and even attach tags to tasks / workflows / projects. However, it doesn't support key-value pair tags for now. If we decide to use key-value pair tags, I just need to add a new column to the tags table and update the query. I'd like to know your thoughts first.
btw, the current implementation works with both MySQL and Postgres.
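A separate tags table makes adding, deleting and filtering independent of the serialized execution spec and of whether the CR still exists. The sketch below uses the standard-library `sqlite3` module as a stand-in for MySQL/Postgres; the table layout (`executions`, `execution_tags`) and helper names are hypothetical and are not the schema from the PR mentioned above. `INSERT OR IGNORE` is SQLite syntax; MySQL (`INSERT IGNORE`) and Postgres (`ON CONFLICT DO NOTHING`) spell it differently.

```python
import sqlite3
from typing import List

# Hypothetical schema: one row per (execution, tag) pair, key-only tags.
SCHEMA = """
CREATE TABLE executions (id TEXT PRIMARY KEY);
CREATE TABLE execution_tags (
    execution_id TEXT NOT NULL,
    tag          TEXT NOT NULL,
    PRIMARY KEY (execution_id, tag)
);
"""


def add_tag(conn: sqlite3.Connection, execution_id: str, tag: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO execution_tags (execution_id, tag) VALUES (?, ?)",
            (execution_id, tag),
        )


def remove_tag(conn: sqlite3.Connection, execution_id: str, tag: str) -> None:
    with conn:
        conn.execute(
            "DELETE FROM execution_tags WHERE execution_id = ? AND tag = ?",
            (execution_id, tag),
        )


def executions_with_all_tags(conn: sqlite3.Connection, tags: List[str]) -> List[str]:
    """Return IDs of executions that carry every tag in `tags` (all executions if empty)."""
    wanted = sorted(set(tags))
    if not wanted:
        return [row[0] for row in conn.execute("SELECT id FROM executions ORDER BY id")]
    placeholders = ", ".join("?" for _ in wanted)  # only '?' is interpolated, values are bound
    rows = conn.execute(
        f"SELECT execution_id FROM execution_tags WHERE tag IN ({placeholders}) "
        "GROUP BY execution_id HAVING COUNT(DISTINCT tag) = ? ORDER BY execution_id",
        (*wanted, len(wanted)),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
with conn:
    conn.executemany("INSERT INTO executions (id) VALUES (?)", [("exec-a",), ("exec-b",)])
add_tag(conn, "exec-a", "baseline")
add_tag(conn, "exec-a", "promising")  # tagged after the run turned out well
add_tag(conn, "exec-b", "baseline")
remove_tag(conn, "exec-a", "promising")
print(executions_with_all_tags(conn, ["baseline"]))
```

Supporting key-value tags later would mean adding a value column and matching on `(tag, value)` pairs in the same query shape.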
My $0.02 is let's keep it simple and support what you call key-only tags.
A person can 'hack' this to resemble key-value if needed (e.g. 'costcenter-12'), but we don't need to manage that complexity on the back end or in the UI when we get to figuring out how to let folks use tags to sort/group things.
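The 'hack' keeps the backend key-only and moves any key-value interpretation to the client. A sketch, assuming a `<key>-<value>` naming convention; the helper name `tag_values` is hypothetical.

```python
from typing import Iterable, List


def tag_values(tags: Iterable[str], key: str, sep: str = "-") -> List[str]:
    """Return the 'values' of key-only tags that follow the '<key><sep><value>' convention."""
    prefix = f"{key}{sep}"
    return sorted(t[len(prefix):] for t in tags if t.startswith(prefix) and len(t) > len(prefix))


print(tag_values(["costcenter-12", "baseline", "costcenter-7"], "costcenter"))
```

The backend only ever stores and matches plain strings; the convention exists purely on the client side.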
- In my opinion, key-only tags are perfectly fine and are what ML engineers are used to from experiment tracking servers.
- > Should we add tags to Kubernetes label

  I think being able to add/delete/update tags after the execution has already started or ended is an important feature. User story: an experiment is training/trained really well and I want to mark it for later. This is something that is not known when starting the execution. But adding/updating/deleting tags while the execution is already running would mean that the k8s labels are no longer in sync with what is stored in the tags table. I would therefore not apply the tags as labels to k8s objects.
Sorry for the late response here, but I agree with what's been said above. Key-only tags would also solve all our use cases 👍

> I think being able to add/delete/update tags after the execution has already started or ended is an important feature

+1 to this and to the reasoning for not applying those to k8s.
I like this proposal very much. Currently we use decks to link to Wandb runs corresponding to Flyte executions.
We then use tags in wandb to do the grouping by experiments, tags, ... that you describe in the RFC.
Would love to do this directly in Flyte.
03-30-2023 Meeting notes:
Slightly tangential to this proposal, but related to the notion of making executions easier to identify: I note that Prefect uses a system of nonsense IDs for executions, which look much friendlier and are far easier to remember than Flyte's alphanumeric IDs, for example ‘enigmatic-waxbill’ or ‘massive-antelope’. Would something like this be possible for Flyte?
04-13-2023 notes: no updates
For visibility: @kasimiraula proposed that users should be able to create notes for executions, similar to what is currently possible when aborting an execution #3646
I think this is possible and I have thought about it. We should weigh the entropy considerations and the cost of doing this, and then I'm all for it 👍🏽
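To make the entropy consideration for human-readable IDs concrete, the sketch below builds Prefect-style adjective-animal IDs with Python's `secrets` module and computes how many bits of randomness a scheme provides and when a repeat becomes likely. The word lists are tiny placeholders, and the optional random suffix is an assumption about how collisions could be kept rare; this is not how Prefect or Flyte actually generate IDs.

```python
import math
import secrets
from typing import Sequence

# Tiny placeholder word lists; a real scheme would need far larger lists.
ADJECTIVES = ("enigmatic", "massive", "quiet", "brave", "lucky", "amber")
ANIMALS = ("waxbill", "antelope", "otter", "heron", "lynx", "gecko")
SUFFIX_ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


def friendly_id(suffix_chars: int = 4) -> str:
    """Build an 'adjective-animal[-suffix]' ID using a cryptographically secure RNG."""
    parts = [secrets.choice(ADJECTIVES), secrets.choice(ANIMALS)]
    if suffix_chars > 0:
        parts.append("".join(secrets.choice(SUFFIX_ALPHABET) for _ in range(suffix_chars)))
    return "-".join(parts)


def entropy_bits(*word_lists: Sequence[str], suffix_chars: int = 0) -> float:
    """Bits of randomness: log2 of the number of distinct IDs the scheme can produce."""
    combinations = math.prod(len(words) for words in word_lists)
    combinations *= len(SUFFIX_ALPHABET) ** suffix_chars
    return math.log2(combinations)


def executions_until_likely_collision(bits: float) -> float:
    """Birthday bound: roughly 50% chance of a repeated ID after this many executions."""
    return math.sqrt(2 * math.log(2) * 2 ** bits)


print(friendly_id())
print(round(entropy_bits(ADJECTIVES, ANIMALS), 2))
```

With the six-by-six lists above and no suffix there are only 36 combinations, so a repeat becomes likely after roughly seven executions; each base-36 suffix character adds about 5.17 bits. The cost side of the trade-off is longer IDs or much larger word lists.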
Signed-off-by: Kevin Su <[email protected]>
Thanks for incorporating what was discussed in the contributors' syncs. I like that now:
- tags will not be attached to k8s objects as labels but instead saved in the database so that they can be modified/deleted during/after the execution.
- all tags are equal and Flyte doesn't impose any special names.
- we have a clear distinction between tags and notes.
Please consider all comments below as nit-picks that you can just resolve in case you don't agree.
A workflow or task can be executed using

```bash
pyflyte run --remote --tags '["hello", "world"]' test.py wf --input1=10
```
Could we do `--tag hello --tag world` instead of providing a list of tags in string format?
maybe we can support both?
pyflyte run --remote --tags '["hello", "world"]'
# and
pyflyte run --remote --tag hello --tag world
# and
pyflyte run --remote --tag hello --tags '["key1", "key2"]'
My opinion about this is not so strong that I'd say we need to support two options in case others prefer `--tags '["hello", "world"]'`. I personally find lists or JSONs in string representation as CLI args a bit cumbersome.
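Supporting both spellings discussed above is straightforward with the standard-library `argparse`: a repeatable `--tag` plus a `--tags` option whose value is parsed as a JSON list. This is a sketch, not pyflyte's implementation, whose real CLI may differ; the prog name, helper names and the order-preserving de-duplication are assumptions.

```python
import argparse
import json
from typing import List, Optional


def parse_tag_list(raw: str) -> List[str]:
    """argparse `type` for --tags: a JSON list of strings."""
    value = json.loads(raw)  # invalid JSON raises ValueError, which argparse reports
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise argparse.ArgumentTypeError("--tags expects a JSON list of strings")
    return value


def collect_tags(argv: Optional[List[str]] = None) -> List[str]:
    parser = argparse.ArgumentParser(prog="run-sketch")
    parser.add_argument("--tag", action="append", help="may be repeated")
    parser.add_argument("--tags", type=parse_tag_list, action="append",
                        help='JSON list, e.g. \'["hello", "world"]\'')
    args = parser.parse_args(argv)
    merged: List[str] = list(args.tag or [])
    for tag_list in args.tags or []:
        merged.extend(tag_list)
    # Key-only tags behave like a set: drop duplicates but keep first-seen order.
    return list(dict.fromkeys(merged))


print(collect_tags(["--tag", "hello", "--tags", '["key1", "key2"]']))
```

All three invocations from the thread then produce a flat list of tags, so the backend never needs to know which spelling was used.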
Co-authored-by: Fabio M. Graetz, Ph.D. <[email protected]> Signed-off-by: Kevin Su <[email protected]>
LG!
This RFC proposes a way to add execution tags and descriptions.
This would help users in multiple ways. Please read the RFC for more context.
Signed-off-by: Ketan Umare [email protected]