[BUG] Latest task status may not always be reported #3440

Closed
2 tasks done
hamersaw opened this issue Mar 9, 2023 · 0 comments · Fixed by flyteorg/flytepropeller#540
hamersaw commented Mar 9, 2023

Describe the bug

Every time FlytePropeller checks a task's status, it constructs a pluginRequestedTransition, which contains information about the plugin's state change. FlytePropeller uses this to determine whether the current state differs from the previously reported state, based on the Phase and PhaseVersion that each plugin reports. If neither of these two fields has changed, FlytePropeller returns immediately and skips sending a TaskExecutionEvent reporting the task state to FlyteAdmin.

The problem is that a plugin may report a new status with the same Phase and PhaseVersion but an updated Reason. The Reason field contains information about the task execution. One such scenario is a k8s Pod that requests more resources than are available in the cluster. The initial TaskExecutionEvent reports a Reason of "task submitted to K8s". In a subsequent evaluation the plugin updates this Reason to "Unschedulable:0/1 nodes are available: 1 Insufficient memory.", but this is never reported to FlyteAdmin because the Phase and PhaseVersion remain the same.
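The behavior described above can be sketched in a few lines. This is only an illustration of the comparison logic, not FlytePropeller's actual code (which is written in Go); the `PluginState` class and `has_changed` function are hypothetical names, and the "RUNNING" phase value is taken from what the UI displays.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PluginState:
    # Hypothetical stand-in for the state a plugin reports on each evaluation.
    phase: str
    phase_version: int
    reason: str


def has_changed(previous: PluginState, current: PluginState) -> bool:
    # Mirrors the check described in this issue: only Phase and
    # PhaseVersion are compared; Reason is ignored.
    return (previous.phase, previous.phase_version) != (
        current.phase,
        current.phase_version,
    )


submitted = PluginState("RUNNING", 0, "task submitted to K8s")
unschedulable = PluginState(
    "RUNNING", 0, "Unschedulable:0/1 nodes are available: 1 Insufficient memory."
)

# The Reason differs, but the check sees no change, so no event would be sent.
event_needed = has_changed(submitted, unschedulable)
```

Here `event_needed` is `False` even though the two states carry different reasons, which is why the updated reason never reaches FlyteAdmin.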

Expected behavior

The FlyteConsole UI should always display the latest status. This requires FlytePropeller to correctly identify an update and therefore to send an event to FlyteAdmin reporting the change.
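One way to picture the expected behavior is a change check that also takes the Reason into account. The sketch below is an assumption about what such a check could look like, not a description of the fix that was merged; `ReportedState` and `should_send_event` are hypothetical names.

```python
from typing import NamedTuple


class ReportedState(NamedTuple):
    # Hypothetical record of what was last reported for a task.
    phase: str
    phase_version: int
    reason: str


def should_send_event(previous: ReportedState, current: ReportedState) -> bool:
    # Treat a change in any of the three fields, including Reason,
    # as an update that warrants a new event.
    return (
        previous.phase != current.phase
        or previous.phase_version != current.phase_version
        or previous.reason != current.reason
    )


before = ReportedState("RUNNING", 0, "task submitted to K8s")
after = ReportedState(
    "RUNNING", 0, "Unschedulable:0/1 nodes are available: 1 Insufficient memory."
)

send_for_new_reason = should_send_event(before, after)
send_for_identical_state = should_send_event(before, before)
```

With this check, a new reason alone is enough to trigger an event, while an identical state still skips it. An alternative design would have the plugin bump PhaseVersion whenever the Reason changes, leaving the comparison itself untouched.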

Additional context to reproduce

Start a workflow with resource requests larger than what is available in the cluster. The task status pane in the UI will always display the "RUNNING" phase with a reason of "task submitted to K8s".

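from flytekit import Resources, task, workflow

# Request more memory than the cluster can provide so the Pod stays unschedulable.
@task(requests=Resources(mem="12G"))
def say_hello_resources(name: str) -> str:
    return f"hello {name}"

@workflow
def my_wf_resources(name: str) -> str:
    res = say_hello_resources(name=name)
    return res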


Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@hamersaw hamersaw added bug Something isn't working untriaged This issues has not yet been looked at by the Maintainers and removed untriaged This issues has not yet been looked at by the Maintainers labels Mar 9, 2023
@hamersaw hamersaw added this to the 1.6.0 milestone Mar 27, 2023
@hamersaw hamersaw self-assigned this Mar 27, 2023