Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove non-public interface usage in EcsRunTaskOperator #29447

Merged
merged 2 commits into from
Aug 24, 2023

Conversation

Taragolis
Copy link
Contributor

Right now EcsRunTaskOperator when required "reattach" use hacks:

  • save info to XCom which not valid because it not use a public interface
  • In additional it can't work with Dynamic Task Mappings (all xcom name contain only task_id info and miss other part of unique TI key)

By this PR current mechanism replaced by builtin ECS.Client.run_task ability to setup startedBy and filter it later.

startedBy limited by 36 characters an it is not possible to use string representation of dag_id + task_id + run_id + map_id, instead of this generate UUID based on this value which can be used as unique (per TI) value.

Unfortunetly right now EcsRunTaskOperator set startedBy by owner of task, so it become mutuality exclusive:

  1. If reattach set to True than startedBy set as unique TI (UUID)
  2. If reattach set to False (default) than startedBy set as task.owner

@Taragolis Taragolis requested a review from ferruzzi February 9, 2023 19:31
@boring-cyborg boring-cyborg bot added area:providers provider:amazon AWS/Amazon - related issues labels Feb 9, 2023
@Taragolis
Copy link
Contributor Author

Taragolis commented Feb 9, 2023

Not sure is it changes, breeze itself or something else but I have interesting behaviour with logs during execution this operator

[2023-02-09, 23:33:53 UTC] {ecs.py:489} INFO - ECS task ID is: 6beb6333b8dc4cf7afbd36b524efd026*** Could not read served logs: Parent instance <TaskInstance at 0xffff5f2c2590> is not bound to a Session; lazy load operation of attribute 'trigger' cannot proceed (Background on this error at: https://sqlalche.me/e/14/bhk3)
*** Found local files:
***   * /root/airflow/logs/dag_id=test_ecs_operator_reatach/run_id=manual__2023-02-09T23:33:49.777801+00:00/task_id=ecs-task-regular-no-reattach/attempt=1.log
*** Could not read served logs: Parent instance <TaskInstance at 0xffff5f3e0050> is not bound to a Session; lazy load operation of attribute 'trigger' cannot proceed (Background on this error at: https://sqlalche.me/e/14/bhk3)
*** Found local files:
***   * /root/airflow/logs/dag_id=test_ecs_operator_reatach/run_id=manual__2023-02-09T23:33:49.777801+00:00/task_id=ecs-task-regular-no-reattach/attempt=1.log
*** Could not read served logs: Parent instance <TaskInstance at 0xffff5f480810> is not bound to a Session; lazy load operation of attribute 'trigger' cannot proceed (Background on this error at: https://sqlalche.me/e/14/bhk3)
*** Found local files:
***   * /root/airflow/logs/dag_id=test_ecs_operator_reatach/run_id=manual__2023-02-09T23:33:49.777801+00:00/task_id=ecs-task-regular-no-reattach/attempt=1.log
*** Could not read served logs: Parent instance <TaskInstance at 0xffff5f259310> is not bound to a Session; lazy load operation of attribute 'trigger' cannot proceed (Background on this error at: https://sqlalche.me/e/14/bhk3)
*** Found local files:

Update: Fixed by #29472

@Taragolis Taragolis force-pushed the ecs-operator-replace-xcom-hack branch 2 times, most recently from 2e1e712 to 17bca6d Compare February 11, 2023 09:02
dstandish added a commit to astronomer/airflow that referenced this pull request Feb 12, 2023
Probably we should just chop this view in favor of grid view logging which is the future. But this fixes rendering issues raised here apache#29447 (comment).

What we do, is in log tailing context (which apparently isn't used in grid, and that's why I did not see this in developing trigger logging) we don't add the messages to the log content.  So, whenever log_pos is in metadata we don't add messages.  It means the messages could be a bit stale but that seems ok.  Refreshing the page could fix that.  Longer term, we could update the API so that log content is just content and the messages are themselves returned in the metadata dict.  That's probably the "right" solution ultimately.  But can be saved for another day.  Also resolve the "cannot load lazy instance" issue when invoking the reader logic from this different context.
dstandish added a commit that referenced this pull request Feb 17, 2023
Probably we should just chop this view in favor of grid view logging which is the future. But this fixes rendering issues raised here #29447 (comment).

What we do, is in log tailing context (which apparently isn't used in grid, and that's why I did not see this in developing trigger logging) we don't add the messages to the log content.  So, whenever log_pos is in metadata we don't add messages.  It means the messages could be a bit stale but that seems ok.  Refreshing the page could fix that.  Longer term, we could update the API so that log content is just content and the messages are themselves returned in the metadata dict.  That's probably the "right" solution ultimately.  But can be saved for another day.  Also resolve the "cannot load lazy instance" issue when invoking the reader logic from this different context.
@Taragolis Taragolis linked an issue Feb 18, 2023 that may be closed by this pull request
2 tasks
@Taragolis Taragolis marked this pull request as ready for review February 27, 2023 09:24
@Taragolis Taragolis force-pushed the ecs-operator-replace-xcom-hack branch 3 times, most recently from 6b5793e to d196dc7 Compare March 2, 2023 09:32
@shubham22
Copy link

@o-nikolas - looks like this one needs review, otherwise it is ready to be merged cc: @Taragolis

ferruzzi
ferruzzi previously approved these changes Mar 21, 2023
Copy link
Contributor

@ferruzzi ferruzzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I did raise a comment previously, but it was just a convention nitpick, don't block on my account. 👍

ti: TaskInstance = context["ti"]
self.started_by = generate_uuid(*map(str, ti.key.primary))
if self.do_xcom_push:
ti.xcom_push("started_by", self.started_by)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused why you're pushing this value to xcom. I don't see where it's read back? In _try_reattach_task it is gotten from self.started_by not xcom.

Copy link
Contributor Author

@Taragolis Taragolis Mar 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All XComs values associated to the task cleared before it started. For current implementation (in main) EcsRunTaskOperator use non-documented feature (non-public interface usage) by save specific value to XCom for not existed task, and which do not cleared when you reset task run. And potentially it also could damage user Airflow Database (e.g. constraints violations)

This value only for the reference for example if user want to find this task in ECS (e.g. in AWS Console / AWS SDK) by provide startedBy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the goal is to just allow the user to reference the statedby uuid, then I think logging it and/or using an Operator extralink is much more user friendly that putting it in XCOM.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is possible to create extra link, I'm not sure that is possible to create GET request for this one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, fair enough. If you can just log it then that'd be helpful for users (in addition to the XCOM, no need to remove that), otherwise lgtm 👍

Copy link
Contributor Author

@Taragolis Taragolis Aug 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decide to remove ability to save started_by into the xcom, started_by is deterministic and only required for reattach, I'm not sure there are any benefits to keep it in XCom

list_tasks_resp = self.client.list_tasks(
cluster=self.cluster, desiredStatus="RUNNING", family=ecs_task_family
cluster=self.cluster, desiredStatus="RUNNING", startedBy=self.started_by
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you want to see if there is a value for started_by in xcom here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't get XCom from previous task run, because it removed by Airflow before spawn new task run.
In additional I've just make started_by as private attribute

@github-actions
Copy link

github-actions bot commented May 9, 2023

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label May 9, 2023
@github-actions github-actions bot closed this May 14, 2023
@scottypate
Copy link
Contributor

Hey @Taragolis Curious if you are going to merge this change or if it is legit stale? We are having the same issue with reattachement.

ahidalgob pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jun 12, 2023
Probably we should just chop this view in favor of grid view logging which is the future. But this fixes rendering issues raised here apache/airflow#29447 (comment).

What we do, is in log tailing context (which apparently isn't used in grid, and that's why I did not see this in developing trigger logging) we don't add the messages to the log content.  So, whenever log_pos is in metadata we don't add messages.  It means the messages could be a bit stale but that seems ok.  Refreshing the page could fix that.  Longer term, we could update the API so that log content is just content and the messages are themselves returned in the metadata dict.  That's probably the "right" solution ultimately.  But can be saved for another day.  Also resolve the "cannot load lazy instance" issue when invoking the reader logic from this different context.

GitOrigin-RevId: 7cf5cea76e3ff8b790d7185632b6dd7b0196f0e3
@SBurwash
Copy link

+1 I'm experiencing this issue as well

@Taragolis
Copy link
Contributor Author

Thanks for reminder, I've just tried to find this PR for continue work on in (simply, starting with rebase)

@Taragolis Taragolis reopened this Aug 20, 2023
@Taragolis Taragolis force-pushed the ecs-operator-replace-xcom-hack branch from d196dc7 to 5079617 Compare August 20, 2023 19:57
@Taragolis
Copy link
Contributor Author

@ferruzzi @o-nikolas @vincbeck when you have a time could you have a look?

@Taragolis Taragolis removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Aug 20, 2023
@Taragolis Taragolis dismissed ferruzzi’s stale review August 21, 2023 16:28

need review against current codebase

@potiuk potiuk merged commit 2d86252 into apache:main Aug 24, 2023
@SBurwash
Copy link

Awesome feature, thank you so much for shipping it out!

How would we be able to access these changes? Will it be available directly with the ECS run task operator, or will it be part of a larger release of airflow?

@potiuk
Copy link
Member

potiuk commented Aug 24, 2023

Look at the docs. we release providers separately from core and if you want you can upgrade them independetntly. https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html#installation-and-upgrade-scenarios

@potiuk
Copy link
Member

potiuk commented Aug 24, 2023

We are gearing up to both releasing provider wave (release candidates likely today) and new airlfow 2.7.1 (next weeks) - once providers are out latest airflow release by default points to the latest released providers (but you can freely downgrade/upgrade the providers independently as you wish)

Comment on lines -653 to -655
if self.reattach:
# Save the task ARN in XCom to be able to reattach it if needed
self.xcom_push(context, key=self.REATTACH_XCOM_KEY, value=self.arn)
Copy link
Contributor

@vandonr-amz vandonr-amz Aug 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a somewhat-breaking change, as the example code

# You must set `reattach=True` in order to get ecs_task_arn if you plan to use a Sensor.
reattach=True,

was recommending setting reattach to true to get the ARN.

I think this sucked, but removing the arn entirely from the xcom values is not good either.
What we could do is set it all the time now that we don't rely on this anymore to know if we need to reattach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this part was never work as it expected

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, pushing the ARN to xcom at least was working.
I opened a PR to restore that specific thing.

ahidalgob pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Nov 7, 2023
Probably we should just chop this view in favor of grid view logging which is the future. But this fixes rendering issues raised here apache/airflow#29447 (comment).

What we do, is in log tailing context (which apparently isn't used in grid, and that's why I did not see this in developing trigger logging) we don't add the messages to the log content.  So, whenever log_pos is in metadata we don't add messages.  It means the messages could be a bit stale but that seems ok.  Refreshing the page could fix that.  Longer term, we could update the API so that log content is just content and the messages are themselves returned in the metadata dict.  That's probably the "right" solution ultimately.  But can be saved for another day.  Also resolve the "cannot load lazy instance" issue when invoking the reader logic from this different context.

GitOrigin-RevId: 7cf5cea76e3ff8b790d7185632b6dd7b0196f0e3
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Sep 19, 2024
Probably we should just chop this view in favor of grid view logging which is the future. But this fixes rendering issues raised here apache/airflow#29447 (comment).

What we do, is in log tailing context (which apparently isn't used in grid, and that's why I did not see this in developing trigger logging) we don't add the messages to the log content.  So, whenever log_pos is in metadata we don't add messages.  It means the messages could be a bit stale but that seems ok.  Refreshing the page could fix that.  Longer term, we could update the API so that log content is just content and the messages are themselves returned in the metadata dict.  That's probably the "right" solution ultimately.  But can be saved for another day.  Also resolve the "cannot load lazy instance" issue when invoking the reader logic from this different context.

GitOrigin-RevId: 7cf5cea76e3ff8b790d7185632b6dd7b0196f0e3
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Nov 8, 2024
Probably we should just chop this view in favor of grid view logging which is the future. But this fixes rendering issues raised here apache/airflow#29447 (comment).

What we do, is in log tailing context (which apparently isn't used in grid, and that's why I did not see this in developing trigger logging) we don't add the messages to the log content.  So, whenever log_pos is in metadata we don't add messages.  It means the messages could be a bit stale but that seems ok.  Refreshing the page could fix that.  Longer term, we could update the API so that log content is just content and the messages are themselves returned in the metadata dict.  That's probably the "right" solution ultimately.  But can be saved for another day.  Also resolve the "cannot load lazy instance" issue when invoking the reader logic from this different context.

GitOrigin-RevId: 7cf5cea76e3ff8b790d7185632b6dd7b0196f0e3
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers provider:amazon AWS/Amazon - related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

EcsRunTaskOperator reattach does not work
9 participants