-
Notifications
You must be signed in to change notification settings - Fork 14.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove non-public interface usage in EcsRunTaskOperator #29447
Conversation
Not sure is it changes, breeze itself or something else but I have interesting behaviour with logs during execution this operator [2023-02-09, 23:33:53 UTC] {ecs.py:489} INFO - ECS task ID is: 6beb6333b8dc4cf7afbd36b524efd026*** Could not read served logs: Parent instance <TaskInstance at 0xffff5f2c2590> is not bound to a Session; lazy load operation of attribute 'trigger' cannot proceed (Background on this error at: https://sqlalche.me/e/14/bhk3)
*** Found local files:
*** * /root/airflow/logs/dag_id=test_ecs_operator_reatach/run_id=manual__2023-02-09T23:33:49.777801+00:00/task_id=ecs-task-regular-no-reattach/attempt=1.log
*** Could not read served logs: Parent instance <TaskInstance at 0xffff5f3e0050> is not bound to a Session; lazy load operation of attribute 'trigger' cannot proceed (Background on this error at: https://sqlalche.me/e/14/bhk3)
*** Found local files:
*** * /root/airflow/logs/dag_id=test_ecs_operator_reatach/run_id=manual__2023-02-09T23:33:49.777801+00:00/task_id=ecs-task-regular-no-reattach/attempt=1.log
*** Could not read served logs: Parent instance <TaskInstance at 0xffff5f480810> is not bound to a Session; lazy load operation of attribute 'trigger' cannot proceed (Background on this error at: https://sqlalche.me/e/14/bhk3)
*** Found local files:
*** * /root/airflow/logs/dag_id=test_ecs_operator_reatach/run_id=manual__2023-02-09T23:33:49.777801+00:00/task_id=ecs-task-regular-no-reattach/attempt=1.log
*** Could not read served logs: Parent instance <TaskInstance at 0xffff5f259310> is not bound to a Session; lazy load operation of attribute 'trigger' cannot proceed (Background on this error at: https://sqlalche.me/e/14/bhk3)
*** Found local files: Update: Fixed by #29472 |
2e1e712
to
17bca6d
Compare
Probably we should just chop this view in favor of grid view logging which is the future. But this fixes rendering issues raised here apache#29447 (comment). What we do, is in log tailing context (which apparently isn't used in grid, and that's why I did not see this in developing trigger logging) we don't add the messages to the log content. So, whenever log_pos is in metadata we don't add messages. It means the messages could be a bit stale but that seems ok. Refreshing the page could fix that. Longer term, we could update the API so that log content is just content and the messages are themselves returned in the metadata dict. That's probably the "right" solution ultimately. But can be saved for another day. Also resolve the "cannot load lazy instance" issue when invoking the reader logic from this different context.
Probably we should just chop this view in favor of grid view logging which is the future. But this fixes rendering issues raised here #29447 (comment). What we do, is in log tailing context (which apparently isn't used in grid, and that's why I did not see this in developing trigger logging) we don't add the messages to the log content. So, whenever log_pos is in metadata we don't add messages. It means the messages could be a bit stale but that seems ok. Refreshing the page could fix that. Longer term, we could update the API so that log content is just content and the messages are themselves returned in the metadata dict. That's probably the "right" solution ultimately. But can be saved for another day. Also resolve the "cannot load lazy instance" issue when invoking the reader logic from this different context.
6b5793e
to
d196dc7
Compare
@o-nikolas - looks like this one needs review, otherwise it is ready to be merged cc: @Taragolis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I did raise a comment previously, but it was just a convention nitpick, don't block on my account. 👍
ti: TaskInstance = context["ti"] | ||
self.started_by = generate_uuid(*map(str, ti.key.primary)) | ||
if self.do_xcom_push: | ||
ti.xcom_push("started_by", self.started_by) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused why you're pushing this value to xcom. I don't see where it's read back? In _try_reattach_task
it is gotten from self.started_by
not xcom.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All XComs values associated to the task cleared before it started. For current implementation (in main
) EcsRunTaskOperator use non-documented feature (non-public interface usage) by save specific value to XCom for not existed task, and which do not cleared when you reset task run. And potentially it also could damage user Airflow Database (e.g. constraints violations)
This value only for the reference for example if user want to find this task in ECS (e.g. in AWS Console / AWS SDK) by provide startedBy
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the goal is to just allow the user to reference the statedby uuid, then I think logging it and/or using an Operator extralink is much more user friendly that putting it in XCOM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is possible to create extra link, I'm not sure that is possible to create GET request for this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, fair enough. If you can just log it then that'd be helpful for users (in addition to the XCOM, no need to remove that), otherwise lgtm 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decide to remove ability to save started_by
into the xcom, started_by
is deterministic and only required for reattach, I'm not sure there are any benefits to keep it in XCom
list_tasks_resp = self.client.list_tasks( | ||
cluster=self.cluster, desiredStatus="RUNNING", family=ecs_task_family | ||
cluster=self.cluster, desiredStatus="RUNNING", startedBy=self.started_by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you want to see if there is a value for started_by in xcom here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't get XCom from previous task run, because it removed by Airflow before spawn new task run.
In additional I've just make started_by
as private attribute
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
Hey @Taragolis Curious if you are going to merge this change or if it is legit stale? We are having the same issue with reattachement. |
Probably we should just chop this view in favor of grid view logging which is the future. But this fixes rendering issues raised here apache/airflow#29447 (comment). What we do, is in log tailing context (which apparently isn't used in grid, and that's why I did not see this in developing trigger logging) we don't add the messages to the log content. So, whenever log_pos is in metadata we don't add messages. It means the messages could be a bit stale but that seems ok. Refreshing the page could fix that. Longer term, we could update the API so that log content is just content and the messages are themselves returned in the metadata dict. That's probably the "right" solution ultimately. But can be saved for another day. Also resolve the "cannot load lazy instance" issue when invoking the reader logic from this different context. GitOrigin-RevId: 7cf5cea76e3ff8b790d7185632b6dd7b0196f0e3
+1 I'm experiencing this issue as well |
Thanks for reminder, I've just tried to find this PR for continue work on in (simply, starting with rebase) |
Co-authored-by: D. Ferruzzi <[email protected]>
d196dc7
to
5079617
Compare
@ferruzzi @o-nikolas @vincbeck when you have a time could you have a look? |
need review against current codebase
Awesome feature, thank you so much for shipping it out! How would we be able to access these changes? Will it be available directly with the ECS run task operator, or will it be part of a larger release of airflow? |
Look at the docs. we release providers separately from core and if you want you can upgrade them independetntly. https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html#installation-and-upgrade-scenarios |
We are gearing up to both releasing provider wave (release candidates likely today) and new airlfow 2.7.1 (next weeks) - once providers are out latest airflow release by default points to the latest released providers (but you can freely downgrade/upgrade the providers independently as you wish) |
if self.reattach: | ||
# Save the task ARN in XCom to be able to reattach it if needed | ||
self.xcom_push(context, key=self.REATTACH_XCOM_KEY, value=self.arn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a somewhat-breaking change, as the example code
airflow/tests/system/providers/amazon/aws/example_ecs_fargate.py
Lines 125 to 126 in e0f21f4
# You must set `reattach=True` in order to get ecs_task_arn if you plan to use a Sensor. | |
reattach=True, |
was recommending setting reattach to true to get the ARN.
I think this sucked, but removing the arn entirely from the xcom values is not good either.
What we could do is set it all the time now that we don't rely on this anymore to know if we need to reattach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this part was never work as it expected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, pushing the ARN to xcom at least was working.
I opened a PR to restore that specific thing.
Probably we should just chop this view in favor of grid view logging which is the future. But this fixes rendering issues raised here apache/airflow#29447 (comment). What we do, is in log tailing context (which apparently isn't used in grid, and that's why I did not see this in developing trigger logging) we don't add the messages to the log content. So, whenever log_pos is in metadata we don't add messages. It means the messages could be a bit stale but that seems ok. Refreshing the page could fix that. Longer term, we could update the API so that log content is just content and the messages are themselves returned in the metadata dict. That's probably the "right" solution ultimately. But can be saved for another day. Also resolve the "cannot load lazy instance" issue when invoking the reader logic from this different context. GitOrigin-RevId: 7cf5cea76e3ff8b790d7185632b6dd7b0196f0e3
Probably we should just chop this view in favor of grid view logging which is the future. But this fixes rendering issues raised here apache/airflow#29447 (comment). What we do, is in log tailing context (which apparently isn't used in grid, and that's why I did not see this in developing trigger logging) we don't add the messages to the log content. So, whenever log_pos is in metadata we don't add messages. It means the messages could be a bit stale but that seems ok. Refreshing the page could fix that. Longer term, we could update the API so that log content is just content and the messages are themselves returned in the metadata dict. That's probably the "right" solution ultimately. But can be saved for another day. Also resolve the "cannot load lazy instance" issue when invoking the reader logic from this different context. GitOrigin-RevId: 7cf5cea76e3ff8b790d7185632b6dd7b0196f0e3
Probably we should just chop this view in favor of grid view logging which is the future. But this fixes rendering issues raised here apache/airflow#29447 (comment). What we do, is in log tailing context (which apparently isn't used in grid, and that's why I did not see this in developing trigger logging) we don't add the messages to the log content. So, whenever log_pos is in metadata we don't add messages. It means the messages could be a bit stale but that seems ok. Refreshing the page could fix that. Longer term, we could update the API so that log content is just content and the messages are themselves returned in the metadata dict. That's probably the "right" solution ultimately. But can be saved for another day. Also resolve the "cannot load lazy instance" issue when invoking the reader logic from this different context. GitOrigin-RevId: 7cf5cea76e3ff8b790d7185632b6dd7b0196f0e3
Right now
EcsRunTaskOperator
when required "reattach" use hacks:By this PR current mechanism replaced by builtin ECS.Client.run_task ability to setup startedBy and filter it later.
startedBy
limited by 36 characters an it is not possible to use string representation ofdag_id
+task_id
+run_id
+map_id
, instead of this generate UUID based on this value which can be used as unique (per TI) value.Unfortunetly right now
EcsRunTaskOperator
setstartedBy
by owner of task, so it become mutuality exclusive:startedBy
set as unique TI (UUID)startedBy
set as task.owner