
Duplicate log lines in CloudWatch after upgrade to 2.4.2 #27345

Closed
JorgenG opened this issue Oct 28, 2022 · 16 comments · Fixed by #27591
Labels: area:core, kind:bug


JorgenG commented Oct 28, 2022

Apache Airflow version

2.4.2

What happened

We upgraded Airflow from 2.4.1 to 2.4.2 and immediately noticed that every task log line is duplicated in CloudWatch. Comparing logs from tasks run before and after the upgrade indicates that the issue is not in how the logs are displayed in Airflow, but rather that two log lines are now produced instead of one.

When observing both the CloudWatch log streams and the Airflow UI, we can see duplicate log lines for almost all log entries post upgrade, while tasks run before the upgrade show single log lines.

This happens both for tasks run via a remote EcsRunTaskOperator and for regular PythonOperators.

What you think should happen instead

A single, non-duplicated log line should be produced in CloudWatch.

How to reproduce

From my understanding, any setup on 2.4.2 that uses CloudWatch remote logging will produce duplicate log lines. (But I have not been able to confirm this for other setups.)

Operating System

Docker: apache/airflow:2.4.2-python3.9 - Running on AWS ECS Fargate

Versions of Apache Airflow Providers

apache-airflow[celery,postgres,apache.hive,jdbc,mysql,ssh,amazon,google,google_auth]==2.4.2
apache-airflow-providers-amazon==6.0.0

Deployment

Other Docker-based deployment

Deployment details

We are running the Docker image on AWS ECS Fargate.

The following environment variables + config in CloudFormation control remote logging:

            - Name: AIRFLOW__LOGGING__REMOTE_LOGGING
              Value: True
            - Name: AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER
              Value: !Sub "cloudwatch://${TasksLogGroup.Arn}"

Anything else

We did not change any other configuration during the upgrade; we simply bumped the requirements/provider list and the Docker image from 2.4.1 to 2.4.2.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

JorgenG added the area:core and kind:bug labels Oct 28, 2022

boring-cyborg bot commented Oct 28, 2022

Thanks for opening your first issue here! Be sure to follow the issue template!

potiuk added this to the Airflow 2.4.3 milestone Oct 28, 2022

potiuk commented Oct 28, 2022

Are you sure ALL messages are duplicated?

There was a change to reveal messages produced during parsing which were previously hidden, and this is something that could cause more logs - but that does not mean every log is repeated - #26779

If my guess is right, you will get duplicated logs from parsing but not from execution of the tasks. If you help to verify that this is the case, the solution is for you to sort out the messages according to where they are coming from, because they might be relevant and possibly different on execution and on parsing (even if some of them are repeated). It might also mean that you are doing "too much" during parsing in the top-level code of your DAG.

Can you post examples of such duplicated lines and see if the actual execution is NOT duplicated?


JorgenG commented Oct 28, 2022

Hey @potiuk, thanks a lot for responding. I realise my vocabulary for Airflow logs might be a bit inaccurate.

I'll try to be more specific about what I observe. First of all, I am not 100% sure what you mean by parsing of top-level code in the DAG.

Essentially, what we observe is that log lines inside a single task execution appear to be duplicated.

Below is an excerpt from a task that spins up an ECS container.

*** Reading remote log from Cloudwatch log_group: /stacks/intranet/airflow-tasks log_stream: dag_id=dbt-events-incremental/run_id=scheduled__2022-10-28T15_15_00+00_00/task_id=dbt_events_incremental/attempt=1.log.
[2022-10-28, 15:21:36 UTC] Dependencies all met for <TaskInstance: dbt-events-incremental.dbt_events_incremental scheduled__2022-10-28T15:15:00+00:00 [queued]>
[2022-10-28, 15:21:36 UTC] Dependencies all met for <TaskInstance: dbt-events-incremental.dbt_events_incremental scheduled__2022-10-28T15:15:00+00:00 [queued]>
[2022-10-28, 15:21:36 UTC] Dependencies all met for <TaskInstance: dbt-events-incremental.dbt_events_incremental scheduled__2022-10-28T15:15:00+00:00 [queued]>
[2022-10-28, 15:21:36 UTC] Dependencies all met for <TaskInstance: dbt-events-incremental.dbt_events_incremental scheduled__2022-10-28T15:15:00+00:00 [queued]>
[2022-10-28, 15:21:36 UTC] 
--------------------------------------------------------------------------------
[2022-10-28, 15:21:36 UTC] 
--------------------------------------------------------------------------------
[2022-10-28, 15:21:36 UTC] Starting attempt 1 of 1
[2022-10-28, 15:21:36 UTC] Starting attempt 1 of 1
[2022-10-28, 15:21:36 UTC] 
--------------------------------------------------------------------------------
[2022-10-28, 15:21:36 UTC] 
--------------------------------------------------------------------------------
[2022-10-28, 15:21:36 UTC] Executing <Task(DbtOperatorV2): dbt_events_incremental> on 2022-10-28 15:15:00+00:00
[2022-10-28, 15:21:36 UTC] Executing <Task(DbtOperatorV2): dbt_events_incremental> on 2022-10-28 15:15:00+00:00
...
[2022-10-28, 15:21:37 UTC] Starting ECS Task Log Fetcher
[2022-10-28, 15:21:37 UTC] Starting ECS Task Log Fetcher
[2022-10-28, 15:21:37 UTC] /home/airflow/.local/lib/python3.9/site-packages/airflow/providers/amazon/aws/operators/ecs.py:437: DeprecationWarning: This class is deprecated. Please use `airflow.providers.amazon.aws.hooks.ecs.EcsTaskLogFetcher`.
  self.task_log_fetcher = self._get_task_log_fetcher()

[2022-10-28, 15:22:07 UTC] Using connection ID 'aws_default' for task execution.
[2022-10-28, 15:22:38 UTC] [2022-10-28, 15:22:15 UTC] + set -o pipefail
[2022-10-28, 15:22:38 UTC] [2022-10-28, 15:22:15 UTC] + set -o pipefail
...
[2022-10-28, 15:54:54 UTC] ECS Task has been successfully executed
[2022-10-28, 15:54:54 UTC] ECS Task has been successfully executed
[2022-10-28, 15:54:54 UTC] Marking task as SUCCESS. dag_id=dbt-events-incremental, task_id=dbt_events_incremental, execution_date=20221028T151500, start_date=20221028T152136, end_date=20221028T155454
[2022-10-28, 15:54:54 UTC] Marking task as SUCCESS. dag_id=dbt-events-incremental, task_id=dbt_events_incremental, execution_date=20221028T151500, start_date=20221028T152136, end_date=20221028T155454
[2022-10-28, 15:54:54 UTC] Task exited with return code 0
[2022-10-28, 15:54:54 UTC] 0 downstream tasks scheduled from follow-on schedule check

Upon looking through this in detail, I actually find, as you say, that there are lines that are not duplicated.

For example:

[2022-10-28, 15:21:37 UTC] /home/airflow/.local/lib/python3.9/site-packages/airflow/providers/amazon/aws/operators/ecs.py:437: DeprecationWarning: This class is deprecated. Please use `airflow.providers.amazon.aws.hooks.ecs.EcsTaskLogFetcher`.
  self.task_log_fetcher = self._get_task_log_fetcher()
[2022-10-28, 15:22:07 UTC] Using connection ID 'aws_default' for task execution.
...
[2022-10-28, 15:54:54 UTC] Task exited with return code 0
[2022-10-28, 15:54:54 UTC] 0 downstream tasks scheduled from follow-on schedule check

So based on your comment, I assume these are probably lines from DAG parsing then?

I realise now that I was wrong in stating that all log lines were duplicated; I didn't pay enough attention when looking at it. (Thanks for the hint 🙂)

To elaborate further, I don't see duplication of logs in, for example, the webserver or scheduler CloudWatch outputs.

This is in no way critical for us, but I would be very happy to help out further to get it identified / fixed / tested.

  • Is there any way we can override some configuration or similar to verify that the change you referenced might be the cause? E.g. override propagation or similar?
  • Can there be any weird configuration / settings we have enabled which might impact this? (I would have assumed others would notice the same problem during testing?)

Best, Jørgen


potiuk commented Oct 28, 2022

So based on your comment, I assume these are probably lines from DAG parsing then?

Not really - those are lines from the execution of tasks. "Top-level" logs from parsing are generated when the file is picked up by Airflow - for example if you have an import error, or a print directly at the top level of the Python code:

from airflow import DAG

print('x')  # <- that's top-level code

with DAG(...) as dag:
    ...

So my wild guess was wrong, but possibly not by much. I think the change might come from this change: #26779

I believe that your duplicate logs come from two sources:

  1. Airflow CloudWatch logger
  2. Fargate instance

When you have REMOTE_LOGGING turned on and CloudWatch configured, Airflow will use CloudWatch to log entries:
https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/logging/cloud-watch-task-handlers.html

However, I think your Fargate container might also send logs to CloudWatch from the stdout of Airflow.

One of the things that changed in #26779 was setting the task logger to propagate logs to the higher-level logger:

        'airflow.task': {
            'handlers': ['task'],
            'level': LOG_LEVEL,
            # Set to true here (and reset via set_context) so that if no file is configured we still get logs!
            'propagate': True,   ##### <- This was previously False
            'filters': ['mask_secrets'],
        },

Which sounds like quite a plausible explanation for why it changed when you migrated.
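As a minimal illustration of the effect (plain Python logging, not Airflow's actual configuration): when the task logger propagates, each record is emitted once by the task handler and once by a handler higher up the hierarchy - and when that higher-level output is the container's stdout, Fargate ships it to CloudWatch a second time.

import logging

# Hypothetical stand-ins: one StreamHandler plays the role of the CloudWatch
# task handler, the other the console/stdout handler whose output the
# container runtime also ships to CloudWatch.
root = logging.getLogger()
root.addHandler(logging.StreamHandler())

task_logger = logging.getLogger("airflow.task")
task_logger.addHandler(logging.StreamHandler())
task_logger.propagate = True  # the new default shown above

task_logger.warning("hello")  # emitted twice: once per handler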

Do you have any custom logging configured, like this: https://airflow.apache.org/docs/apache-airflow/stable/logging-monitoring/logging-tasks.html#advanced-configuration ? Can you explain what you have, if you do?

If not, can you follow those instructions, override "airflow.task" in the dictionary, set propagate to False instead of True, and see if it helps?

@andreyvital

@potiuk I've made the changes you suggested and it worked! 🥳

config/log_config.py:

from copy import deepcopy

from airflow.config_templates.airflow_local_settings import DEFAULT_LOGGING_CONFIG

LOGGING_CONFIG = deepcopy(DEFAULT_LOGGING_CONFIG)
# Stop task logs from also propagating to the root/console handlers,
# which were being shipped to CloudWatch a second time.
LOGGING_CONFIG["loggers"]["airflow.task"]["propagate"] = False

airflow.cfg:

[logging]
logging_config_class = log_config.LOGGING_CONFIG

Thanks!


potiuk commented Oct 28, 2022

@ashb -> FYI, this is quite an unexpected side effect of #26779. I expect more people to have similar problems. But I am not sure if we can do anything about it (however, maybe it's worth rethinking "propagate" on airflow.task, or at least making it depend on the REMOTE_LOGGING setting).

WDYT?


JorgenG commented Oct 28, 2022

Thanks a lot for taking the effort to research and advise here @potiuk - seems you nailed it! 🔨

Also thanks @andreyvital for chiming in with the example!

Can confirm that with the same configuration @andreyvital shared, we are no longer seeing duplicate logs! Is it worthwhile keeping this issue open, or would you want to close it? LMK! (Or take whatever action you see fit yourself, of course 🙂)

Have a nice weekend 🌮 !


potiuk commented Oct 28, 2022

Let's see what @ashb has to say :) I am not sure I know the exact reason why propagate was set to True for "airflow.task" (I know why it was for "airflow.processor").

feluelle added a commit to astronomer/astro-sdk that referenced this issue Oct 31, 2022
# Description

## What is the current behavior?

In Airflow version `2.4.2` the airflow logs propagation has changed from
`False` to `True`.

closes: #1149
related: apache/airflow#27345

## What is the new behavior?

Since we don't want to show any airflow logs to the user by default we
explicitly disable it now.

## Does this introduce a breaking change?

No.

### Checklist
- [ ] Created tests which fail without the change (if possible)
- [ ] Extended the README / documentation, if necessary

zachliu commented Nov 4, 2022

Just noticed this issue myself when inspecting logs. I have a similar setup: using the EcsRunTaskOperator to run Fargate tasks on ECS. Can confirm the workaround from @andreyvital works!

ephraimbuddy modified the milestones: Airflow 2.4.3, Airflow 2.4.4 Nov 9, 2022

ashb commented Nov 10, 2022

Is this only a problem with CloudWatch logs? If no other logging backends are suffering from this, then my gut answer says this is a problem with that one specific logging setup, not something in core. Let me take a look.


ashb commented Nov 10, 2022

Was the cause upgrading to 2.4.2, or is it possible this is related to the 6.0.0 release of the Amazon provider?

Is someone able to try with apache-airflow-providers-amazon<6 (and without the propagate change) and see if that still gets double logging?


ashb commented Nov 10, 2022

And as for why airflow.task was set to propagate: so that parse-time warnings from inside baseoperator.py don't vanish into the ether: #26599 (comment)
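A minimal sketch of that motivation (plain Python logging, with a NullHandler standing in for a task handler that has no log destination configured yet, as at parse time):

import logging

logging.basicConfig()  # root logger gets a console handler

parse_logger = logging.getLogger("airflow.task")
parse_logger.addHandler(logging.NullHandler())  # stand-in for the not-yet-configured task handler

parse_logger.propagate = False
parse_logger.warning("parse-time deprecation warning")  # swallowed, never shown

parse_logger.propagate = True
parse_logger.warning("parse-time deprecation warning")  # reaches the console handler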


JorgenG commented Nov 10, 2022

Was the cause upgrading to 2.4.2, or is it possible this is related to the 6.0.0 release of the Amazon provider?

We were on 2.4.1 with 6.0.0 and noticed it when upgrading to 2.4.2, so I'd say that's unlikely.


ashb commented Nov 10, 2022

Found the issue - it's a "bug"/bad subclassing in CloudWatch's set_context function.

Actually, it's in almost all subclasses. Hmmm, I wonder if I can rewrite that to not need the return value...
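For anyone following along, a generic, hypothetical sketch (not the actual Airflow code) of that kind of pitfall - an override calls super().set_context() but drops its return value, so the caller never sees the flag telling it to switch off propagation:

class BaseTaskHandler:
    def set_context(self, ti):
        # The handler is now fully configured; tell the caller (via the
        # return value) that it can switch off propagation to parent loggers.
        return "disable_propagate"

class RemoteTaskHandler(BaseTaskHandler):
    def set_context(self, ti):
        super().set_context(ti)  # bug: the flag is silently discarded
        # ... remote-logging-specific setup would go here ...

def configure(handler):
    propagate = True  # the 2.4.2 default for the task logger
    if handler.set_context(ti=None) == "disable_propagate":
        propagate = False
    return propagate

print(configure(BaseTaskHandler()))    # False -> single log lines
print(configure(RemoteTaskHandler()))  # True  -> duplicated log lines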


ashb commented Nov 10, 2022

Fix incoming, and we're preparing the RC for 2.4.3 this week, so it will be available soon.


ashb commented Nov 10, 2022

Fix #27591
