Duplicate log lines in CloudWatch after upgrade to 2.4.2 #27345
Are you sure ALL messages are duplicated? There was a change to reveal messages produced during parsing which were previously hidden, and this is something that could cause more logs - but that does not mean every log is repeated - see #26779. If my guess is right, you will get duplicated logs from parsing but not from execution of the tasks, and if you help verify that this is the case, the solution is for you to sort out the messages according to where they are coming from (because they might be relevant, and possibly different on execution and different on parsing, even if some of them are repeated). Also, it might mean that you are doing "too much" during parsing at the top level of your DAG. Can you post examples of such duplicated lines and see if the actual execution is NOT duplicated?
Hey @potiuk, thanks a lot for responding. I realise my vocabulary for Airflow logs might be a bit inaccurate, so I'll try to be more specific about what I observe. First of all, I am not 100% sure what you mean about parsing of top-level code in the DAG. Essentially what we observe is that log lines inside a single task execution appear to be duplicated. Below is an excerpt from a task that spins up an ECS container.
Upon looking through this in detail, I actually find, as you say, that there are lines that are not duplicated.
So based on your comment, I assume these are probably lines from DAG parsing then? I realise now I was wrong stating that all log lines were duplicated; I didn't pay enough attention when looking at it. (Thanks for the hint 🙂) To elaborate further, I don't see duplication of logs in e.g. the webserver or scheduler CloudWatch outputs. This is in no way critical for us, but I would be very happy to help out further to get it identified / fixed / tested.
Best, Jørgen
Not really - those are lines from execution of the tasks. "Top-level" logs from parsing are generated when the file is picked up by Airflow - for example if you have an import error, or print directly at the top level of your Python code:
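(A small illustrative DAG file, with made-up names, showing the kind of top-level code that logs during parsing rather than execution:)

```python
import logging
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Runs on EVERY parse by the DAG processor, not just on task execution:
print("I show up in parsing logs")
logging.getLogger(__name__).info("so do I")


def run():
    # Runs only when the task actually executes.
    print("I show up in task execution logs")


with DAG(dag_id="toplevel_example", start_date=datetime(2022, 1, 1), schedule=None) as dag:
    PythonOperator(task_id="run", python_callable=run)
```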
So my wild guess was wrong, but possibly not by that much. I think the change might come from #26779. I believe that your duplicate logs come from two sources:
When you have REMOTE_LOGGING turned on and CloudWatch configured, Airflow will use CloudWatch to log entries. However, I think your Fargate container might also send Airflow's stdout logs to CloudWatch. One of the things that changed in #26779 was setting the task logger to propagate logs to the higher-level logger:
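(As an illustration of the mechanism - a minimal standalone sketch of logger propagation, not Airflow's actual wiring:)

```python
import logging
import sys

# Two destinations, as in the report: a handler attached directly to the
# task logger (standing in for the CloudWatch task handler) and a handler
# on the root logger (standing in for container stdout, which Fargate
# ships to CloudWatch as well).
root = logging.getLogger()
root.addHandler(logging.StreamHandler(sys.stdout))

task = logging.getLogger("airflow.task")
task.addHandler(logging.StreamHandler(sys.stdout))

task.propagate = True   # the 2.4.2 behavior
task.warning("hello")   # emitted twice: by the task handler AND the root handler

task.propagate = False  # the pre-2.4.2 behavior / the workaround
task.warning("hello")   # emitted once
```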
Which sounds like quite a plausible explanation of why it changed when you migrated. Do you have any custom logging configured? Like this: https://airflow.apache.org/docs/apache-airflow/stable/logging-monitoring/logging-tasks.html#advanced-configuration ? Can you explain what you have if you do? If not, can you follow the instructions and override "airflow.task" in the dictionary, setting propagate to False instead of True, and see if it helps?
@potiuk I've made the changes as pointed out by you and it worked! 🥳

`config/log_config.py`:

```python
from copy import deepcopy

from airflow.config_templates.airflow_local_settings import DEFAULT_LOGGING_CONFIG

LOGGING_CONFIG = deepcopy(DEFAULT_LOGGING_CONFIG)
LOGGING_CONFIG["loggers"]["airflow.task"]["propagate"] = False
```

`airflow.cfg`:
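(The actual `airflow.cfg` excerpt wasn't captured above; per the advanced-configuration docs linked earlier, the setting that wires in a custom logging dictionary typically looks like this - the module path here is an assumption:)

```ini
[logging]
# Assumes config/log_config.py is importable (e.g. on PYTHONPATH);
# the exact module path from the original comment was not captured.
logging_config_class = log_config.LOGGING_CONFIG
```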
Thanks!
Thanks a lot for taking the effort to research and advise here @potiuk - seems you nailed it! 🔨 Also thanks @andreyvital for chiming in with the example! I can confirm that with the same configuration that @andreyvital shared, we are no longer seeing duplicate logs! Is it worthwhile keeping this issue open, or would you want to close it? LMK! (Or take any action you see fit yourself, of course 🙂) Have a nice weekend 🌮!
Let's see what @ashb has to say :) I am not sure I know the exact reason why propagate was set to True for "airflow.task" (I know why it was for "airflow.processor").
# Description

## What is the current behavior?

In Airflow version `2.4.2` the airflow logs propagation has changed from `False` to `True`.

closes: #1149
related: apache/airflow#27345

## What is the new behavior?

Since we don't want to show any airflow logs to the user by default, we now explicitly disable it.

## Does this introduce a breaking change?

No.

### Checklist

- [ ] Created tests which fail without the change (if possible)
- [ ] Extended the README / documentation, if necessary
Just noticed this issue myself when inspecting logs. I have a similar setup: using the EcsRunTaskOperator to run Fargate tasks on ECS. Can confirm the workaround from @andreyvital works!
Is this only a problem with CloudWatch logs? If no other logging backends are suffering from this, then my gut answer says this is a problem with that one specific logging setup, not something in core. Let me take a look.
Was the cause upgrading to 2.4.2, or is it possible this is related to the 6.0.0 release of the Amazon provider? Is someone able to try with the previous Amazon provider release?
And as for what
We were on 2.4.1 w/ 6.0.0 and noticed it when upgrading to 2.4.2. (So I'd say that's unlikely.)
Found the issue, it's a "bug"/bad subclassing in CloudWatch's handler. Actually, in almost all subclasses. Hmmm, I wonder if I can rewrite that to not need the return value...
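(For readers following along, the pitfall being described seems to be of this general shape - a generic sketch with made-up names, not the actual handler code:)

```python
class BaseHandler:
    def set_context(self, ctx):
        # The return value is part of the contract: the caller uses it to
        # decide whether to disable logger propagation after setup.
        self.ctx = ctx
        return "disable_propagate"


class RemoteHandler(BaseHandler):
    def set_context(self, ctx):
        # Bug: the parent's return value is silently dropped, so the
        # caller sees None, never adjusts propagation, and every record
        # ends up emitted by both this handler and a parent logger's.
        super().set_context(ctx)


flag = RemoteHandler().set_context("task-instance")
print(flag)  # None - the base class's signal never reaches the caller
```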
Fix incoming, and we're preparing the RC for 2.4.3 this week, so it will be available soon.
Fix: #27591
Apache Airflow version
2.4.2
What happened
We upgraded Airflow from 2.4.1 to 2.4.2 and immediately noticed that every task log line is duplicated in CloudWatch. Comparing logs from tasks run before and after the upgrade indicates that the issue is not in how the logs are displayed in Airflow, but rather that it now produces two log lines instead of one.
When observing both the CloudWatch log streams and the Airflow UI, we can see duplicate log lines for almost all log entries post-upgrade, whilst seeing single log lines in tasks run before the upgrade. This happens both for tasks run in a remote `EcsRunTaskOperator` as well as in regular `PythonOperator`s.

What you think should happen instead
A single, non-duplicated log line should be written to CloudWatch.
How to reproduce
From my understanding, any setup on 2.4.2 that uses CloudWatch remote logging will produce duplicate log lines. (But I have not been able to confirm other setups.)
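(For reference, a minimal DAG that should surface the duplication under such a setup - the names here are illustrative, not from the original report:)

```python
import logging
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def log_something():
    # With CloudWatch remote logging enabled on 2.4.2, this line
    # should appear twice in the task's CloudWatch log stream.
    logging.getLogger(__name__).info("hello from the task")


with DAG(
    dag_id="duplicate_log_repro",
    start_date=datetime(2022, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    PythonOperator(task_id="log_once", python_callable=log_something)
```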
Operating System
Docker: `apache/airflow:2.4.2-python3.9`, running on AWS ECS Fargate

Versions of Apache Airflow Providers
Deployment
Other Docker-based deployment
Deployment details
We are running Docker inside Fargate ECS on AWS.
The following environment variables + config in CloudFormation control remote logging:
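(The concrete values weren't captured here; a typical CloudWatch remote-logging setup uses environment variables along these lines, with the ARN and connection ID below being placeholders:)

```
AIRFLOW__LOGGING__REMOTE_LOGGING=True
AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER=cloudwatch://arn:aws:logs:eu-west-1:111122223333:log-group:airflow-task-logs
AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID=aws_default
```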
Anything else
We did not change any other configuration during the upgrade; we simply bumped the provider requirements and the Docker image from 2.4.1 to 2.4.2.
Are you willing to submit PR?
Code of Conduct