Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add task_acks_late configuration to Celery Executor #37066

Merged
merged 17 commits into from
Feb 5, 2024

Conversation

ninsbl
Copy link
Contributor

@ninsbl ninsbl commented Jan 29, 2024

Celery task_acks_late is currently hard-coded to be deactivated. However, this causes issues when tasks exceed the visibility_timeout.

This PR, adds a task_acks_late parameter to the Celery configuration. Default values remain unchanged, but with the new config options users can at least try, if this setting solves there issue if they encounter the 'Task Instance Not Running' FAILED: Task is in the running state error message.

It is an enhanced re-implementation of #31829 and adresses #16163.

closes #16163
related: #31829

This is not covered by tests (yet). I can add tests but would be happy to receive pointers to e.g. existing test for environment settings, that I could adjust.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

Copy link
Contributor

@dirrao dirrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can through the following page for writing and running the tests.
https://github.com/apache/airflow/blob/main/contributing-docs/09_testing.rst

Copy link
Member

@hussein-awala hussein-awala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a test?

airflow/providers/celery/executors/default_celery.py Outdated Show resolved Hide resolved
airflow/providers/celery/provider.yaml Outdated Show resolved Hide resolved
@hussein-awala
Copy link
Member

I just checked the conversation in the original issue, and personally I prefer to keep the default value to True and waits for community feedback before deciding if we need to change it to False. Let's wait for others reviews.

@ninsbl
Copy link
Contributor Author

ninsbl commented Jan 30, 2024

I just checked the conversation in the original issue, and personally I prefer to keep the default value to True and waits for community feedback before deciding if we need to change it to False. Let's wait for others reviews.

Sure, False as default was a glitch. It makes a lot of sense to keep the current default.

I am happy to add tests. What are optional and what are required tests? E.g. I have no experience with Kubernetes and Helm charts. So owuld for example be a unit test that checks if the variable is set properly be enough? With the Kubernetes stuff I may need assistance to get that merge-ready soonish...

@hussein-awala
Copy link
Member

I am happy to add tests. What are optional and what are required tests? E.g. I have no experience with Kubernetes and Helm charts. So owuld for example be a unit test that checks if the variable is set properly be enough? With the Kubernetes stuff I may need assistance to get that merge-ready soonish...

You can add a small test to check if the conf is taken into account by Airflow, you can use this test as a reference:

@conf_vars({("celery_broker_transport_options", "sentinel_kwargs"): '{"service_name": "mymaster"}'})
def test_sentinel_kwargs_loaded_from_string():
import importlib
# reload celery conf to apply the new config
importlib.reload(default_celery)
assert default_celery.DEFAULT_CELERY_CONFIG["broker_transport_options"]["sentinel_kwargs"] == {
"service_name": "mymaster"
}

@ninsbl
Copy link
Contributor Author

ninsbl commented Jan 30, 2024

Tried to add a basic unittest in b9be045. Checked it with breeze and it seems to work...

Copy link
Member

@hussein-awala hussein-awala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

airflow/providers/celery/provider.yaml Outdated Show resolved Hide resolved
airflow/providers/celery/provider.yaml Outdated Show resolved Hide resolved
airflow/providers/celery/provider.yaml Outdated Show resolved Hide resolved
@ninsbl
Copy link
Contributor Author

ninsbl commented Jan 31, 2024

BTW.: When running tests with breeze (on WSL2) I get the following error message:

23.16 INFO: pip is looking at multiple versions of apache-airflow[amazon] to determine which version is compatible with other requirements. This could take a while.
23.31 ERROR: Cannot install apache-airflow[amazon]==2.9.0.dev0 because these package versions have conflicting dependencies.
23.31
23.31 The conflict is caused by:
23.31     apache-airflow[amazon] 2.9.0.dev0 depends on moto<5.0.0 and >=4.2.12; extra == "amazon"
23.31     The user requested (constraint) moto==5.0.0
23.31
23.31 To fix this you could try to:
23.31 1. loosen the range of package versions you've specified
23.31 2. remove package versions to allow pip attempt to solve the dependency conflict
23.31
23.31 ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
------
Dockerfile.ci:1190
--------------------
 1188 |     # But in cron job we will install latest versions matching pyproject.toml to see if there is no breaking change
 1189 |     # and push the constraints if everything is successful
 1190 | >>> RUN bash /scripts/docker/install_airflow.sh
 1191 |
 1192 |     COPY --from=scripts entrypoint_ci.sh /entrypoint
--------------------
ERROR: failed to solve: process "/bin/bash -o pipefail -o errexit -o nounset -o nolog -c bash /scripts/docker/install_airflow.sh" did not complete successfully: exit code: 1

This does not prevent unittests from being executed though...

@potiuk
Copy link
Member

potiuk commented Jan 31, 2024

BTW.: When running tests with breeze (on WSL2) I get the following error message:

This does not prevent unittests from being executed though...

Yes. Yu need to rebase and rerun it to fix it. Breeze /tests works for sure in latest main. Alternatively (but this is something breeze will always print as warning in this case) there are times before merging conflicting changes to main and refreshing constraints where you need to run build with --upgrade-to-newer-dependencies.

Actually I think I wil make a small change to handle this case better because we already have feature to retry build with --upgrade-to-newer-dependencies when it fails with constraints , so I can flip the flag to run this mode locally by default

@potiuk potiuk force-pushed the celery_task_acks_late branch from 50420ee to 1d9aa5e Compare January 31, 2024 09:04
@potiuk
Copy link
Member

potiuk commented Jan 31, 2024

I rebased this one and added PR to make future local build work in case we have not managed to refresh the constraints yet #37116 - that will take longer but will finally succeed and should close the loophole here.

@ninsbl
Copy link
Contributor Author

ninsbl commented Jan 31, 2024

Yes. Yu need to rebase and rerun it to fix it. Breeze /tests works for sure in latest main.

Thanks, @potiuk , I can confirm that breeze now builds and runs fine after rebase...

@ninsbl
Copy link
Contributor Author

ninsbl commented Feb 1, 2024

Anything I can do to move this forward?

@ninsbl
Copy link
Contributor Author

ninsbl commented Feb 1, 2024

Hmm, it seems the spell checking does ot like the new key name:

(Incorrect Spelling: 'acks' Line with Error: 'task_acks_late')...

@ninsbl
Copy link
Contributor Author

ninsbl commented Feb 3, 2024

Hope adding the exception for acks in 8419527 was correct resolution for the failing spell check test ... ?

@ninsbl ninsbl requested review from dirrao and o-nikolas February 3, 2024 11:29
airflow/providers/celery/provider.yaml Outdated Show resolved Hide resolved
@hussein-awala hussein-awala changed the title Celery provider: Add task_acks_late setting Add task_acks_late configuration to Celery Executor Feb 5, 2024
@hussein-awala hussein-awala merged commit 6c72223 into apache:main Feb 5, 2024
56 checks passed
Copy link

boring-cyborg bot commented Feb 5, 2024

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

@hussein-awala
Copy link
Member

Congrats on your first commit 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants