
Update to airflow 2.8.1 with kubernetes provider 7.13.0 broke scheduler #37008

Closed
1 of 2 tasks
jmaicher opened this issue Jan 25, 2024 · 7 comments
Labels
affected_version:2.8, area:providers, kind:bug, provider:cncf-kubernetes

Comments

@jmaicher
Contributor

Apache Airflow Provider(s)

cncf-kubernetes

Versions of Apache Airflow Providers

apache-airflow-providers-cncf-kubernetes==7.13.0
kubernetes==23.6.0

Apache Airflow version

2.8.1

Operating System

debian 12 (bookworm)

Deployment

Official Apache Airflow Helm Chart

Deployment details

Official helm chart with official airflow image (2.8.1)

What happened

When upgrading from 2.8.0 to 2.8.1 the airflow scheduler ended in a CrashLoopBackOff with:


  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
[2024-01-25T09:52:55.993+0000] {task_context_logger.py:63} INFO - Task context logging is enabled
[2024-01-25T09:52:55.993+0000] {executor_loader.py:115} INFO - Loaded executor: KubernetesExecutor
[2024-01-25T09:52:56.067+0000] {scheduler_job_runner.py:808} INFO - Starting the scheduler
[2024-01-25T09:52:56.068+0000] {scheduler_job_runner.py:815} INFO - Processing each file at most -1 times
[2024-01-25T09:52:56.069+0000] {kubernetes_executor.py:316} INFO - Start Kubernetes executor
[2024-01-25T09:52:56.106+0000] {kubernetes_executor_utils.py:157} INFO - Event: and now my watch begins starting at resource_version: 0
[2024-01-25T09:52:56.114+0000] {kubernetes_executor.py:237} INFO - Found 0 queued task instances
[2024-01-25T09:52:56.124+0000] {manager.py:169} INFO - Launched DagFileProcessorManager with pid: 108
[2024-01-25T09:52:56.125+0000] {scheduler_job_runner.py:1619} INFO - Adopting or resetting orphaned tasks for active dag runs
[2024-01-25T09:52:56.133+0000] {settings.py:60} INFO - Configured default timezone UTC
[2024-01-25T09:52:56.149+0000] {settings.py:533} INFO - Loaded airflow_local_settings from /opt/airflow/config/airflow_local_settings.py .
[2024-01-25T09:52:56.196+0000] {scheduler_job_runner.py:872} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job_runner.py", line 855, in _execute
    self._run_scheduler_loop()
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job_runner.py", line 937, in _run_scheduler_loop
    self.adopt_or_reset_orphaned_tasks()
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/session.py", line 79, in wrapper
    return func(*args, session=session, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job_runner.py", line 1622, in adopt_or_reset_orphaned_tasks
    for attempt in run_with_db_retries(logger=self.log):
  File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 347, in __iter__
    do = self.iter(retry_state=retry_state)
  File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 314, in iter
    return fut.result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job_runner.py", line 1667, in adopt_or_reset_orphaned_tasks
    to_reset = self.job.executor.try_adopt_task_instances(tis_to_adopt_or_reset)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py", line 587, in try_adopt_task_instances
    self._adopt_completed_pods(kube_client)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py", line 687, in _adopt_completed_pods
    pod_list = self._list_pods(query_kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py", line 171, in _list_pods
    pod_resource = dynamic_client.resources.get(api_version="v1", kind="Pod")
  File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/dynamic/discovery.py", line 199, in get
    results = self.search(**kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/dynamic/discovery.py", line 242, in search
    results = self.__search(self.__build_search(**kwargs), self.__resources, [])
  File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/dynamic/discovery.py", line 290, in __search
    matches.extend(self.__search([key] + parts[1:], resources, reqParams))
  File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/dynamic/discovery.py", line 276, in __search
    return self.__search(parts[1:], resourcePart, reqParams + [part] )
  File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/dynamic/discovery.py", line 290, in __search
    matches.extend(self.__search([key] + parts[1:], resources, reqParams))
  File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/dynamic/discovery.py", line 276, in __search
    return self.__search(parts[1:], resourcePart, reqParams + [part] )
  File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/dynamic/discovery.py", line 265, in __search
    resourcePart.resources = self.get_resources_for_api_version(
  File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/dynamic/discovery.py", line 169, in get_resources_for_api_version
    resource, name = subresource['name'].split('/')
ValueError: too many values to unpack (expected 2)

What you think should happen instead

No response

How to reproduce

Deploy on a Kubernetes cluster that exposes a sub-resource whose name follows the a/b/c naming scheme, i.e. contains more than one slash.
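
The failure can be illustrated without a cluster. The following is a minimal sketch, not the client's actual code: `parse_strict` mirrors the unpacking on the failing line of `discovery.py` shown in the traceback, and `parse_tolerant` shows one possible way to tolerate extra slashes by splitting only once. Both function names are hypothetical, and the tolerant variant is an assumption rather than a copy of the upstream patch.

```python
def parse_strict(name):
    # Mirrors the failing line in the traceback: unpacking into exactly
    # two names only works when the string contains a single slash.
    resource, sub = name.split("/")
    return resource, sub


def parse_tolerant(name):
    # One possible tolerant variant (an assumption, not the upstream patch):
    # split on the first slash only and keep the remainder intact.
    resource, sub = name.split("/", 1)
    return resource, sub


# A two-part name parses fine with the strict version.
print(parse_strict("pods/log"))

# A three-part name (the a/b/c scheme from this report) does not.
try:
    parse_strict("a/b/c")
except ValueError as exc:
    print(f"strict parser failed: {exc}")

# Splitting once keeps everything after the first slash together.
print(parse_tolerant("a/b/c"))
```

Because the strict unpacking runs during resource discovery, a single sub-resource with such a name anywhere in the cluster is enough to raise, regardless of which resource the caller asked for.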

Anything else

The problem was likely introduced with b9c574c and has already been fixed in the Python Kubernetes client; see kubernetes-client/python#2091. An upgrade of the Kubernetes client is already planned (#36678), but we wanted to make this issue visible, as it likely breaks other deployments as well.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@jmaicher added the area:providers, kind:bug, and needs-triage labels on Jan 25, 2024

boring-cyborg bot commented Jan 25, 2024

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; there is no need to wait for approval.

@dirrao added the provider:cncf-kubernetes and affected_version:2.8 labels and removed the needs-triage label on Jan 25, 2024
@dirrao
Contributor

dirrao commented Jan 25, 2024

@jmaicher Thanks for sharing the detailed information. This is a bug, and it will trigger a Kubernetes version upgrade.
Meanwhile, you can try lowering the Kubernetes version and see if that fixes your issue, or you can install cncf-kubernetes provider version 7.11.0.

@dirrao added the priority:high label on Jan 25, 2024
@dirrao
Contributor

dirrao commented Jan 25, 2024

FYI @hussein-awala / @potiuk

@dirrao removed the priority:high label on Jan 25, 2024
@potiuk
Member

potiuk commented Jan 25, 2024

Thanks @dirrao. As I understand it, this is not "super" high priority: simply bumping the min version for the k8s client from #36684 solves it. Correct, @jmaicher? Do I understand it correctly?

If so, we can just make sure #36684 gets merged quickly and possibly release an ad-hoc new k8s provider. cc: @eladkal @raphaelauv

@dirrao
Contributor

dirrao commented Feb 8, 2024

The fix (#37040) will most likely be available in the next cncf-kubernetes provider release.

@jmaicher
Contributor Author

The kubernetes version bump (#37040) has been released with apache-airflow-providers-cncf-kubernetes:8.0.0 and is included in airflow:2.8.2. Thanks for the support, this issue can be closed.
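
To check whether an environment already carries the bump, the installed provider version can be compared against 8.0.0, the release named above. This is a minimal sketch under stated assumptions: the helper name `has_client_bump` is hypothetical, only the leading `X.Y.Z` part of the version string is compared, and pre-release suffixes are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version

PROVIDER = "apache-airflow-providers-cncf-kubernetes"
FIXED_IN = (8, 0, 0)  # provider release that shipped the client bump, per this thread


def has_client_bump(provider_version):
    # Compare only the leading X.Y.Z components as a tuple of integers.
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", provider_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {provider_version!r}")
    return tuple(int(part) for part in match.groups()) >= FIXED_IN


try:
    installed = version(PROVIDER)
except PackageNotFoundError:
    print(f"{PROVIDER} is not installed")
else:
    status = "includes" if has_client_bump(installed) else "predates"
    print(f"{PROVIDER} {installed} {status} the client bump")
```

The comparison uses integer tuples rather than strings so that 7.13.0 sorts below 8.0.0 and 10.x sorts above it.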

@potiuk closed this as completed on Feb 28, 2024
@potiuk
Member

potiuk commented Feb 28, 2024

Thanks for the information
