
Packaged DAGs not getting loaded in Airflow 2.6.0 (ValueError: source code string cannot contain null bytes) #31039

Closed
harshith-bolar-rapido opened this issue May 3, 2023 · 3 comments · Fixed by #31061
Labels
affected_version:2.6 Issues Reported for 2.6 area:core kind:bug This is a clearly a bug
Comments

@harshith-bolar-rapido

harshith-bolar-rapido commented May 3, 2023

Apache Airflow version

2.6.0

What happened

I am trying to upgrade Airflow from version 2.3.1 to 2.6.0. I have a zip file with a few test DAGs that loaded correctly in 2.3.1, but after the upgrade I see the following error in the scheduler logs.

Process ForkProcess-609:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/dag_processing/manager.py", line 259, in _run_processor_manager
    processor_manager.start()
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/dag_processing/manager.py", line 493, in start
    return self._run_parsing_loop()
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/dag_processing/manager.py", line 572, in _run_parsing_loop
    self.start_new_processes()
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/dag_processing/manager.py", line 1092, in start_new_processes
    processor.start()
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/dag_processing/processor.py", line 196, in start
    for module in iter_airflow_imports(self.file_path):
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/file.py", line 389, in iter_airflow_imports
    parsed = ast.parse(Path(file_path).read_bytes())
  File "/usr/local/lib/python3.7/ast.py", line 35, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
ValueError: source code string cannot contain null bytes

Single .py files are getting loaded without issues.
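For context on the failing call, here is a minimal standalone reproduction of the underlying error (my own illustration, not Airflow code): ast.parse refuses any source containing null bytes, and binary zip data always contains them.

```python
import ast

# A zip archive starts with the magic bytes PK\x03\x04 and contains raw
# null bytes, so feeding its contents to ast.parse() fails immediately,
# before any actual parsing happens.
zip_like_bytes = b"PK\x03\x04\x00\x00"
try:
    ast.parse(zip_like_bytes)
except ValueError as exc:
    print(exc)  # source code string cannot contain null bytes
```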

What you think should happen instead

The packaged file should be parsed and the DAGs inside it should be available in the DagBag.

How to reproduce

  • Set up Airflow 2.6.0 using the official Docker image and Helm chart.
  • Create a folder and place the Python file below inside it.
  • Create a packaged DAG by running zip -r test_dags.zip ./* from within the folder.
  • Place the test_dags.zip file in the /opt/airflow/dags directory.
import time
from datetime import timedelta
from textwrap import dedent
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.utils.dates import days_ago

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email': ['[email protected]'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 3,
    'retry_delay': timedelta(minutes=1),
}

def test_function(task_name):
    print(f"{task_name}: Test function invoked")
    print(f"{task_name}: Sleeping 10 seconds")
    time.sleep(10)
    print(f"{task_name}: Exiting")

with DAG(
    'airflow2_test_dag_1',
    default_args=default_args,
    description='DAG for testing airflow 2.0',
    schedule_interval=timedelta(days=1),
    start_date=days_ago(1)
) as dag:
    t1 = PythonOperator(
        task_id='first_task',
        python_callable=test_function,
        op_kwargs={"task_name": "first_task"},
        dag=dag,
    )
    t2 = PythonOperator(
        task_id='second_task',
        python_callable=test_function,
        op_kwargs={"task_name": "second_task"},
        dag=dag,
    )
    t3 = PythonOperator(
        task_id='third_task',
        python_callable=test_function,
        op_kwargs={"task_name": "third_task"},
        dag=dag,
        queue='kubernetes'
    )

    t1 >> t2 >> t3
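The packaging step above can also be sketched without the Docker setup (a minimal illustration using only the standard library; the trivial DAG content is a stand-in for the full file above): zipping a Python file and reading the archive back as raw bytes yields data full of null bytes, which is exactly what the scheduler hands to ast.parse.

```python
import ast
import io
import zipfile

# Build a test_dags.zip in memory containing a trivial placeholder DAG file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("test_dag.py", "print('hello')")
zip_bytes = buf.getvalue()

# The archive is binary data containing null bytes, so passing it straight
# to ast.parse() reproduces the scheduler's failure.
print(b"\x00" in zip_bytes)  # True
try:
    ast.parse(zip_bytes)
except ValueError as exc:
    print(exc)  # source code string cannot contain null bytes
```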

Operating System

Debian GNU/Linux 11 (bullseye)

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==7.3.0
apache-airflow-providers-apache-hive==2.3.2
apache-airflow-providers-celery==3.1.0
apache-airflow-providers-cncf-kubernetes==5.2.2
apache-airflow-providers-common-sql==1.3.4
apache-airflow-providers-docker==3.5.1
apache-airflow-providers-elasticsearch==4.4.0
apache-airflow-providers-ftp==3.3.1
apache-airflow-providers-google==6.8.0
apache-airflow-providers-grpc==3.1.0
apache-airflow-providers-hashicorp==2.2.0
apache-airflow-providers-http==4.2.0
apache-airflow-providers-imap==3.1.1
apache-airflow-providers-microsoft-azure==5.2.1
apache-airflow-providers-mysql==4.0.2
apache-airflow-providers-odbc==3.2.1
apache-airflow-providers-postgres==5.4.0
apache-airflow-providers-redis==3.1.0
apache-airflow-providers-sendgrid==3.1.0
apache-airflow-providers-sftp==4.2.4
apache-airflow-providers-slack==4.2.3
apache-airflow-providers-snowflake==4.0.4
apache-airflow-providers-sqlite==3.3.1
apache-airflow-providers-ssh==3.5.0

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

Anything else

The issue only occurs with 2.6.0. I used the 2.5.3 Docker image with everything else remaining the same, and packaged DAGs loaded with no issues.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@harshith-bolar-rapido harshith-bolar-rapido added area:core kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels May 3, 2023
@boring-cyborg

boring-cyborg bot commented May 3, 2023

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

@luos-fc
Contributor

luos-fc commented May 3, 2023

Experiencing the same - seems to be related to #30495. I find DAGs parse correctly once parsing_pre_import_modules: False is set in my scheduler config.

@potiuk potiuk added this to the Airflow 2.6.1 milestone May 3, 2023
@potiuk potiuk removed the needs-triage label for new issues that we didn't triage yet label May 3, 2023
@ashb
Member

ashb commented May 4, 2023

Confirmed, yeah #30495 causes problems with zipped dags.

The temporary workaround for now is to set AIRFLOW__SCHEDULER__PARSING_PRE_IMPORT_MODULES=False -- we'll get this fixed in 2.6.1
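For anyone landing here before 2.6.1, a sketch of applying that workaround (the exact mechanism depends on your deployment; shown here as a plain environment variable, with the equivalent airflow.cfg setting in a comment):

```shell
# Disable pre-import parsing so zipped DAG bundles are not fed to ast.parse.
export AIRFLOW__SCHEDULER__PARSING_PRE_IMPORT_MODULES=False

# Equivalent airflow.cfg setting:
#   [scheduler]
#   parsing_pre_import_modules = False
echo "$AIRFLOW__SCHEDULER__PARSING_PRE_IMPORT_MODULES"
```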
