
BigQueryValueCheckOperator doesn't respect pass_value in deferrable mode #34010

Closed
nathadfield opened this issue Sep 1, 2023 · 0 comments · Fixed by #34018

Labels
area:providers, good first issue, kind:bug, provider:google

Comments

@nathadfield
Collaborator

Apache Airflow version

2.7.0

What happened

When BigQueryValueCheckOperator runs in deferrable mode, the task always succeeds, even when the query result does not match pass_value.

What you think should happen instead

If the value returned by the operator's SQL does not equal pass_value, the operator should fail. It does fail when deferrable=False, but not when deferrable=True.
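For reference, the pass/fail decision the operator is expected to make can be sketched in plain Python. This is a hypothetical, simplified sketch and not the provider's code: `check_value`, `ValueCheckFailed` and the comparison rules (string match for exact checks, tolerance as a fraction of pass_value) are assumptions, loosely modeled on the "Test failed" message in the test1 log extract further down.

```python
from typing import Any, Optional, Sequence


class ValueCheckFailed(Exception):
    """Hypothetical stand-in for airflow.exceptions.AirflowException."""


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly:
    # True/False should be compared as values, not as 1/0.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_value(records: Sequence[Any], pass_value: Any, tolerance: Optional[float] = None) -> None:
    """Raise ValueCheckFailed unless every value in `records` matches `pass_value`."""
    if not records:
        raise ValueCheckFailed("The query returned no rows")
    if tolerance is not None and _is_number(pass_value):
        # Numeric check (assumed rule): accept values within +/- tolerance,
        # expressed as a fraction of pass_value.
        low, high = sorted((pass_value * (1 - tolerance), pass_value * (1 + tolerance)))
        tests = [_is_number(r) and low <= r <= high for r in records]
    else:
        # Exact check (assumed rule): compare string forms, so False never equals True.
        tests = [str(r) == str(pass_value) for r in records]
    if not all(tests):
        raise ValueCheckFailed(
            f"Test failed.\nPass value:{pass_value}\nTolerance:{tolerance}\nResults:\n{list(records)}"
        )
```

With the reproduction's values, `check_value([False], True)` raises, which is what test1 does and what test2 should also do.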

How to reproduce

The following DAG code should reproduce the issue. Both tasks run SQL that simply returns false, with a pass_value of True. The only difference is that the second task (test2) runs in deferrable mode.

```python
from datetime import datetime

from airflow import models
from airflow.providers.google.cloud.operators.bigquery import BigQueryValueCheckOperator

with models.DAG(
    dag_id='bq_value_check',
    start_date=datetime(2023, 8, 31),
    catchup=False,
    schedule='0 0 * * *',
) as dag:

    # Non-deferrable: fails as expected, because False != True.
    test1 = BigQueryValueCheckOperator(
        task_id='test1',
        sql='SELECT false;',
        pass_value=True,
        retries=0,
        deferrable=False,
    )

    # Deferrable: same check, but the task is marked SUCCESS.
    test2 = BigQueryValueCheckOperator(
        task_id='test2',
        sql='SELECT false;',
        pass_value=True,
        retries=0,
        deferrable=True,
    )
```

Some log extracts:
test1:

```
[2023-09-01, 13:34:53 UTC] {bigquery.py:1596} INFO - Inserting job airflow_1693575293576477_43573f106bd19562ebd18f0679e80536
[2023-09-01, 13:34:54 UTC] {taskinstance.py:1824} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/airflow/providers/google/cloud/operators/bigquery.py", line 425, in execute
    super().execute(context=context)
  File "/usr/local/lib/python3.10/site-packages/airflow/providers/common/sql/operators/sql.py", line 857, in execute
    self._raise_exception(error_msg)
  File "/usr/local/lib/python3.10/site-packages/airflow/providers/common/sql/operators/sql.py", line 187, in _raise_exception
    raise AirflowException(exception_string)
airflow.exceptions.AirflowException: Test failed.
Pass value:True
Tolerance:None
Query:
SELECT false;
Results:
[False]
```

test2:

```
[2023-09-01, 13:34:53 UTC] {bigquery.py:1596} INFO - Inserting job airflow_1693575293256612_92474edc414865ab2efb14bd8b18e24d
[2023-09-01, 13:34:53 UTC] {bigquery.py:446} INFO - Current state of job airflow_1693575293256612_92474edc414865ab2efb14bd8b18e24d is DONE
[2023-09-01, 13:34:53 UTC] {taskinstance.py:1345} INFO - Marking task as SUCCESS. dag_id=bq_value_check, task_id=test2, execution_date=20230831T000000, start_date=20230901T133452, end_date=20230901T133453
[2023-09-01, 13:34:53 UTC] {local_task_job_runner.py:225} INFO - Task exited with return code 0
```

Operating System

n/a

Versions of Apache Airflow Providers

apache-airflow-providers-google==10.7.0

Deployment

Astronomer

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

nathadfield added the kind:bug, area:core, needs-triage, provider:google and area:providers labels and removed the area:core label Sep 1, 2023
eladkal added the good first issue label and removed the needs-triage label Sep 1, 2023
pankajkoti self-assigned this Sep 1, 2023
pankajkoti added a commit to astronomer/airflow that referenced this issue Sep 1, 2023
PR apache#31872 tried to optimise the deferrable mode in BigQueryValueCheckOperator.
However, when deciding whether to defer, it only checked the
job status and did not actually verify the result against the
pass value, so it returned success prematurely.
This PR adds the missing logic to the optimisation to check
and compare the pass value and tolerance.

closes: apache#34010
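The commit message describes the flaw as a control-flow shortcut: a query job that had already finished skipped deferral without the value check. The before/after difference can be sketched as below. This is a minimal sketch under simplifying assumptions: `FakeQueryJob`, `execute_before_fix`, `execute_after_fix` and `ValueCheckFailed` are hypothetical names, not the provider's API, and the trigger path is out of scope; the actual change is in #34018.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


class ValueCheckFailed(Exception):
    """Hypothetical stand-in for airflow.exceptions.AirflowException."""


@dataclass
class FakeQueryJob:
    """Hypothetical stand-in for a BigQuery query job: its state and result rows."""

    done: bool
    rows: List[Any] = field(default_factory=list)


def _is_number(value: Any) -> bool:
    # Exclude bool, which is a subclass of int.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _matches(rows: List[Any], pass_value: Any, tolerance: Optional[float]) -> bool:
    """Simplified value check: numeric match within tolerance, else exact string match."""
    if not rows:
        return False
    if tolerance is not None and _is_number(pass_value):
        low, high = sorted((pass_value * (1 - tolerance), pass_value * (1 + tolerance)))
        return all(_is_number(r) and low <= r <= high for r in rows)
    return all(str(r) == str(pass_value) for r in rows)


def execute_before_fix(job: FakeQueryJob, pass_value: Any, tolerance: Optional[float] = None) -> str:
    # Only the job state is inspected, so a job that is already DONE
    # is reported as a success regardless of its result.
    if job.done:
        return "success"
    return "deferred"


def execute_after_fix(job: FakeQueryJob, pass_value: Any, tolerance: Optional[float] = None) -> str:
    # A finished job still short-circuits deferral, but its rows are
    # compared against pass_value (and tolerance) before reporting success.
    if job.done:
        if not _matches(job.rows, pass_value, tolerance):
            raise ValueCheckFailed(
                f"Test failed.\nPass value:{pass_value}\nTolerance:{tolerance}\nResults:\n{job.rows}"
            )
        return "success"
    # Not finished yet: hand off to the trigger (not modeled here).
    return "deferred"
```

With a job that is already DONE and returns `[False]`, as in the test2 log, `execute_before_fix` reports success while `execute_after_fix` raises.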
potiuk pushed a commit that referenced this issue Sep 3, 2023