Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Airflow Operator execution bugs in handling of Elyra-owned properties #2865

Merged
merged 5 commits into from
Aug 4, 2022

Conversation

kiersten-stokes
Copy link
Member

@kiersten-stokes kiersten-stokes commented Aug 1, 2022

This PR fixes 2 bugs related to volumes for Airflow operators:

  1. Brought to our attention in the elyra-ai/community gitter - The issue is that the mounted_volume property is being persisted as an empty list in the case where no values are added. This value is then not being popped off the dict of component_parameters for a given component before being passed to the DAG to render. The extra, unexpected kwarg is causing certain operators to fail.
airflow.exceptions.AirflowException: Invalid arguments were passed to PythonOperator (task_id: PythonOperator). Invalid arguments were:
**kwargs: {'mounted_volumes': []}
  1. Fixes Pipeline submission fails for Airflow due to parsing error #2866 - The issue here is that pipeline default properties such as runtime image and kubernetes secrets are being incorrectly propagated to custom components and appearing in the component_parameters dict. Due to formatting errors with the runtime_image field in the DAG, black is throwing an error. Even without this formatting error, however, these operators would not execute properly.

What changes were proposed in this pull request?

This PR adds more conditions around property propagation and persistence on the Operation object.

Re: issue 1 above, the mounted_volumes property is now appropriately popped off the dict of component_params for an Operation object. If the value of the mounted_volumes param is not None, is a list, and either 1. the list is empty or 2. the list consists of VolumeMount dataclass objects, the mounted_volumes key is popped off the dict. This means it will not be rendered alongside the component-defined parameters for which a value was supplied. Recall that we are accounting for the possibility that the component defined its own mounted_volumes parameter. If this is the case, and if that property happens to be list-valued and the value entered is an empty list, the consequences of popping this off the param dict should be minor.

Re: issue 2 above, any property that is 1. considered a pipeline default and 2. only applies to generic components is now appropriately skipped during property propagation for custom components. A list of such properties that adhere to the 2 conditions listed above already exists as the ELYRA_COMPONENT_PROPERTIES variable.

How was this pull request tested?

Manual testing shows that mounted_volumes no longer appears as an extra arg during operator construction and that runtime image, env vars, and secrets are not propagated to custom components..

  • Add backend test for empty mounted volumes on a custom component
    • see new test_pipeline_parser.py::test_custom_component_parsed_properties - this test takes a pipeline with 2 operators, one of which defines its own mounted_volumes property and one of which doesn't
      • the operator that defines it's own mounted_volumes property should have that component parameter in its component_params_as_dict attribute and it's value should be as-given in the pipeline file
      • the operator that does not define it's own mounted_volumes property and has no Elyra-owned mounted volumes given (and therefore has mounted_volumes: [] in the pipeline JSON) should not have the mounted_volumes property appear in its component_params_as_dict attribute
  • Add backend test for runtime image (or other pipeline default) to ensure it does not appear in the component_params dict for custom components
    • see additions to existing test_pipeline_definition.py::test_propagate_pipeline_default_properties - these additions ensure that properties such as runtime image and env vars are not propagated to custom components

Test resources have been updated according to the needs of the above tests. A small change to test_processor_airflow.py::test_create_file_custom_components was also required in order to compare a parsed pipeline file against the correct expected component parameters. I've confirmed that these new tests fail without the accompanying changes to the code base proposed in this PR.

Developer's Certificate of Origin 1.1

   By making a contribution to this project, I certify that:

   (a) The contribution was created in whole or in part by me and I
       have the right to submit it under the Apache License 2.0; or

   (b) The contribution is based upon previous work that, to the best
       of my knowledge, is covered under an appropriate open source
       license and I have the right under that license to submit that
       work with modifications, whether created in whole or in part
       by me, under the same open source license (unless I am
       permitted to submit under a different license), as indicated
       in the file; or

   (c) The contribution was provided directly to me by some other
       person who certified (a), (b) or (c) and I have not modified
       it.

   (d) I understand and agree that this project and the contribution
       are public and that a record of the contribution (including all
       personal information I submit with it, including my sign-off) is
       maintained indefinitely and may be redistributed consistent with
       this project or the open source license(s) involved.

@kiersten-stokes kiersten-stokes added this to the 3.10.2 milestone Aug 1, 2022
@elyra-bot
Copy link

elyra-bot bot commented Aug 1, 2022

Thanks for making a pull request to Elyra!

To try out this branch on binder, follow this link: Binder

@ptitzler ptitzler self-requested a review August 2, 2022 05:04
@ptitzler ptitzler added status:Work in Progress Development in progress. A PR tagged with this label is not review ready unless stated otherwise. kind:bug Something isn't working component:pipeline-runtime issues related to pipeline runtimes e.g. kubeflow pipelines labels Aug 2, 2022
Copy link
Member

@ptitzler ptitzler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix is looking good. Also ran some manual tests to confirm that empty volume dependencies are not surfaced.

@kiersten-stokes kiersten-stokes changed the title Fix bug re: Airflow Operator execution when no mounted volumes are defined Fix Airflow Operator execution bugs in handling of Elyra-owned properties Aug 2, 2022
@ptitzler ptitzler self-requested a review August 3, 2022 05:27
@ptitzler

This comment was marked as resolved.

@kiersten-stokes
Copy link
Member Author

@ptitzler thanks for the continued review! Tests have now been added and TODO's marked completed 🙂

@kiersten-stokes kiersten-stokes removed the status:Work in Progress Development in progress. A PR tagged with this label is not review ready unless stated otherwise. label Aug 3, 2022
Copy link
Member

@ptitzler ptitzler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM aside from comment about semantic inconsistency in one of the test pipelines.

@akchinSTC akchinSTC merged commit 334a3bb into elyra-ai:main Aug 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component:pipeline-runtime issues related to pipeline runtimes e.g. kubeflow pipelines kind:bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pipeline submission fails for Airflow due to parsing error
3 participants