Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add LocalFilesystemToGoogleDriveOperator #22219

Merged
merged 42 commits into from
Mar 21, 2022
Merged

Conversation

ulsc
Copy link
Contributor

@ulsc ulsc commented Mar 13, 2022

Operator to upload a list of files to a Google Drive folder, alongside a file which can be used to have more Google Drive operators.

@ulsc ulsc requested a review from turbaszek as a code owner March 13, 2022 14:52
@boring-cyborg boring-cyborg bot added area:providers provider:google Google (including GCP) related issues labels Mar 13, 2022
ulsc added 3 commits March 13, 2022 16:11
`api_version` parameter is passed through the hook and the hook already has the api_version default. One less magic number to manage.
Copy link
Member

@turbaszek turbaszek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small comment, otherwise looks good to me 🚀

@github-actions github-actions bot added the okay to merge It's ok to merge this PR as it does not require more tests label Mar 14, 2022
@github-actions
Copy link

The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.

@potiuk
Copy link
Member

potiuk commented Mar 14, 2022

Added it so we might add it to the new release of providers :)

Copy link
Contributor Author

@ulsc ulsc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be a problem in line 104: remote_location=str(self.drive_folder / local_path.name), since str doesn't have a name parameter.

@ulsc
Copy link
Contributor Author

ulsc commented Mar 14, 2022

drive_folder is also a template field, so to support jinja altogether, we need to change the path composition a bit.

@ulsc ulsc requested a review from mik-laj as a code owner March 15, 2022 21:39
Copy link
Contributor

@eladkal eladkal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
will merge when CI is green

Comment on lines 123 to 124
except FileNotFoundError:
self.log.warning("File %s can't be found", local_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer not to be picky and catch all OSError but maybe that is just me :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FileNotFoundError seemed like the main error cause in my case, but it'd be better to catch all OSErrors. How about adding OSError after FileNotFoundError?

Copy link
Contributor

@eladkal eladkal Mar 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FileNotFoundError is subclasses of OSError.
So catching OSError is enough

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eladkal
Copy link
Contributor

eladkal commented Mar 20, 2022

CI indeed help us :)
example dag is missing for this operator
https://github.com/apache/airflow/runs/5617058321?check_suite_focus=true#step:8:892

          with self.subTest("Detect missing example dags"):
              missing_example = {s for s in operator_sets if not has_example_dag(s)}
              missing_example -= self.MISSING_EXAMPLE_DAGS
  >           assert set() == missing_example
  E           AssertionError: assert set() == {('suite', 'local_to_drive')}
  E             Extra items in the right set:
  E             ('suite', 'local_to_drive')
  E             Use -v to get the full diff

@ulsc ulsc requested a review from eladkal March 21, 2022 09:38
@eladkal eladkal changed the title Gdrive upload operator Add LocalFilesystemToGoogleDriveOperator Mar 21, 2022
@eladkal eladkal changed the title Add LocalFilesystemToGoogleDriveOperator Add LocalFilesystemToGoogleDriveOperator Mar 21, 2022
@eladkal eladkal merged commit 7b851ed into apache:main Mar 21, 2022
@ulsc ulsc deleted the gdrive-upload-operator branch March 21, 2022 14:25
Comment on lines +123 to +126
except FileNotFoundError:
self.log.warning("File can't be found: %s", local_path)
except OSError:
self.log.warning("An OSError occurred for file: %s", local_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merged this one too fast :)
We should do something like:

        except FileNotFoundError as e:
            self.log.info('File not found: %s: %s', local_path, e)
            raise
        except OSError as e:
            self.log.info('An OSError occurred for file: %s: %s', local_path, e)
            raise

or

except OSError as e:
    raise AirflowException(
        f'Exception: There was exception related to file in path {local_path}  '
        f'full error is ({e}).'
    )

Would you mind open a follow up PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raise would fail the dag too early, however the intention was failing after seeing the whole landscape. That's why we logged the problems and raised as a result, if user doesn't want to ignore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers okay to merge It's ok to merge this PR as it does not require more tests provider:google Google (including GCP) related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants