S3 hooks filter options #19018

Closed
wants to merge 12 commits into from

Conversation

sunank200
Collaborator

@sunank200 sunank200 commented Oct 15, 2021

Add more filter options to list_keys of S3Hook

This commit adds the following filters to list_keys of S3Hook:
- start_after_key returns only the keys that come after the specified key. start_after_key can be any key in the bucket.
- start_after_datetime returns only the keys whose last-modified datetime is greater than or equal to start_after_datetime.
- to_datetime returns only the keys whose last-modified datetime is less than or equal to to_datetime.
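
For illustration, the three filters described above can be sketched in plain Python over a list of S3 object dicts. This is a minimal sketch, not the code in this PR: the helper name `filter_keys` and the sample data are hypothetical, and the objects are assumed to carry the `Key` and `LastModified` fields that S3 listings return.

```python
from datetime import datetime, timezone


def filter_keys(objects, start_after_key=None, start_after_datetime=None, to_datetime=None):
    """Return the keys of `objects` that pass every filter that is set.

    Hypothetical helper: mirrors the semantics in the description above,
    not the actual S3Hook implementation.
    """
    matched = []
    for obj in objects:
        # Keep only keys that sort strictly after start_after_key.
        if start_after_key is not None and obj["Key"] <= start_after_key:
            continue
        # Lower bound on LastModified is inclusive.
        if start_after_datetime is not None and obj["LastModified"] < start_after_datetime:
            continue
        # Upper bound on LastModified is inclusive.
        if to_datetime is not None and obj["LastModified"] > to_datetime:
            continue
        matched.append(obj["Key"])
    return matched


# Made-up sample listing, ordered by key.
objects = [
    {"Key": "a.csv", "LastModified": datetime(2021, 10, 1, tzinfo=timezone.utc)},
    {"Key": "b.csv", "LastModified": datetime(2021, 10, 10, tzinfo=timezone.utc)},
    {"Key": "c.csv", "LastModified": datetime(2021, 10, 20, tzinfo=timezone.utc)},
]

after_a = filter_keys(objects, start_after_key="a.csv")
mid_october = filter_keys(
    objects,
    start_after_datetime=datetime(2021, 10, 10, tzinfo=timezone.utc),
    to_datetime=datetime(2021, 10, 15, tzinfo=timezone.utc),
)
```

Leaving a filter as None disables it, so calling the helper with no filters returns every key.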

Implemented as per changes discussed in the previous PR.

closes: #16627

This change does not affect existing callers of list_keys, such as S3DeleteObjectsOperator, S3ListOperator, S3KeysUnchangedSensor, and the S3Hook methods get_wildcard_key and delete_bucket.

Corresponding unit tests have been added to test_s3.py.


Tests to filter keys after specified key and to filter keys based on last modified datetime.
… object_filter callable.

Adds a response filter that filters objects based on user-defined operations. It is written generically so that it can be extended to other operations in the future.
Fix Flake8 errors and an indentation bug so that the result is returned when object_filter is None
@boring-cyborg boring-cyborg bot added area:providers provider:amazon AWS/Amazon - related issues labels Oct 15, 2021
@sunank200
Collaborator Author

sunank200 commented Oct 15, 2021

@potiuk @dstandish @eladkal could you trigger the CI?

@eladkal eladkal requested a review from dstandish October 26, 2021 20:26
Contributor

@dstandish dstandish left a comment

I'm not sure we need to add the ResponseFilter abstraction.

What do you think about allowing the user to provide an arbitrary callable that does the filtering that the user needs?

For example, your docstring shows this example:

object_filter={"LastModified__lt": datetime.now()},

But with an arbitrary callable, this can be implemented about as simply:

object_filter=lambda x: x['LastModified'] < datetime.now(),

This is roughly as compact and as simple. The benefit is that users don't need to learn which options the class supports or how it works, and they can do any filtering they want.
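
To make the callable approach concrete, here is a minimal sketch of how a per-object predicate could be applied to a listing. The helper `select_objects` and the sample data are hypothetical and are not part of S3Hook; the only assumption about the objects is that each is a dict with a `LastModified` datetime, as in the lambda above.

```python
from datetime import datetime, timezone


def select_objects(objects, object_filter=None):
    """Apply an optional per-object predicate (hypothetical helper)."""
    # With no filter, return everything unchanged.
    if object_filter is None:
        return list(objects)
    # Otherwise keep only the objects the predicate accepts.
    return [obj for obj in objects if object_filter(obj)]


# A fixed cutoff keeps the example deterministic; datetime.now() works the same way.
cutoff = datetime(2021, 10, 15, tzinfo=timezone.utc)

sample = [
    {"Key": "old.csv", "LastModified": datetime(2021, 10, 1, tzinfo=timezone.utc)},
    {"Key": "new.csv", "LastModified": datetime(2021, 10, 20, tzinfo=timezone.utc)},
]

older = select_objects(sample, object_filter=lambda x: x["LastModified"] < cutoff)
```

Any other condition (key suffix, size, storage class) fits the same shape without a dedicated filter class.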

@github-actions

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Dec 12, 2021
@github-actions github-actions bot closed this Dec 18, 2021
sunank200 added a commit to astronomer/airflow that referenced this pull request Mar 15, 2022
…time, object_filter callable

Implemented as discussed in [closed PR](apache#19018).

Add more filter options to list_keys of S3Hook
- `start_after_key`: should return only keys greater than this key
- `from_datetime`: should return only keys with LastModified attr greater than or equal to `from_datetime`.
- `to_datetime`: should return only keys with LastModified attr less than `to_datetime`.
- `object_filter`: Callable that receives the list of S3 objects, `from_datetime`, and `to_datetime`, and returns the list of matched keys.

Add test for the added argument to `list_keys`.

closes: apache#16627
kaxil pushed a commit that referenced this pull request Mar 15, 2022
…time, object_filter callable (#22231)

Implemented as discussed in [closed PR](#19018).

Add more filter options to list_keys of S3Hook
- `start_after_key`: should return only keys greater than this key
- `from_datetime`: should return only keys with LastModified attr greater than or equal to `from_datetime`.
- `to_datetime`: should return only keys with LastModified attr less than `to_datetime`.
- `object_filter`: Callable that receives the list of S3 objects, `from_datetime`, and `to_datetime`, and returns the list of matched keys.

Add test for the added argument to `list_keys`.

closes: #16627
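
As a rough illustration of the `object_filter` contract described in this commit message, the sketch below defines a callable that takes the list of S3 objects plus `from_datetime` and `to_datetime` and returns the matched keys. The function name `csv_keys_in_period`, the extra `.csv` condition, and the sample listing are hypothetical. The bounds (inclusive lower, exclusive upper) are one reading of the commit message, not a statement about the merged code.

```python
from datetime import datetime, timezone


def csv_keys_in_period(objects, from_datetime=None, to_datetime=None):
    """Hypothetical custom filter: .csv keys modified within the period."""
    matched = []
    for obj in objects:
        modified = obj["LastModified"]
        # Assumed inclusive lower bound.
        if from_datetime is not None and modified < from_datetime:
            continue
        # Assumed exclusive upper bound.
        if to_datetime is not None and modified >= to_datetime:
            continue
        # Extra user-defined condition beyond the date range.
        if obj["Key"].endswith(".csv"):
            matched.append(obj["Key"])
    return matched


# Made-up sample listing.
listing = [
    {"Key": "data/1.csv", "LastModified": datetime(2022, 3, 1, tzinfo=timezone.utc)},
    {"Key": "data/2.json", "LastModified": datetime(2022, 3, 5, tzinfo=timezone.utc)},
    {"Key": "data/3.csv", "LastModified": datetime(2022, 3, 15, tzinfo=timezone.utc)},
]

march_first_half = csv_keys_in_period(
    listing,
    from_datetime=datetime(2022, 3, 1, tzinfo=timezone.utc),
    to_datetime=datetime(2022, 3, 15, tzinfo=timezone.utc),
)
```

A function of this shape would presumably be passed as the `object_filter` argument of `list_keys`; the provider documentation is the authority on the exact signature and return type.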
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jul 10, 2022
…time, object_filter callable (#22231)

Implemented as discussed in [closed PR](apache/airflow#19018).

Add more filter options to list_keys of S3Hook
- `start_after_key`: should return only keys greater than this key
- `from_datetime`: should return only keys with LastModified attr greater than or equal to `from_datetime`.
- `to_datetime`: should return only keys with LastModified attr less than `to_datetime`.
- `object_filter`: Callable that receives the list of S3 objects, `from_datetime`, and `to_datetime`, and returns the list of matched keys.

Add test for the added argument to `list_keys`.

closes: #16627
GitOrigin-RevId: 926f6d1894ce9d097ef2256d14a99968638da9c0
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Aug 30, 2022
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Oct 4, 2022
aglipska pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Oct 7, 2022
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Dec 7, 2022
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jan 27, 2023
kosteev pushed a commit to kosteev/composer-airflow-test-copybara that referenced this pull request Sep 12, 2024
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Sep 18, 2024
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Nov 7, 2024