Add more filter to s3 hook list_key #22231

Merged: 1 commit, Mar 15, 2022

Conversation

sunank200
Collaborator

Implemented as discussed in closed PR.

Add more filter options to list_keys of S3Hook

  • start_after_key: return only keys greater than this key.
  • from_datetime: return only keys whose LastModified attribute is greater than or equal to from_datetime.
  • to_datetime: return only keys whose LastModified attribute is less than to_datetime.
  • object_filter: a callable that receives the list of S3 objects, from_datetime, and to_datetime, and returns the list of matching keys.
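
A minimal sketch of how the date and callable filters described above could behave. It is not the code merged in this PR: `filter_keys` is a hypothetical helper, and the sample objects only imitate the `Key` and `LastModified` fields that boto3 returns for each S3 object. The lower bound is treated as inclusive and the upper bound as exclusive, following the wording of the list.

```python
from datetime import datetime, timezone
from typing import Callable, List, Optional


def filter_keys(
    objects: List[dict],
    from_datetime: Optional[datetime] = None,
    to_datetime: Optional[datetime] = None,
    object_filter: Optional[Callable[..., List[str]]] = None,
) -> List[str]:
    """Return the keys of ``objects`` that pass the filters (hypothetical helper)."""
    if object_filter is not None:
        # A user-supplied callable takes over the matching entirely.
        return object_filter(objects, from_datetime, to_datetime)

    def in_range(obj: dict) -> bool:
        last_modified = obj["LastModified"]
        if from_datetime is not None and last_modified < from_datetime:
            return False  # older than the inclusive lower bound
        if to_datetime is not None and last_modified >= to_datetime:
            return False  # at or past the exclusive upper bound
        return True

    return [obj["Key"] for obj in objects if in_range(obj)]


# Sample data shaped like entries of a list_objects_v2 "Contents" list.
objects = [
    {"Key": "a.csv", "LastModified": datetime(2022, 3, 1, tzinfo=timezone.utc)},
    {"Key": "b.csv", "LastModified": datetime(2022, 3, 10, tzinfo=timezone.utc)},
    {"Key": "c.csv", "LastModified": datetime(2022, 3, 20, tzinfo=timezone.utc)},
]
selected = filter_keys(
    objects,
    from_datetime=datetime(2022, 3, 10, tzinfo=timezone.utc),
    to_datetime=datetime(2022, 3, 20, tzinfo=timezone.utc),
)
print(selected)
```

Passing `object_filter` bypasses the built-in date check, so the callable decides by itself how to use the two datetimes.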

Add test for the added argument to list_keys.

closes: #16627



boring-cyborg bot added the area:providers and provider:amazon (AWS/Amazon - related issues) labels on Mar 13, 2022
sunank200 force-pushed the s3-list-key-filter branch from 5bbb8a6 to cd8ccaa on March 13, 2022 23:51
airflow/providers/amazon/aws/hooks/s3.py (outdated)
Comment on lines 285 to 286
from_datetime: Optional[DateTime] = None,
to_datetime: Optional[DateTime] = None,
Member

I don’t think this needs to take pendulum.DateTime. Normal datetime.datetime works equally well (and is compatible with pendulum.DateTime).

Collaborator Author

LastModified on the keys returned by boto3 is timezone-aware, so comparing it with a timezone-naive datetime specified by the user would raise TypeError: can't compare offset-naive and offset-aware datetimes.

Member

Native datetime can be timezone-aware as well; on the other hand, using pendulum.DateTime still does not guarantee the instance is timezone-aware.
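
The point about timezone awareness can be checked with the standard library alone. This is an illustrative sketch: `last_modified` is a stand-in for the timezone-aware value boto3 reports, and the variable names are made up for the example.

```python
from datetime import datetime, timezone

# Stand-in for the timezone-aware LastModified value of an S3 object.
last_modified = datetime(2022, 3, 14, 8, 0, tzinfo=timezone.utc)

# A native datetime.datetime can carry tzinfo, so this comparison is valid.
aware_bound = datetime(2022, 3, 1, tzinfo=timezone.utc)
aware_ok = last_modified >= aware_bound

# A naive bound (no tzinfo) cannot be ordered against an aware value.
naive_bound = datetime(2022, 3, 1)
try:
    last_modified >= naive_bound
    naive_error = None
except TypeError as exc:
    naive_error = str(exc)

print(aware_ok, naive_error)
```

The type annotation therefore does not prevent the error; what matters is whether the caller passes a value with `tzinfo` set.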

Collaborator Author

Sure. Changed the type to datetime.datetime.

sunank200 requested a review from uranusjr on March 14, 2022 08:22
sunank200 changed the title from "S3 list key filter" to "Add more filter to s3 hook list_key" on Mar 14, 2022
kaxil closed this on Mar 14, 2022
kaxil reopened this on Mar 14, 2022
Member

kaxil commented Mar 14, 2022

Re-triggering the build to rerun doc build

Member

potiuk commented Mar 14, 2022

Seems like this is a wider outage of some inventories :(

potiuk closed this on Mar 15, 2022
potiuk reopened this on Mar 15, 2022
Member

potiuk commented Mar 15, 2022

I think you will need to rebase that one @sunank200

…time, object_filter callable

Implemented as discussed in [closed PR](apache#19018).

Add more filter options to list_keys of S3Hook
- `start_after_key`: return only keys greater than this key.
- `from_datetime`: return only keys whose LastModified attribute is greater than or equal to `from_datetime`.
- `to_datetime`: return only keys whose LastModified attribute is less than `to_datetime`.
- `object_filter`: a callable that receives the list of S3 objects, `from_datetime`, and `to_datetime`, and returns the list of matching keys.
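
For `start_after_key`, one plausible approach is to forward the value to the `StartAfter` parameter of S3's `list_objects_v2` call through a boto3 paginator. The sketch below assumes that mapping; `list_keys_after` is a hypothetical helper, not the signature added to `S3Hook`, and `client` is expected to be a boto3 S3 client (or any object exposing the same `get_paginator` interface).

```python
from typing import Any, List


def list_keys_after(
    client: Any,
    bucket: str,
    prefix: str = "",
    start_after_key: str = "",
) -> List[str]:
    """List keys under ``prefix`` that sort after ``start_after_key`` (hypothetical helper)."""
    paginator = client.get_paginator("list_objects_v2")
    pages = paginator.paginate(Bucket=bucket, Prefix=prefix, StartAfter=start_after_key)

    keys: List[str] = []
    for page in pages:
        # A page without matching objects has no "Contents" entry.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys
```

With a real client this would be called as `list_keys_after(boto3.client("s3"), "my-bucket", prefix="data/", start_after_key="data/2022-03-01.csv")`, where the bucket and key names are placeholders.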

Add test for the added argument to `list_keys`.

closes: apache#16627
sunank200 force-pushed the s3-list-key-filter branch from 6bfdf44 to 9707a1a on March 15, 2022 16:18
Labels
area:providers provider:amazon AWS/Amazon - related issues
Development

Successfully merging this pull request may close these issues.

add more filter options to list_keys of S3Hook
4 participants