S3 hooks filter options #19018
This commit adds the following filters to list_keys of S3Hook:
- start_after_key returns only the keys after the specified key. start_after_key can be any key in the bucket.
- start_after_datetime returns only the keys with a last modified datetime greater than or equal to start_after_datetime.
- to_datetime returns only the keys with a last modified datetime less than or equal to to_datetime.
Tests to filter keys after specified key and to filter keys based on last modified datetime.
… object_filter callable. Response filter to filter objects based on operations defined by user. It is written in a generic way such that it can be extended in future for different operations.
Fix Flake8 errors and an indentation bug so that the result is returned when object_filter is None
This reverts commit f2f4cbc.
@potiuk @dstandish @eladkal could you trigger the CI?
I'm not sure we need to add the ResponseFilter abstraction.
What do you think about letting the user provide an arbitrary callable that does whatever filtering they need?
For example, your docstring shows this example:
object_filter={"LastModified__lt": datetime.now()},
But with an arbitrary callable, this can be implemented about as simply:
object_filter=lambda x: x['LastModified'] < datetime.now(),
This is roughly as compact and as simple, but the benefit is that the user doesn't need to learn which options the class supports or how it works, and if they want to do arbitrary filtering, they can.
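To make the suggestion concrete, here is a minimal sketch of how a per-object callable could be applied to a listing. It is not the PR's code: `filter_keys` and the sample `objects` list are hypothetical, and the dictionaries only mimic the `Key` and `LastModified` fields of an S3 listing entry.

```python
from datetime import datetime, timezone

# Hypothetical sample data shaped like S3 listing entries (assumption).
objects = [
    {"Key": "logs/a.txt", "LastModified": datetime(2021, 10, 1, tzinfo=timezone.utc)},
    {"Key": "logs/b.txt", "LastModified": datetime(2021, 10, 15, tzinfo=timezone.utc)},
    {"Key": "logs/c.txt", "LastModified": datetime(2021, 11, 1, tzinfo=timezone.utc)},
]


def filter_keys(objects, object_filter=None):
    """Return the keys of objects accepted by object_filter.

    object_filter is any callable taking one object dict and returning a
    truthy value to keep it. With no filter, every key is returned.
    """
    if object_filter is None:
        return [obj["Key"] for obj in objects]
    return [obj["Key"] for obj in objects if object_filter(obj)]


# Same shape as the lambda in the comment, with a fixed cutoff for repeatability.
cutoff = datetime(2021, 10, 20, tzinfo=timezone.utc)
older = filter_keys(objects, lambda x: x["LastModified"] < cutoff)

# Arbitrary filtering needs no extra abstraction, only a different callable.
only_b = filter_keys(objects, lambda x: x["Key"].endswith("b.txt"))
```

The same helper handles date-based and name-based filtering without any operator vocabulary such as `LastModified__lt`.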
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.
…time, object_filter callable (#22231)

Implemented as discussed in [closed PR](apache/airflow#19018).

Add more filter options to list_keys of S3Hook:
- `start_after_key`: return only keys greater than this key.
- `from_datetime`: return only keys whose LastModified attribute is greater than or equal to `from_datetime`.
- `to_datetime`: return only keys whose LastModified attribute is less than `to_datetime`.
- `object_filter`: callable that receives the list of S3 objects, `from_datetime`, and `to_datetime`, and returns the list of matched keys.

Add tests for the added arguments to `list_keys`.

closes: #16627
GitOrigin-RevId: 926f6d1894ce9d097ef2256d14a99968638da9c0
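As an illustration of the `object_filter` contract described in the commit message above (a callable that receives the object list plus `from_datetime` and `to_datetime` and returns the matched keys), the sketch below shows one possible shape. The function names and sample data are hypothetical, and the inclusive lower bound is an assumption, since the commit message wording for `from_datetime` is ambiguous.

```python
from datetime import datetime, timezone


def default_object_filter(objects, from_datetime=None, to_datetime=None):
    """Hypothetical default: keep keys with from_datetime <= LastModified < to_datetime."""

    def in_window(obj):
        modified = obj["LastModified"]
        # Assumed inclusive lower bound; the source wording is unclear.
        if from_datetime is not None and modified < from_datetime:
            return False
        # Exclusive upper bound, following "less than this to_datetime".
        if to_datetime is not None and modified >= to_datetime:
            return False
        return True

    return [obj["Key"] for obj in objects if in_window(obj)]


def csv_only_filter(objects, from_datetime=None, to_datetime=None):
    """Hypothetical user-supplied filter: the date window plus a suffix check."""
    keys = default_object_filter(objects, from_datetime, to_datetime)
    return [key for key in keys if key.endswith(".csv")]


# Hypothetical sample data shaped like S3 listing entries (assumption).
objects = [
    {"Key": "data/a.csv", "LastModified": datetime(2022, 1, 1, tzinfo=timezone.utc)},
    {"Key": "data/b.json", "LastModified": datetime(2022, 1, 10, tzinfo=timezone.utc)},
    {"Key": "data/c.csv", "LastModified": datetime(2022, 2, 1, tzinfo=timezone.utc)},
]
start = datetime(2022, 1, 1, tzinfo=timezone.utc)
end = datetime(2022, 2, 1, tzinfo=timezone.utc)

windowed = default_object_filter(objects, start, end)
csv_windowed = csv_only_filter(objects, start, end)
```

Because the callable sees the whole list, it can also deduplicate or sort, which a per-object predicate cannot do.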
Add more filter options to list_keys of S3Hook
This commit adds the following filters to list_keys of S3Hook:
- start_after_key: returns only the keys after the specified key. start_after_key can be any key in the bucket.
- start_after_datetime: returns only the keys with a last modified date-time greater than or equal to start_after_datetime.
- to_datetime: returns only the keys with a last modified date-time less than or equal to to_datetime.

Implemented as per changes discussed in the previous PR.
closes: #16627
This change does not affect code that depends on list_keys, such as S3DeleteObjectsOperator, S3ListOperator, S3KeysUnchangedSensor, and the S3Hook methods get_wildcard_key and delete_bucket.
A corresponding unit test has been added to test_s3.py.
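The semantics of the three filters in this description can be sketched on an in-memory listing. This is only an illustration under stated assumptions, not the hook's implementation: `select_keys` and the sample data are hypothetical, and key ordering is emulated with plain string comparison. (S3's ListObjectsV2 API has a `StartAfter` parameter with similar semantics; whether this PR relies on it is not shown here.)

```python
from datetime import datetime, timezone


def select_keys(objects, start_after_key=None, start_after_datetime=None, to_datetime=None):
    """Return keys that pass every filter that was supplied.

    - start_after_key: keep keys strictly after this key (string order).
    - start_after_datetime: keep LastModified >= this value (inclusive).
    - to_datetime: keep LastModified <= this value (inclusive).
    """
    selected = []
    for obj in objects:
        if start_after_key is not None and obj["Key"] <= start_after_key:
            continue
        modified = obj["LastModified"]
        if start_after_datetime is not None and modified < start_after_datetime:
            continue
        if to_datetime is not None and modified > to_datetime:
            continue
        selected.append(obj["Key"])
    return selected


# Hypothetical sample data shaped like S3 listing entries (assumption).
objects = [
    {"Key": "data/2021-01.csv", "LastModified": datetime(2021, 1, 5, tzinfo=timezone.utc)},
    {"Key": "data/2021-02.csv", "LastModified": datetime(2021, 2, 5, tzinfo=timezone.utc)},
    {"Key": "data/2021-03.csv", "LastModified": datetime(2021, 3, 5, tzinfo=timezone.utc)},
]

after_key = select_keys(objects, start_after_key="data/2021-01.csv")
in_range = select_keys(
    objects,
    start_after_datetime=datetime(2021, 2, 5, tzinfo=timezone.utc),
    to_datetime=datetime(2021, 3, 5, tzinfo=timezone.utc),
)
```

Both datetime bounds are inclusive here, matching "greater than or equal to" and "less than or equal to" in the description above.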