-
Notifications
You must be signed in to change notification settings - Fork 415
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add optimize command in python binding #1313
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks quite good. Added a few suggestions.
:return: the metrics from optimize | ||
""" | ||
metrics = self._table.optimize(partition_filters, target_size) | ||
return json.loads(metrics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have this be typed for the sake of autocompletion, but doing that in Rust right now involves a bit of boilerplate. One lightweight option is https://docs.python.org/3.8/library/typing.html#typing.TypedDict, but that's Python>=3.8 only. We still have a month or two until 3.7 is EOL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps something like:
if TYPE_CHECKING and sys.version_info().minor >= 8:
from typing import TypedDict
class MetricDetails(TypeDict):
min: int
max: int
avg: float
total_files: int
total_size: int
class OptimizeMetrics(TypeDict):
num_files_added: int
num_files_removed: int
files_added: MetricDetails
files_removed: MetricDetails
partitions_optimized: int
num_batches: int
total_considered_files: int
total_files_skipped: int
preserve_insertion_order: bool
Then you can modify the signature:
def optimize(
self,
partition_filters: Optional[List[Tuple[str, str, Any]]] = None,
target_size: Optional[int] = None,
) -> "OptimizeMetrics":
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I'm happy to leave this for a follow up later FYI)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. I'll leave this in current PR.
can you guys please don't release the next version of python package before you merge this, this is a very important functionality |
Description
This is a implementation of the Optimize Command for python binding.
Related Issue(s)
#622