feat: add optimize command in python binding #1313

Merged (6 commits) on May 4, 2023

loleek
Contributor

@loleek loleek commented Apr 27, 2023

Description

This is an implementation of the optimize command for the Python binding.

Related Issue(s)

#622

@github-actions github-actions bot added the binding/python Issues for the Python package label Apr 27, 2023
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Apr 27, 2023
@loleek loleek changed the title issue-622: add optimize command in python binding feat: add optimize command in python binding Apr 27, 2023
Collaborator

@wjones127 wjones127 left a comment

This looks quite good. Added a few suggestions.

python/deltalake/table.py (resolved)
python/deltalake/table.py (outdated, resolved)
:return: the metrics from optimize
"""
metrics = self._table.optimize(partition_filters, target_size)
return json.loads(metrics)
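
For context, the two lines above parse whatever the native binding returns as JSON and hand back the result. A minimal sketch of that round trip follows, assuming the binding returns a JSON string. The payload and the `parse_metrics` helper are invented for illustration; the key names are borrowed from the review suggestion further down this thread, and the real output may differ.

```python
import json

# Hypothetical stand-in for the string returned by the native optimize call.
# The real payload is produced by the Rust side and may contain other fields.
raw_metrics = (
    '{"num_files_added": 1, "num_files_removed": 4, "partitions_optimized": 1}'
)


def parse_metrics(raw: str) -> dict:
    """Parse the JSON metrics string into a plain, untyped dict."""
    return json.loads(raw)


metrics = parse_metrics(raw_metrics)
print(metrics["num_files_removed"])
```

Because the result is a plain `dict`, editors cannot suggest its keys, which is the gap the typing discussion below is about.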
Collaborator

It would be nice to have this be typed for the sake of autocompletion, but doing that in Rust right now involves a bit of boilerplate. One lightweight option is https://docs.python.org/3.8/library/typing.html#typing.TypedDict, but that's Python>=3.8 only. We still have a month or two until 3.7 is EOL.

Collaborator

Perhaps something like:

import sys
from typing import TYPE_CHECKING

# Only seen by type checkers; sys.version_info is a tuple, not a callable.
if TYPE_CHECKING and sys.version_info >= (3, 8):
    from typing import TypedDict

    class MetricDetails(TypedDict):
        min: int
        max: int
        avg: float
        total_files: int
        total_size: int

    class OptimizeMetrics(TypedDict):
        num_files_added: int
        num_files_removed: int
        files_added: MetricDetails
        files_removed: MetricDetails
        partitions_optimized: int
        num_batches: int
        total_considered_files: int
        total_files_skipped: int
        preserve_insertion_order: bool

Then you can modify the signature:

def optimize(
    self,
    partition_filters: Optional[List[Tuple[str, str, Any]]] = None,
    target_size: Optional[int] = None,
) -> "OptimizeMetrics":
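
A self-contained sketch of how the suggestion above might fit together on Python 3.8+. It is not the code that was merged: `_native_optimize` is a hypothetical stand-in for the Rust call, its payload is invented, and `optimize` is written as a free function rather than a method to keep the example short. Note that `typing.cast` only informs the type checker and validates nothing at runtime.

```python
import json
from typing import Any, List, Optional, Tuple, TypedDict, cast


class MetricDetails(TypedDict):
    min: int
    max: int
    avg: float
    total_files: int
    total_size: int


class OptimizeMetrics(TypedDict):
    num_files_added: int
    num_files_removed: int
    files_added: MetricDetails
    files_removed: MetricDetails
    partitions_optimized: int
    num_batches: int
    total_considered_files: int
    total_files_skipped: int
    preserve_insertion_order: bool


def _native_optimize(partition_filters: Any, target_size: Optional[int]) -> str:
    # Hypothetical stand-in for the native binding: returns invented metrics
    # serialized as JSON, ignoring its arguments.
    details = {"min": 10, "max": 10, "avg": 10.0, "total_files": 1, "total_size": 10}
    return json.dumps(
        {
            "num_files_added": 1,
            "num_files_removed": 1,
            "files_added": details,
            "files_removed": details,
            "partitions_optimized": 1,
            "num_batches": 1,
            "total_considered_files": 1,
            "total_files_skipped": 0,
            "preserve_insertion_order": True,
        }
    )


def optimize(
    partition_filters: Optional[List[Tuple[str, str, Any]]] = None,
    target_size: Optional[int] = None,
) -> OptimizeMetrics:
    raw = _native_optimize(partition_filters, target_size)
    # cast() is a no-op at runtime; it only tells the type checker the shape.
    return cast(OptimizeMetrics, json.loads(raw))


metrics = optimize()
print(metrics["files_added"]["total_size"])
```

With the return type annotated this way, an editor can complete keys such as `files_added` and flag misspelled ones, while the runtime value stays an ordinary `dict`.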

Collaborator

(FYI, I'm happy to leave this for a later follow-up.)

Contributor Author

OK. I'll leave this as is in the current PR.

python/docs/source/usage.rst (outdated, resolved)
@djouallah

Could you please hold off on releasing the next version of the Python package until this is merged? This is very important functionality.

@loleek loleek requested a review from wjones127 May 4, 2023 09:21
@wjones127 wjones127 merged commit fdb5e7b into delta-io:main May 4, 2023
@loleek loleek deleted the optimize-op branch May 5, 2023 02:35
@loleek loleek restored the optimize-op branch May 5, 2023 02:36