Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unnecessary cloning #428

Closed
zzril opened this issue Jul 6, 2023 · 1 comment · Fixed by #494
Closed

Unnecessary cloning #428

zzril opened this issue Jul 6, 2023 · 1 comment · Fixed by #494
Labels
cleanup 🧹 Refactorings and other tasks that improve the code performance 🏃 Speed things up question Further information is requested released Included in a release

Comments

@zzril
Copy link
Contributor

zzril commented Jul 6, 2023

Is your feature request related to a problem?

There seem to be some unnecessary cases of copying a table in _table.py.

For example, keep_only_columns creates a copy of the table, then calls remove_columns(...) on that copy and returns the result:

Implementation keep_only_columns(...):

    def keep_only_columns(self, column_names: list[str]) -> Table:
        ...
        clone = self._copy()
        clone = clone.remove_columns(list(set(self.column_names) - set(column_names)))
        return clone

However, remove_columns itself already has to - in accordance to our guidelines - create a copy.
Indeed, it does (although very implicitely, by calling Pandas' drop method with the default inplace=false keyword argument):

Implementation of remove_columns(...):

    def remove_columns(self, column_names: list[str]) -> Table:
       ...
        transformed_data = self._data.drop(labels=column_names, axis="columns")
        transformed_data.columns = [name for name in self._schema.column_names if name not in column_names]
        ...
        return Table._from_pandas_dataframe(transformed_data)

Pandas documentation for drop method:

DataFrame.drop(labels=None, *, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

Since remove_columns already correctly returns a copy of the table it was called on, it is not clear to me why we need to make an additional copy within out keep_only_columns implementation.
Am I misunderstanding something?

Desired solution

If these additional copies indeed turn out to be unnecessary: Find all occurences of them and remove them,

Possible alternatives (optional)

If the copies actually turn out to be necessary: Close this issue.

@zzril zzril added question Further information is requested performance 🏃 Speed things up cleanup 🧹 Refactorings and other tasks that improve the code labels Jul 6, 2023
@github-project-automation github-project-automation bot moved this to Backlog in Library Jul 6, 2023
lars-reimann added a commit that referenced this issue Nov 16, 2023
Closes #428

### Summary of Changes

* Enable copy-on-write for Pandas dataframes. This will be enabled in
Pandas v3 by default.
* Remove unnecessary cloning.

---------

Co-authored-by: megalinter-bot <[email protected]>
@github-project-automation github-project-automation bot moved this from Backlog to ✔️ Done in Library Nov 16, 2023
lars-reimann pushed a commit that referenced this issue Nov 22, 2023
## [0.16.0](v0.15.0...v0.16.0) (2023-11-22)

### Features

* drop Python 3.10 and add Python 3.12 ([#478](#478)) ([5bf0e75](5bf0e75))
* enable copy-on-write for pandas dataframes ([#494](#494)) ([9a19389](9a19389)), closes [#428](#428)

### Bug Fixes

* **deps-dev:** Bump urllib3 from 1.26.17 to 1.26.18 ([#480](#480)) ([a592d2c](a592d2c)), closes [#3159](https://github.com/Safe-DS/Library/issues/3159)

### Performance Improvements

* remove unneeded copy operations in table transformers ([#496](#496)) ([6443beb](6443beb)), closes [#494](#494)
@lars-reimann
Copy link
Member

🎉 This issue has been resolved in version 0.16.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@lars-reimann lars-reimann added the released Included in a release label Nov 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cleanup 🧹 Refactorings and other tasks that improve the code performance 🏃 Speed things up question Further information is requested released Included in a release
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants