Simplify ColumnAccessor methods; avoid unnecessary validations #14758

Merged: 18 commits into rapidsai:branch-24.02 from ref/column_accessor on Jan 23, 2024

mroeschke commented Jan 13, 2024

Description

For methods that essentially do

def select_by_foo(self, ...):
    ...
    return self.__class__(data={subset of self._data})

The return would validate the returned subset of columns, but I think that is unnecessary because validation was already done during initialization.

Additionally

  • Removed _create_unsafe in favor of a verify=True|False keyword in the constructor
  • Removed _column_length, since it is equivalent to nrows
  • Renamed _compare_keys to _keys_equal
  • Removed seldom-used or unnecessary methods
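
As a rough illustration of the `verify` keyword idea, here is a minimal sketch using a hypothetical `ToyColumnAccessor`. It is not cuDF's actual class: the equal-length check, the `select_by_names` method, and the plain-list columns are all assumptions made for the example. The point is only that a selection method can pass `verify=False` because the columns were already validated when the parent object was built.

```python
class ToyColumnAccessor:
    """Hypothetical stand-in for illustration; not cuDF's ColumnAccessor."""

    def __init__(self, data, verify=True):
        data = dict(data)
        if verify:
            # Validate once at construction: all columns must share a length.
            lengths = {len(col) for col in data.values()}
            if len(lengths) > 1:
                raise ValueError("All columns must be of equal length")
        self._data = data

    @property
    def nrows(self):
        # Row count is the length of any column (0 when there are none).
        if not self._data:
            return 0
        return len(next(iter(self._data.values())))

    def select_by_names(self, names):
        # The columns come from an already validated accessor, so the
        # subset cannot violate the equal-length invariant; skip the check.
        subset = {name: self._data[name] for name in names}
        return self.__class__(subset, verify=False)


ca = ToyColumnAccessor({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
sub = ca.select_by_names(["a", "c"])
```

Skipping the check is only safe on paths where the input is known to be a subset of validated data; callers that build an accessor from arbitrary input would keep the default `verify=True`.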

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

mroeschke added the Python, improvement, and non-breaking labels Jan 13, 2024
@mroeschke mroeschke requested a review from a team as a code owner January 13, 2024 01:42
vyasr commented Jan 23, 2024

/merge

@rapids-bot rapids-bot bot merged commit 67a36a9 into rapidsai:branch-24.02 Jan 23, 2024
68 checks passed
@mroeschke mroeschke deleted the ref/column_accessor branch January 23, 2024 19:18
PointKernel pushed a commit to PointKernel/cudf that referenced this pull request Jan 25, 2024
…sai#14758)

Authors:
  - Matthew Roeschke (https://github.com/mroeschke)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: rapidsai#14758
rapids-bot bot pushed a commit that referenced this pull request May 13, 2024
#14758 may have propagated a cache invalidation bug affecting the cached number of rows in a `ColumnAccessor`.

Previously, the number of rows was cached and cleared only if an operation left the `ColumnAccessor` with no columns.

However, if the `ColumnAccessor` was empty and an operation added new columns, the cached number of rows should also have been cleared.
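
The failure mode described here can be sketched with a hypothetical toy class. This is not cuDF's implementation: `CachedRowsAccessor`, `_nrows_cache`, `insert_buggy`, and `insert_fixed` are names invented for illustration. The sketch assumes the row count is cached on first access and shows why the cache must also be cleared when an empty accessor gains its first column.

```python
class CachedRowsAccessor:
    """Hypothetical toy class; not cuDF's actual implementation."""

    def __init__(self):
        self._data = {}
        self._nrows_cache = None  # None means "not computed yet"

    @property
    def nrows(self):
        # Compute the row count once, then serve it from the cache.
        if self._nrows_cache is None:
            if self._data:
                self._nrows_cache = len(next(iter(self._data.values())))
            else:
                self._nrows_cache = 0
        return self._nrows_cache

    def insert_buggy(self, name, column):
        # Bug: the cached row count is left untouched, even when the
        # accessor was empty and the cached value is 0.
        self._data[name] = column

    def insert_fixed(self, name, column):
        # Fix: going from empty to non-empty changes the row count,
        # so the cached value must be cleared.
        was_empty = not self._data
        self._data[name] = column
        if was_empty:
            self._nrows_cache = None


buggy = CachedRowsAccessor()
_ = buggy.nrows                      # caches a row count of 0
buggy.insert_buggy("a", [1, 2, 3])
stale = buggy.nrows                  # still the cached value

fixed = CachedRowsAccessor()
_ = fixed.nrows                      # caches a row count of 0
fixed.insert_fixed("a", [1, 2, 3])
fresh = fixed.nrows                  # recomputed after the cache was cleared
```

In this sketch `stale` keeps the outdated value while `fresh` reflects the newly added column.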

Authors:
  - Matthew Roeschke (https://github.com/mroeschke)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)

URL: #15710