You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is your feature request related to a problem? Please describe.
Given two arbitrary column_views, I would like to be able to know in constant time if the two objects are views of identical logical columns.
Describe the solution you'd like
In short, I want a is_shallow_equal(column_view, column_view) (name TBD) that returns true if the two views reference the exact same data (i.e., their data/null_mask pointers are equal, they have equal offsets/size/type/num_children, and all of these things apply recursively to all descendants) Otherwise, returns false, even if the two columns view equivalent data.
Additional context
The primary motivation for this function is in the context of groupby where we maintain a cache of already computed aggregations. We would like to be able to unambiguously detect if we have already computed a particular aggregation for a given column. We'll likewise need a hash function that hashes all of the "shallow" state of a column_view.
For a fascinating look into Regularity and why std::span doesn't have an operator==, check out:
…view (#9185)
Fixes#9140
Added `shallow_hash(column_view)`
Added unit tests
It computes hash values based on the shallow states of `column_view`:
type, size, data pointer, null_mask pointer, offset, and the hash value of the children.
`null_count` is not used since it is a cached value and it may vary based on contents of `null_mask`, and may be pre-computed or not.
Fixes#9139
Added `is_shallow_equivalent(column_view, column_view)` ~shallow_equal~
Added unit tests
It compares two column_views based on the shallow states of column_view:
type, size, data pointer, null_mask pointer, offset, and the column_view of the children.
null_count is not used since it is a cached value and it may vary based on contents of null_mask, and may be pre-computed or not.
Authors:
- Karthikeyan (https://github.com/karthikeyann)
Approvers:
- Mark Harris (https://github.com/harrism)
- Vyas Ramasubramani (https://github.com/vyasr)
- Jake Hemstad (https://github.com/jrhemstad)
- David Wendt (https://github.com/davidwendt)
URL: #9185
…view (#9312)
Fixes#9140
Added `shallow_hash(column_view)`
Added unit tests
SWIPAT approval complete
It computes hash values based on the shallow states of `column_view`:
type, size, data pointer, null_mask pointer, offset, and the hash value of the children.
`null_count` is not used since it is a cached value and it may vary based on contents of `null_mask`, and may be pre-computed or not.
Fixes#9139
Added `is_shallow_equivalent(column_view, column_view)` ~shallow_equal~
Added unit tests
It compares two column_views based on the shallow states of column_view:
type, size, data pointer, null_mask pointer, offset, and the column_view of the children.
null_count is not used since it is a cached value and it may vary based on contents of null_mask, and may be pre-computed or not.
Authors:
- Karthikeyan (https://github.com/karthikeyann)
Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
URL: #9312
Is your feature request related to a problem? Please describe.
Given two arbitrary
column_view
s, I would like to be able to know in constant time if the two objects are views of identical logical columns.Describe the solution you'd like
In short, I want a
is_shallow_equal(column_view, column_view)
(name TBD) that returnstrue
if the two views reference the exact same data (i.e., theirdata
/null_mask pointers are equal, they have equal offsets/size/type/num_children, and all of these things apply recursively to all descendants) Otherwise, returns false, even if the two columns view equivalent data.Additional context
The primary motivation for this function is in the context of groupby where we maintain a cache of already computed aggregations. We would like to be able to unambiguously detect if we have already computed a particular aggregation for a given column. We'll likewise need a hash function that hashes all of the "shallow" state of a
column_view
.For a fascinating look into Regularity and why
std::span
doesn't have anoperator==
, check out:https://abseil.io/blog/20180531-regular-types
https://stackoverflow.com/questions/60633668/why-does-stdspan-lack-the-comparison-operators
The text was updated successfully, but these errors were encountered: