-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarify dtype type in ColumnBuffers
elements
#87
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I think this would be useful.
If my assumption is incorrect
Just as a sanity check I checked and pandas/vaex/cuDF/modin all return the type described in Column.dtype
for the get_buffers()
values. A. TODO I realize for dataframe-interchange-tests
is generalise test_dtype
and use it in test_get_buffers
.
Co-authored-by: Matthew Barber <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @flexatone, this looks correct to me and is a useful improvement. And thanks @honno for the review.
Also nice to learn about StaticFrame!
Note that this is actually not correct, depending on how you interpret it. Yes, the buffers' dtype returns a similar type of DType tuple, but it should not necessarily return the same dtype tuple as its It seems that we all interpreted this wrongly and all implementations got this wrong (or the text about "the data buffer's associated dtype" is wrong), see apache/arrow#37598, pandas-dev/pandas#54781, pola-rs/polars#10787 (and the same for StaticFrame mentioned above, from a quick look). |
Thanks for this team's efforts in implementing the dataframe interchange protocol.
While experimenting on an implementation for StaticFrame (static-frame/static-frame#621), I found one aspect of the ABC and associated types unclear.
The
ColumnBuffers
TypedDict
defines its elements as pairs ofBuffer
and a dtype, but only types the dtype asAny
. I assume that the dtype should be provided as the same tuple returned fromColumn.dtype
. This PR makes this explicit by defining aDtype
type alias and using it for the return value ofColumn.dtype
as well as the elements inColumnBuffers
.If my assumption is incorrect, it would be helpful to specify the type of the dtype values returned from
ColumnBuffers
.