Adding byte array view structure #11322
Codecov Report
@@ Coverage Diff @@
## branch-22.08 #11322 +/- ##
===============================================
Coverage ? 86.42%
===============================================
Files ? 143
Lines ? 22777
Branches ? 0
===============================================
Hits ? 19686
Misses ? 3091
Partials ? 0
The implementation generally looks fine, but I am curious about the scope of the utilization of this class. Where all do we expect it to be used? Will there be code paths that always assume |
No, it will be decided when writing a parquet column on a per-column basis.
This is all for parquet writing and not anything integral to cudf. In cudf it will always just be columns. I need this abstraction to help match up with |
Like the idea. My only reservation with proceeding with this PR is whether we should make this a part of hierarchy, given that there's a lot of overlap between this and classes like device_span and string_view.
 * statistics. Otherwise, it is a device_span in all but name.
 *
 */
class byte_array_view {
Maybe we should derive from `device_span<uint8_t const>` to inherit the basic members.
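As a rough illustration of the suggestion above, here is a minimal host-only sketch of a view that derives publicly from a span and adds only the ordering. `byte_span` is a hypothetical stand-in for `cudf::device_span<uint8_t const>`, and `derived_byte_array_view` is an illustrative name, not the class in this PR.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical host-only stand-in for cudf::device_span<uint8_t const>.
class byte_span {
 public:
  byte_span() = default;
  byte_span(std::uint8_t const* data, std::size_t size) : _data{data}, _size{size} {}

  std::uint8_t const* data() const { return _data; }
  std::size_t size() const { return _size; }
  bool empty() const { return _size == 0; }
  std::uint8_t const* begin() const { return _data; }
  std::uint8_t const* end() const { return _data + _size; }
  std::uint8_t const& operator[](std::size_t i) const { return _data[i]; }

 private:
  std::uint8_t const* _data{nullptr};
  std::size_t _size{0};
};

// Illustrative derived view: every span member is inherited as-is,
// and only the comparison operators are new.
class derived_byte_array_view : public byte_span {
 public:
  using byte_span::byte_span;  // reuse the span's constructors

  // Lexicographic, byte-wise ordering; a proper prefix sorts first.
  friend bool operator<(derived_byte_array_view const& lhs, derived_byte_array_view const& rhs)
  {
    return std::lexicographical_compare(lhs.begin(), lhs.end(), rhs.begin(), rhs.end());
  }

  friend bool operator==(derived_byte_array_view const& lhs, derived_byte_array_view const& rhs)
  {
    return lhs.size() == rhs.size() && std::equal(lhs.begin(), lhs.end(), rhs.begin());
  }
};
```

The trade-off is that anything added to the span later becomes part of the view's interface automatically, which is the concern raised further down in this thread.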
Alternatively, it feels like there could be a base template class for string_view and byte_array_view, since the data access and comparison semantics seem to be the same.
Maybe these can be future enhancements.
We talked about this being derived and the pros/cons above. I debated that, but didn't want to give immediate access to all the members in span. I'm not against it and would encourage the debate and conversations that result, but would suggest a followup for that.
Sorry, that conversation was here.
Yeah, this can be changed later. Mostly wanted to share the ideas (was not aware of the previous convo).
I need the comparison operators and it seemed nice, but it isn't strictly required.
> but didn't want to give immediate access to all the members in span.

Would `private` inheritance work, with some `using` declarations to expose the methods we'd like to?
(I'm on the fence, though. I'm preconditioned to prefer composition over inheritance.)
Edit: I see @vuule's comment on the previous PR as well. Valid point.
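For readers unfamiliar with the pattern being asked about, a minimal sketch of private inheritance with selective `using` declarations might look like the following. `byte_span` is again a hypothetical host-only stand-in for `cudf::device_span<uint8_t const>`, and `private_byte_array_view` is an illustrative name.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical host-only stand-in for cudf::device_span<uint8_t const>.
class byte_span {
 public:
  byte_span() = default;
  byte_span(std::uint8_t const* data, std::size_t size) : _data{data}, _size{size} {}

  std::uint8_t const* data() const { return _data; }
  std::size_t size() const { return _size; }
  std::uint8_t const* begin() const { return _data; }
  std::uint8_t const* end() const { return _data + _size; }

 private:
  std::uint8_t const* _data{nullptr};
  std::size_t _size{0};
};

// Illustrative view: the span is a private base, so callers see only
// the members that are re-exported below.
class private_byte_array_view : private byte_span {
 public:
  using byte_span::byte_span;  // constructors keep their base-class access
  using byte_span::data;       // expose data()
  using byte_span::size;       // expose size()
  // begin() and end() are deliberately not re-exported.

  friend bool operator<(private_byte_array_view const& lhs, private_byte_array_view const& rhs)
  {
    return std::lexicographical_compare(
      lhs.data(), lhs.data() + lhs.size(), rhs.data(), rhs.data() + rhs.size());
  }

  friend bool operator==(private_byte_array_view const& lhs, private_byte_array_view const& rhs)
  {
    return lhs.size() == rhs.size() &&
           std::equal(lhs.data(), lhs.data() + lhs.size(), rhs.data());
  }
};
```

Note that a `using` declaration names a member, not a single signature, so it re-exports every overload of that name in the base.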
> I need the comparison operators and it seemed nice, but it isn't strictly required.

You can always pass a custom comparison operator to an algorithm.
You should think very carefully about adding a whole new type vs just using a `span` directly.
When you use a common vocabulary type like a `span`, you know exactly what you are getting and what the behavior/contract of that type is. A new type incurs a bunch of cognitive overhead to learn all of its semantics.
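A minimal sketch of the "no new type" alternative: keep a plain span and hand the ordering to the algorithm as a comparator. This is host-only C++ using `std::min_element` rather than the cub reduction the PR needs, and `byte_span`, `byte_span_less`, and `min_row` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical plain span of const bytes; it carries no comparison semantics.
struct byte_span {
  std::uint8_t const* data;
  std::size_t size;
};

// Illustrative comparator: lexicographic, byte-wise ordering.
struct byte_span_less {
  bool operator()(byte_span const& lhs, byte_span const& rhs) const
  {
    return std::lexicographical_compare(
      lhs.data, lhs.data + lhs.size, rhs.data, rhs.data + rhs.size);
  }
};

// Returns the smallest row. Precondition: rows is not empty.
inline byte_span min_row(std::vector<byte_span> const& rows)
{
  return *std::min_element(rows.begin(), rows.end(), byte_span_less{});
}
```

The cost of this approach is that each call site must remember to pass the comparator, which is the "distributed comparators" drawback mentioned later in the conversation.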
It certainly does, but my concern with using a span was that I wanted to abstract away the internals of it and not require the user to know it is a span of `uint8_t`, but they do need to know that for the data pointer, etc., so this could be a moot point. I also liked composition for not blanket allowing all span accessors, but looking it over again, I don't see anything that would be an example of something I wouldn't want to be available. I'm not against doing this, but I need to get moving on it as this is the foundation of #11303, #11160, and #11328, which are all 22.08 PRs.
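The composition approach described in this comment can be sketched roughly as follows: the span is a private member, only chosen accessors are forwarded, and the comparison operators live on the wrapper so a min reduction can use them directly. This is a host-only approximation that uses `std::accumulate` in place of a cub reduce; `byte_span`, `composed_byte_array_view`, and `min_view` are hypothetical names, not the PR's code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical plain span of const bytes.
struct byte_span {
  std::uint8_t const* ptr;
  std::size_t len;
};

// Illustrative wrapper: holds a span and forwards only what it chooses to.
class composed_byte_array_view {
 public:
  composed_byte_array_view(std::uint8_t const* data, std::size_t size) : _bytes{data, size} {}

  std::uint8_t const* data() const { return _bytes.ptr; }
  std::size_t size() const { return _bytes.len; }

  // Lexicographic, byte-wise ordering; a proper prefix sorts first.
  friend bool operator<(composed_byte_array_view const& lhs, composed_byte_array_view const& rhs)
  {
    return std::lexicographical_compare(
      lhs.data(), lhs.data() + lhs.size(), rhs.data(), rhs.data() + rhs.size());
  }

  friend bool operator>(composed_byte_array_view const& lhs, composed_byte_array_view const& rhs)
  {
    return rhs < lhs;
  }

 private:
  byte_span _bytes;
};

// Min reduction that relies only on operator<. Precondition: rows is not empty.
inline composed_byte_array_view min_view(std::vector<composed_byte_array_view> const& rows)
{
  return std::accumulate(
    rows.begin() + 1, rows.end(), rows.front(),
    [](auto const& acc, auto const& row) { return row < acc ? row : acc; });
}
```

Because the operators are defined once on the type, the reduction itself does not need a comparator argument.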
> Would private inheritance work, with some using declarations to expose the methods we'd like to?

This is nice. It is less verbose, and the chance of logical mistakes is lower compared to composition. Since a `using` declaration inherits all overloads, it needs some thought: when overloads of `span` change or are added, so do the derived class's overloads.
I hope it's not an anti-pattern (@codereport)
👍, based on the conversations and changes in #11303.
It has been decided to be better to close this PR and pursue the option of |
I have a branch with the requested changes, but it is looking gnarly to me. I have it all compiling, but there are orc statistics errors that I haven't tracked down yet. I wanted to get opinions before I chase down this bug and merge this down. |
Not saying no to this version, but it seems like a step back to me. The comparators are distributed across the code instead of being in a single class to make a useful abstraction. I still feel like the best option would be to derive from span and add the comparators (or stick with the original implementation). The original class was in |
I agree. A centralized place for this is considerably cleaner. This is a new class with some overlap with existing types, so there's weight to the cognitive load argument. But it's constrained to cuIO, so this isn't something that's going to be widely in use, and there's precedent for this kind of thing (eg. I also agree with the idea of potentially making this some kind of derived class off of device_span, but that's probably a followup PR. |
I'm good with merging as long as we open an issue to work through the redesign ideas. |
rerun tests |
@gpucibot merge |
When reviewing PR #11322 it was noted that it would be preferable to use `std::byte` for the data type, but at the time that didn't work out, so the plan was to address it later and issue #11362 was created to track it.

Fixes #11362

Authors:
- Mike Wilson (https://github.com/hyperbolic2346)

Approvers:
- Tobias Ribizel (https://github.com/upsj)
- Bradley Dice (https://github.com/bdice)
- Nghia Truong (https://github.com/ttnghia)

URL: #11424
Description
I wanted to get this up in a PR of its own so we could get some discussion going if necessary. This adds a `byte_array_view`, which is almost identical to a `string_view`. The goal is to be able to get these for list columns of bytes, so `list<uint8>` and `list<int8>`. I didn't template it on the type, but instead selected `uint8_t` because `std::byte` is a `uint8_t`. My PR for writing byte arrays in parquet will use this to get the rows of byte data for statistics and writing. That PR is forthcoming. I left this code down in cuio statistics due to the usage and the previous discussions regarding `.element`. I needed to wrap the `device_span` because I need comparison operators for the cub reduce.
Checklist