GH-36099: [C++] Add Utf8View and BinaryView to the c ABI #38443
Note: based on #37792
LGTM. But of course we need to wait for the separate review and merge of the string view PR before merging this.
You also need to remove the corresponding skips in datagen.py.
The largest issue, which I had forgotten: the C ABI doesn't store buffer sizes explicitly. (For string data we derive the data buffer's length by reading the offsets buffer.) I guess we'll need an extra buffer to store those, since reconstructing the sizes by examining every view is untenable.
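To illustrate why scanning every view is so much worse than reading the offsets buffer, here is a minimal sketch. The `View` struct and both function names are hypothetical and are not taken from the Arrow codebase; the struct only approximates the 16-byte view record (strings of at most 12 bytes are stored inline, longer ones reference a data buffer by index and offset).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view record (an assumption for illustration).
// The last three fields are only meaningful when length > kInlineSize.
struct View {
  int32_t length;
  int32_t prefix;        // first 4 bytes of the string, unused in this sketch
  int32_t buffer_index;  // which data buffer holds the bytes
  int32_t offset;        // where the bytes start in that buffer
};

constexpr int32_t kInlineSize = 12;

// Offset-based strings: the data length is simply the last offset, O(1).
int64_t DataLengthFromOffsets(const std::vector<int32_t>& offsets) {
  return offsets.empty() ? 0 : offsets.back();
}

// View-based strings: every view must be visited, O(n), and the result is
// only a lower bound on each data buffer's size (the furthest byte that
// any view references).
std::vector<int64_t> MinBufferSizesFromViews(const std::vector<View>& views,
                                             size_t num_buffers) {
  std::vector<int64_t> sizes(num_buffers, 0);
  for (const View& v : views) {
    if (v.length <= kInlineSize) continue;  // inline, touches no data buffer
    const int64_t end = int64_t{v.offset} + v.length;
    int64_t& size = sizes[static_cast<size_t>(v.buffer_index)];
    if (end > size) size = end;
  }
  return sizes;
}
```

Storing the sizes in an extra buffer, as suggested above, would let an importer read them directly instead of running a scan like the second function.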
Force-pushed from 0afb739 to 2d13898.
@@ -3986,8 +4072,6 @@ TEST_F(TestDeviceArrayRoundtrip, Primitive) {
   TestWithJSON(mm, int32(), "[4, 5, null]");
 }

 // TODO C -> C++ -> C roundtripping tests?
I think this can be removed now since the C abi integration tests should cover this
ping @pitrou
just a few nitpicks / comments
- dict_exporter_.reset(new SchemaExporter());
+ dict_exporter_ = std::make_unique<SchemaExporter>();
I've actually wondered: is there any particular difference (performance or otherwise) between doing .reset(new T) and = std::make_unique<T>()?
In scenarios where you have a std::unique_ptr<ParentClass> being populated by many subclasses of ParentClass, assigning from std::make_unique<T>(...) can lead to more binary bloat. It produces a std::unique_ptr<T> whose deleter, std::default_delete<T>, is type-specialized, whereas reset() takes a parent-class pointer and only needs the deleter that invokes the destructor via dynamic dispatch through the parent class vtable.
Being type-specialized often leads to the derived classes' destructors being inlined into std::default_delete<T>. On the other hand, all this extra inlining and the removal of one indirection could lead to less destructor-dispatch overhead.
In that case, wouldn't it be better to keep calling .reset rather than switch to make_unique?
In this case there is no difference, since the pointer being assigned to is a unique_ptr<SchemaExporter> and not a pointer to a base/parent class.
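The distinction discussed in this thread can be sketched as follows. The class names (Exporter, SchemaLikeExporter, ArrayLikeExporter) and the three functions are hypothetical stand-ins, not the classes in the Arrow bridge code; the static_assert lines only check which deleter type each form involves and say nothing about actual binary size.

```cpp
#include <memory>
#include <type_traits>

// Hypothetical base class with a virtual destructor.
struct Exporter {
  virtual ~Exporter() = default;
  virtual int Kind() const = 0;
};
struct SchemaLikeExporter : Exporter {
  int Kind() const override { return 1; }
};
struct ArrayLikeExporter : Exporter {
  int Kind() const override { return 2; }
};

// reset() on a base-class unique_ptr: the raw pointer converts to Exporter*,
// so only std::default_delete<Exporter> is involved and destruction goes
// through the virtual destructor.
std::unique_ptr<Exporter> ViaReset() {
  std::unique_ptr<Exporter> p;
  p.reset(new SchemaLikeExporter());
  return p;
}

// make_unique<Derived>() first builds a std::unique_ptr<ArrayLikeExporter>,
// which brings in std::default_delete<ArrayLikeExporter>, and is then
// converted to std::unique_ptr<Exporter>.
std::unique_ptr<Exporter> ViaMakeUnique() {
  std::unique_ptr<Exporter> p = std::make_unique<ArrayLikeExporter>();
  return p;
}

// Same static type on both sides, as in the change under review: both forms
// use the same deleter type, so there is nothing extra to instantiate.
std::unique_ptr<SchemaLikeExporter> SameType() {
  std::unique_ptr<SchemaLikeExporter> p;
  p = std::make_unique<SchemaLikeExporter>();
  return p;
}

static_assert(
    std::is_same<decltype(std::make_unique<ArrayLikeExporter>())::deleter_type,
                 std::default_delete<ArrayLikeExporter>>::value,
    "make_unique<Derived> uses a deleter specialized for Derived");
static_assert(
    std::is_same<std::unique_ptr<Exporter>::deleter_type,
                 std::default_delete<Exporter>>::value,
    "a base-class unique_ptr uses the base-class deleter");
```

Note that the reset() form is only safe here because Exporter has a virtual destructor.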
- I see no array roundtrip test, am I missing something?
- Can you rebase to get the latest changes (including list view integration)?
Force-pushed from 0b73842 to df068c2.
@pitrou rebased and added roundtrip tests.
LGTM, but I think there are lint errors to fix
Yay! 🚀
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 94fc124. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them.
Hi @bkietz, is there a plan to add Utf8View support to the rust arrow2 library soon? If someone is not already working on it, I would like to take up the PR to add support for Utf8View to the arrow2 library (jorgecarleitao/arrow2#1596) |
@urvishdesai thanks for offering! I don't think anyone else has that queued, so please do (and I'd be happy to review).
…e#38443)
### Rationale for this change
Utf8View and BinaryView should be added to the c ABI spec and to the c++ library's importer/exporter.
### Are these changes tested?
Yes, minimally
### Are there any user-facing changes?
View arrays will be importable/exportable through the c ABI in c++
* Closes: apache#36099
Authored-by: Benjamin Kietzman <[email protected]>
Signed-off-by: Benjamin Kietzman <[email protected]>
Rationale for this change
Utf8View and BinaryView should be added to the c ABI spec and to the c++ library's importer/exporter.
Are these changes tested?
Yes, minimally
Are there any user-facing changes?
View arrays will be importable/exportable through the c ABI in c++