-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(r): Provide LinkingTo headers for extension packages #332
Conversation
We are making use of Arrow exports from C++ using just the I quite like Now this PR looks promising/ I will look more closely at it and then release 0.4.0. This clearly seems to move in the right direction so three cheers for that! If you want to brainstorm / chat for a few minutes I am sure we can find some time around the two hours of timezone difference. |
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #332 +/- ##
==========================================
- Coverage 88.08% 88.08% -0.01%
==========================================
Files 72 68 -4
Lines 11677 11583 -94
==========================================
- Hits 10286 10203 -83
+ Misses 1391 1380 -11 ☔ View full report in Codecov by Sentry. |
FYI I just checked in on this. Rebases nearly perfectly (one edit of a comment) and the |
Thanks for checking in! This will be included in nanoarrow 0.4 (release in early January) but I want to add documentation to the header before merging since it would effectively freeze the R representation of these three structures. I am going to mark it as "experimental": while I don't envision it changing, I do want to leave some time for others to try and comment. |
No problem. My curiosity was triggered when I saw the ❌ there but it is really just |
fba765f
to
5886751
Compare
It needs a one-liner from clang-format. |
@eddelbuettel Can you take a look at this to ensure that what I've described matches your expectations? |
/// The "tag" of an external pointer to an ArrowArray must be R_NilValue or an external | ||
/// pointer to an ArrowSchema that may be used to interpret the pointed-to ArrowArray. The | ||
/// "tag" of a nanoarrow external pointer to an ArrowSchema or ArrowArrayStream is | ||
/// reserved for future use and must be R_NilValue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the above is quite good and helpful, and I wish I would have had that earlier.
Some of it is still a little vague (ie if this is different and distinguishable from Arrow, why is that so and where are some example use cases -- my use of nanoarrow has been to 'be like Arrow' without requiring and I do not seem alone in this).
Let me chew over it some more -- but it looks like a nice step forward. So thanks for that!
The (arguably very minimal) test of rewriting linesplitter to use |
Switched a few more local copied / variants to what I do have one test function though where I currently do auto sxpsch = nanoarrow_schema_owning_xptr(); // allocates and wraps as XP
auto sch = nanoarrow_schema_from_xptr(sxpsch);
ArrowSchemaInitFromType(sch, NANOARROW_TYPE_STRING); that ends in error with |
That's a good point...the |
Hah! These subtle 'documentation bugs' are a wee bit tricky and one of the reasons I <cough, cough> blegged you to chip in with a reference example for something as simple as the linesplitter repo. I still have high hopes for all this as it moves in the right direction. One goal is to be able to start from a pair of |
Python is scheduled for this quarter and should have very similar tools to what's here! Notably, the initial version will provide the ability to implement the capsule interface, which is basically what this PR documents somewhat less officially for R ( https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html ). I agree that the intended pattern is starting with one or more |
In developing geoarrow, an R extension that imports/exports a number of Arrow C data structures wrapped by nanoarrow S3 objects, it has become apparent that the sanitize and allocate operations are non-trivial and basically have to be copied in every package that wants to import/export nanoarrow things. @eddelbuettel has run up against some valid use-cases as well! eddelbuettel/linesplitter#1 , #187
This PR makes the definition of how Arrow C Data interface objects are encoded as R external pointers available as a public header (such that downstream packages can
LinkingTo: nanoarrow
and#include <nanoarrow/r.h>
. I think the initial target will just be allocate an owning external pointer and sanitize an input SEXP.@eddelbuettel Are there any other operations that are blocking any of your projects that would be must-haves in this header?
(I know it's missing array_stream...I forgot about the ability to supply R finalizers and so it's a slightly more complicated change)