Show examples of building Arrow C++ structures #187
That's a great idea! That mapping is definitely not obvious unless you've spent a lot of time with the C data interface. |
I was wondering if we could bring this back to the fore? I am currently losing my mind as a simple 'call from R' wrapper for the (nice)
// Plain Interface
// [[Rcpp::export]]
bool linesplit_from_R_plain(const std::string str, SEXP sxparr) {
// We get an R-created 'nanoarrow_array', an S3 class around an external pointer
if (!Rf_inherits(sxparr, "nanoarrow_array"))
Rcpp::stop("Expected class 'nanoarrow_array' not found");
// It is a straight up external pointer so we can use R_ExternalPtrAddr()
struct ArrowArray* arr = (struct ArrowArray*)R_ExternalPtrAddr(sxparr);
auto res = linesplitter_read(str, arr);
return true;
}
But when I want to start from C++, I get lost somewhere build the
// res <- linesplit_from_cpp("the\nquick\nbrown\nfox");
// print(res);
// print(arrow::Array$create(res))
//
// [[Rcpp::export]]
Rcpp::XPtr<ArrowArray> linesplit_from_cpp(const std::string str) {
Rcpp::Environment ns = Rcpp::Environment::namespace_env("nanoarrow");
Rcpp::Function f1 = Rcpp::Function("nanoarrow_array_init", ns);
Rcpp::Function f2 = Rcpp::Function("na_string", ns);
auto sxparr = f1(f2());
// It is an external pointer we can access, here with checking
auto arr = xptr_get_ptr<ArrowArray>(sxparr, "nanoarrow_array");
auto res = linesplitter_read(str, arr);
auto s = Rcpp::XPtr<ArrowArray>{sxparr};
// setting a tag somehow upsets the Arrow nature of things
//xptr_set_tag(s, Rcpp::wrap(XPtrTagType<ArrowArray>));
return s;
}
(Apologies for the small bits of
template <typename T> Rcpp::XPtr<T> make_xptr(SEXP p) {
return Rcpp::XPtr<T>(p); // the default Rcpp::XPtr ctor with deleter on and tag and prot nil
}
So far so good but I still have two problems. I can't seem to build an external pointer 'up from the C/C++ bases' to make
By now there are nice bits and pieces of
PS And the reason I put this here into the 'more C++ examples' issue was that I do enjoy the bits and pieces of debugging infra. I have some alternate uses of (nano)arrow but am thinking of rewriting parts of those to stay closer to |
Yes: the
For R-specific helpers, perhaps a header like the one you mentioned in
For Python, we generate Cython definitions (
For C++ (i.e.,
This should definitely be a vignette/article! As you noted there is now ADBC and soon geoarrow (and whatever you are up to!) which are the first few test cases. Porting the linesplitter example would be a great place to start, as you noted (i.e., here's how you'd wrap the function in an R package...). |
Thanks for the quick reply! I am definitely up for brainstorming a little more and 'slowly but surely' expanding this. I'll also try to clean up my trivial little wrapper around
(And while I don't do much Python, colleagues do, and having "moar Python" with |
Dang. Re-reading this sentence I think I once knew that too from reading your code but forgot. That explains that part. I can try to also put one there when I build up 'from the other direction'. |
Tried setting the schema external pointer as a tag, but no luck so far. I put a (truly minimal) example up here. It builds and checks cleanly for me, and has one example with either
> example(linesplit, package="linesplitter")
lnsplt> txt <- "the\nquick\nbrown\nfox"
lnsplt> linesplit(txt)
<nanoarrow_array string[4]>
$ length : int 4
$ null_count: int 0
$ offset : int 0
$ buffers :List of 3
..$ :<nanoarrow_buffer validity<bool>[0][0 b]> ``
..$ :<nanoarrow_buffer data_offset<int32>[5][20 b]> `0 3 8 13 16`
..$ :<nanoarrow_buffer data<string>[16 b]> `thequickbrownfox`
$ dictionary: NULL
$ children : list()
lnsplt> linesplit(txt, TRUE)
Array
<string>
[
"the",
"quick",
"brown",
"fox"
]
>
|
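For readers following along, the three buffers printed above (an empty validity buffer, the int32 offsets `0 3 8 13 16`, and the concatenated data `thequickbrownfox`) can be filled in by hand. The following is only a minimal sketch, not the linesplitter or nanoarrow code: the `ArrowArray` struct is repeated from the Arrow C data interface specification so the block compiles on its own, while `StringArrayOwner`, `release_string_array`, and `split_lines_to_array` are hypothetical names made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Layout as given in the Arrow C data interface specification. nanoarrow
// ships the same definition; it is repeated here only to keep the sketch
// self-contained.
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

// Hypothetical owner of the memory behind the array; it lives in
// private_data and is freed by the release callback.
struct StringArrayOwner {
  std::vector<int32_t> offsets;
  std::string data;
  const void* buffers[3];
};

static void release_string_array(struct ArrowArray* array) {
  delete static_cast<StringArrayOwner*>(array->private_data);
  array->release = nullptr;  // a null release pointer marks the array as released
}

// Hypothetical helper: split `text` on '\n' and fill `out` with the pieces
// as a string array (validity, int32 offsets, data).
void split_lines_to_array(const std::string& text, struct ArrowArray* out) {
  auto* owner = new StringArrayOwner();
  owner->offsets.push_back(0);

  std::size_t start = 0;
  while (start <= text.size()) {
    std::size_t end = text.find('\n', start);
    if (end == std::string::npos) end = text.size();
    owner->data.append(text, start, end - start);
    // Each offset records where the next element starts in the data buffer.
    owner->offsets.push_back(static_cast<int32_t>(owner->data.size()));
    start = end + 1;
  }

  owner->buffers[0] = nullptr;                // validity: no nulls, so omitted
  owner->buffers[1] = owner->offsets.data();  // int32 offsets
  owner->buffers[2] = owner->data.data();     // concatenated character data

  out->length = static_cast<int64_t>(owner->offsets.size()) - 1;
  out->null_count = 0;
  out->offset = 0;
  out->n_buffers = 3;
  out->n_children = 0;
  out->buffers = owner->buffers;
  out->children = nullptr;
  out->dictionary = nullptr;
  out->release = &release_string_array;
  out->private_data = owner;
}
```

Whoever ends up holding the array calls `array.release(&array)` exactly once when done. Note that in this sketch a trailing newline in the input produces a final empty string, which may or may not be what a real line splitter should do.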
Cool! Is the example somewhere I can find it!? I would love to add it to
Hmm...I also usually do this in R ( |
The example I show above is in the manual page (and in the R code via the usual means). There are a few things I should clean up still -- I'd like to have types other than
(As an aside you have a very very large part of
Otherwise happy to stick the example into the README / add a quick
Ah, and in all the excitement I promptly forgot to point to the repo I made (quickly) for this: https://github.com/eddelbuettel/linesplitter |
I didn't know duckdb was using it! All |
In developing geoarrow, an R extension that imports/exports a number of Arrow C data structures wrapped by nanoarrow S3 objects, it has become apparent that the sanitize and allocate operations are non-trivial and basically have to be copied into every package that wants to import/export nanoarrow things. @eddelbuettel has run up against some valid use cases as well (eddelbuettel/linesplitter#1, #187).
This PR makes the definition of how Arrow C Data interface objects are encoded as R external pointers available as a public header, so that downstream packages can `LinkingTo: nanoarrow` and `#include <nanoarrow/r.h>`. I think the initial target will just be to allocate an owning external pointer and to sanitize an input SEXP.
@eddelbuettel Are there any other operations that are blocking any of your projects that would be must-haves in this header? (I know it's missing array_stream. I forgot about the ability to supply R finalizers, so it's a slightly more complicated change.)
For users coming from Arrow C++ / PyArrow, it might not be obvious what nanoarrow structures to create so they can export. Depending on the context, an `ArrowSchema` can represent a data type, a field, or a schema. Similarly, an `ArrowArray` can represent an array or record batch (tabular). To start, we should show the correspondence between Nanoarrow structs and Arrow C++ types.

| Nanoarrow struct | Arrow C++ type |
| --- | --- |
| `ArrowArray` | `arrow::Array` |
| `ArrowArray` where type is struct | `arrow::RecordBatch` |
| `std::vector<ArrowArray>` | `arrow::ChunkedArray` |
| `std::vector<ArrowArray>` where type is struct | `arrow::Table` |
| `ArrowSchema` | `arrow::DataType` |
| `ArrowSchema` | `arrow::Field` |
| `ArrowSchema` | `arrow::Schema` |

Then we may also want recipes for:
- `arrow::RecordBatch`
- `arrow::Table`
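
As one possible starting point for such a recipe, here is a minimal sketch of how a single `ArrowSchema` struct plays all three roles: a child with format `"i"` or `"u"` describes a data type, its `name` and `flags` make it a field, and a parent with the struct format `"+s"` stands in for a whole schema. The `ArrowSchema` layout, the format strings, and the nullable flag value are taken from the Arrow C data interface specification; `SchemaOwner`, `release_schema`, `init_schema`, and `init_struct_schema` are hypothetical helpers for illustration, not nanoarrow API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Layout as given in the Arrow C data interface specification, repeated
// here only so the sketch compiles without nanoarrow.
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

// Hypothetical owner of the strings and children behind one schema.
struct SchemaOwner {
  std::string format;
  std::string name;
  std::vector<ArrowSchema*> children;
};

static void release_schema(ArrowSchema* schema) {
  auto* owner = static_cast<SchemaOwner*>(schema->private_data);
  for (ArrowSchema* child : owner->children) {
    // A child may already have been moved out (release set to null).
    if (child->release != nullptr) child->release(child);
    delete child;
  }
  delete owner;
  schema->release = nullptr;  // marks the schema as released
}

// Hypothetical helper: one field, i.e. a data type (format) plus a name.
void init_schema(ArrowSchema* out, const std::string& format,
                 const std::string& name) {
  auto* owner = new SchemaOwner{format, name, {}};
  out->format = owner->format.c_str();
  out->name = owner->name.c_str();
  out->metadata = nullptr;
  out->flags = 2;  // ARROW_FLAG_NULLABLE in the specification
  out->n_children = 0;
  out->children = nullptr;
  out->dictionary = nullptr;
  out->release = &release_schema;
  out->private_data = owner;
}

// Hypothetical helper: a struct-typed schema ("+s"), the shape used to
// describe a record batch. `fields` holds (name, format) pairs.
void init_struct_schema(
    ArrowSchema* out,
    const std::vector<std::pair<std::string, std::string>>& fields) {
  init_schema(out, "+s", "");
  auto* owner = static_cast<SchemaOwner*>(out->private_data);
  for (const auto& field : fields) {
    auto* child = new ArrowSchema();
    init_schema(child, field.second, field.first);
    owner->children.push_back(child);
  }
  out->n_children = static_cast<int64_t>(owner->children.size());
  out->children = owner->children.data();
}
```

Calling `init_struct_schema(&schema, {{"id", "i"}, {"word", "u"}})` would describe a two-column batch with an int32 column and a string column. In real code, nanoarrow's own schema helpers are likely a better fit than hand-rolled ownership like this.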