
Clarify what a handler does and its components #172

Open
JosiahParry opened this issue Feb 17, 2023 · 6 comments

Comments

@JosiahParry

I'm trying to build compatibility between Rust geo-types and wk, geos, and sf. I suspect that a wk handler would be helpful here, but I don't completely know how. I've read through https://paleolimbot.github.io/wk/articles/programming.html. However, after doing so I'm still unsure what the purpose of a handler is or what is needed to write an effective handler (except the lifetime; that part is clear).

Could the vignette be updated to include a definition of the handler and a description of its purpose?

@paleolimbot
Owner

You're right! A handler would allow out-of-the-box conversion support for pretty much every geo type in R. That said, handlers are complicated to write and writing them in C is error-prone and hard to get right. The Rust equivalent of this is called "geozero", which might be more helpful to read up on than anything I've provided here (or will provide in the future).

An alternative approach would be to resolve geometries into memory chunk-wise and operate on them that way. With the caveat that geoarrow isn't ready for general consumption yet, if I wrote wk again, I would write all the operators using the pattern of (1) chunking the input into one or more GeoArrow arrays (e.g., as_geoarrow(some_input[1:5])) and then (2) handing each chunk over to the consumer (in this case, your Rust geo R package) to iterate over at its leisure.

Maybe a good analogy is like...in the "reader/handler" thing that wk currently does, the "reader" is reading you (the handler) a book...you can write down information that you think is important as it's provided to you but you still have to listen to the whole thing at whatever pace the reader wants to read. In the "chunk and operate" approach, the reader hands you the book and waits until you're done with it.
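A minimal sketch of the two styles in that analogy, written in Rust since that is the target here. Everything in it (Point, CoordHandler, stream_points, for_each_chunk, SumX) is hypothetical and not part of wk or geoarrow; a plain Rust slice stands in for a GeoArrow array.

```rust
/// Hypothetical coordinate type used by both styles.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Point {
    pub x: f64,
    pub y: f64,
}

/// "The reader reads you the book": the reader owns the loop and calls
/// back into the handler once per coordinate, at its own pace.
pub trait CoordHandler {
    fn coord(&mut self, x: f64, y: f64);
}

pub fn stream_points(input: &[Point], handler: &mut dyn CoordHandler) {
    for p in input {
        handler.coord(p.x, p.y);
    }
}

/// A streaming handler has to carry its own running state between calls.
pub struct SumX {
    pub total: f64,
}

impl CoordHandler for SumX {
    fn coord(&mut self, x: f64, _y: f64) {
        self.total += x;
    }
}

/// "The reader hands you the book": the input is resolved into chunks and
/// each whole chunk is handed to the consumer to iterate over however it likes.
pub fn for_each_chunk<F>(input: &[Point], chunk_size: usize, mut consumer: F)
where
    F: FnMut(&[Point]),
{
    // slice::chunks panics on a zero size, so fail with a clearer message.
    assert!(chunk_size > 0, "chunk_size must be positive");
    for chunk in input.chunks(chunk_size) {
        consumer(chunk);
    }
}
```

The design difference shows up in who controls the loop: with stream_points the handler only ever sees one coordinate and must remember everything itself, while with for_each_chunk the consumer gets the whole chunk and can index, revisit, or skip freely.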

@JosiahParry
Author

That helps a smidgen! geozero is a hard no because of the large number of dependencies that are baked into it (gdal, geos, arrow, etc.).

I get that a handler conceptually takes an object and converts it to and from other types, but little more than that. What I'm somewhat lost on is what the requirements of a handler are. What must each handler do? Your example returns a bounding box, but I suspect there is more to it than that.

Is wk intended to be extensible? If not, I suspect I can leave it alone and write custom methods for conversion to each type, e.g., sfg, geo_geometry, and hopefully geoarrow.

@JosiahParry
Author

I wanted to follow up on this issue as I've been thinking about it again. I presume that a handler can be written in any language as long as it returns the appropriate memory format; is that right?

Is there any source code or comments that provide hints as to what is required to write a new wk handler?

@paleolimbot
Owner

Other than the new programming vignette, nothing explicit exists...there are pretty much just examples. The most straightforward example (I think) is the WKT writer because the order in which various bits of well-known text are written is very close (on purpose) to the order in which the handler methods are called by a reader: https://github.com/paleolimbot/wk/blob/master/src/wkt-writer.cpp#L85-L227 . It also gives you an idea of the kind of state you have to keep track of in the "handler" style of doing things...you're getting thrown various bits of information and you occasionally have to save some of it to do the right thing when you're thrown another bit of information.
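To make the "state you have to keep track of" point concrete, here is a hypothetical, heavily simplified event-driven WKT writer in Rust. The method names only loosely mirror handler callbacks, and it is not a port of wkt-writer.cpp: it handles nested coordinate lists only (no EMPTY, no GEOMETRYCOLLECTION child type names, no Z/M). The saved state is a stack of counters, one per open geometry, used to decide whether a ", " separator is needed when the next bit of information arrives.

```rust
/// Hypothetical, simplified WKT writer driven by handler-style events.
pub struct WktSketch {
    out: String,
    // One counter per open geometry: how many children (coordinates or
    // nested parts) have been written so far at that nesting level.
    counts: Vec<usize>,
}

impl WktSketch {
    pub fn new() -> Self {
        Self { out: String::new(), counts: Vec::new() }
    }

    // Emit ", " unless this is the first child of the current parent.
    fn separator(&mut self) {
        if let Some(n) = self.counts.last_mut() {
            if *n > 0 {
                self.out.push_str(", ");
            }
            *n += 1;
        }
    }

    pub fn geometry_start(&mut self, type_name: &str) {
        self.separator();
        // Only the outermost geometry writes its type name, as in MULTI* WKT.
        if self.counts.is_empty() {
            self.out.push_str(type_name);
            self.out.push(' ');
        }
        self.out.push('(');
        self.counts.push(0);
    }

    pub fn coord(&mut self, x: f64, y: f64) {
        self.separator();
        self.out.push_str(&format!("{} {}", x, y));
    }

    pub fn geometry_end(&mut self) {
        self.counts.pop();
        self.out.push(')');
    }

    pub fn finish(self) -> String {
        self.out
    }
}
```

Calling geometry_start("MULTILINESTRING"), then two nested LINESTRING start/coord/end sequences, then geometry_end produces MULTILINESTRING ((0 0, 1 1), (2 2)) for those coordinates; the counters are what let each callback know whether it is the first item at its level.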

One of the reasons that nothing explicit exists is that I'm planning to start using a new version of it based on some of the lessons I've learned from writing readers and handlers here. That new version is here: https://github.com/geoarrow/geoarrow-c/blob/main/src/geoarrow/geoarrow_type.h#L286-L303 but has the disadvantage that I haven't quite wired it up in R yet so you can't write one and try it right away.

Writing a handler in C++ (maybe even by copying the WKT writer example and modifying it) is probably the best place to start. While there's no technical limitation on translating the header that defines the handler struct into Rust (maybe bindgen can do it for you?), it sounds hard. The "new" version is theoretically friendlier for that because it doesn't use any R types, so it may be easier to do.

@JosiahParry
Author

Thanks, Dewey! We've now got Rust bindings via bindgen and I've been able to successfully create a void handler. Looking at the other implementations, I'm still somewhat lost as to what is required. I'm unclear on what inputs a handler requires, what intermediate structures (if any) it needs, and likewise what it must output.

Based on some implementations such as xy, wkb, and wkt, the outputs are all different: two same-length vectors, a list of binary, and a character vector, respectively. So that doesn't seem to matter. From what I can tell, there are different approaches for scalar geometries and for vectors of them.

Does each geometry need to have a wk_meta_t and vectors wk_vector_meta_t? It also seems to me that the data for the handler is stored in wk_handler_t's handler_data. Is that correct? Is handler_data a vector of length 4 doubles (x, y, z, m)?

I'm finding the geos handler a bit more helpful for seeing what's going on.

Feel free to close this issue if it's too much! Thanks :)

@paleolimbot
Owner

I'm happy to help!

The handler_data is a C-language way to allow implementation-specific data. In Python this would be self...in C++ it would be this...the idea is the same: a "subclass" (i.e., handler implementation) is doing its own thing that the caller doesn't know about, so it needs its own data. The caller doesn't know anything about what that data contains but does have to pass it on to the handler functions. The pattern used by the "new" handler system I linked to earlier is a little more common/sane but it's the same idea. Your implementation will have to allocate the handler_data on the heap (maybe via Box<>? The C++ equivalent would be new...I don't really know how pointers work in Rust). You'll need to make sure to call drop somehow (C++ would be delete) in the release implementation.

For you this can and almost certainly should be a pointer to a Rust object. In each function implementation, the first line should cast that pointer to a Rust type so that you can write the rest of your callback in normal Rust.
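A minimal sketch of that handler_data lifecycle in Rust, assuming a hypothetical, cut-down C-style struct (SketchHandler) rather than the real bindgen output for wk_handler_t, whose fields and callback signatures differ. It shows the three moves described above: Box::into_raw to heap-allocate the state (C++ new), a cast back to the Rust type on the first line of each callback, and Box::from_raw in the cleanup callback (called finalizer here) so Rust drops it (C++ delete).

```rust
use std::os::raw::{c_int, c_void};

// Hypothetical stand-in for a bindgen-generated handler struct.
// The real wk_handler_t has many more callbacks with different signatures.
#[repr(C)]
pub struct SketchHandler {
    pub handler_data: *mut c_void,
    pub coord: Option<unsafe extern "C" fn(x: f64, y: f64, handler_data: *mut c_void) -> c_int>,
    pub finalizer: Option<unsafe extern "C" fn(handler_data: *mut c_void)>,
}

// The Rust state behind handler_data ("self" in Python, "this" in C++).
pub struct CoordCounter {
    pub n_coords: usize,
    pub sum_x: f64,
}

unsafe extern "C" fn counter_coord(x: f64, _y: f64, handler_data: *mut c_void) -> c_int {
    // First line of every callback: cast the opaque pointer back to the Rust type.
    // SAFETY: handler_data came from new_counter_handler() and is not yet freed.
    let state = unsafe { &mut *(handler_data as *mut CoordCounter) };
    state.n_coords += 1;
    state.sum_x += x;
    0 // hypothetical "continue" code; use the constants from the real header
}

unsafe extern "C" fn counter_finalizer(handler_data: *mut c_void) {
    if !handler_data.is_null() {
        // Take ownership back so the Box and its contents are dropped.
        drop(unsafe { Box::from_raw(handler_data as *mut CoordCounter) });
    }
}

pub fn new_counter_handler() -> SketchHandler {
    // Heap-allocate the state and give C only an opaque pointer to it.
    let state = Box::new(CoordCounter { n_coords: 0, sum_x: 0.0 });
    SketchHandler {
        handler_data: Box::into_raw(state) as *mut c_void,
        coord: Some(counter_coord),
        finalizer: Some(counter_finalizer),
    }
}
```

The caller never looks inside handler_data; it just passes the pointer back into each callback, which is exactly why it must not be something like a fixed array of doubles.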

With respect to the return value...vector_end will get called at some point and has to return an SEXP. It's up to you what that is and when you construct it...it's probably easiest to have something like Vec<Geometry> in your handler_data and convert it to whatever your R equivalent of that is in vector_end.
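A sketch of that accumulate-then-convert pattern, assuming hypothetical simplified callbacks and a hypothetical Geometry enum in place of geo-types. Coordinates are buffered until geometry_end, finished geometries go into a Vec, and vector_end hands the Vec back; in a real handler, vector_end is where you would build and return the SEXP.

```rust
use std::mem;

/// Hypothetical stand-in for a geo-types geometry.
#[derive(Debug, Clone, PartialEq)]
pub enum Geometry {
    Point(f64, f64),
    LineString(Vec<(f64, f64)>),
}

/// Hypothetical geometry type tag passed to geometry_start.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum GeomType {
    Point,
    LineString,
}

/// Handler state that would live behind handler_data.
#[derive(Default)]
pub struct CollectHandler {
    current_type: Option<GeomType>,
    current: Vec<(f64, f64)>,
    result: Vec<Geometry>,
}

impl CollectHandler {
    pub fn geometry_start(&mut self, geom_type: GeomType) {
        self.current_type = Some(geom_type);
        self.current.clear();
    }

    pub fn coord(&mut self, x: f64, y: f64) {
        // Buffer coordinates; the geometry can't be built until geometry_end.
        self.current.push((x, y));
    }

    pub fn geometry_end(&mut self) {
        let coords = mem::take(&mut self.current);
        let geom = match self.current_type.take() {
            Some(GeomType::Point) => match coords.first() {
                Some(&(x, y)) => Geometry::Point(x, y),
                None => return, // empty points are simply skipped in this sketch
            },
            Some(GeomType::LineString) => Geometry::LineString(coords),
            None => return,
        };
        self.result.push(geom);
    }

    /// Plays the role of vector_end: hand back everything collected.
    pub fn vector_end(&mut self) -> Vec<Geometry> {
        mem::take(&mut self.result)
    }
}
```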

Does each geometry need to have a wk_meta_t and vectors wk_vector_meta_t?

If you're writing a reader (which is what you linked to for GEOS), yes. You can stack-allocate it, populate its fields, and pass a pointer to it to the handler callback. Writing a reader is a lot easier than writing a handler, and I suggest trying it first. If you're writing a handler, you don't have to worry about how they were allocated. You get a pointer to them and can inspect their values, saving any values you care about in some other structure (probably a Rust one).
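A sketch of the reader side, assuming a hypothetical simplified Meta struct standing in for wk_meta_t (the real one has more fields) and a Rust trait in place of C function pointers. The reader builds a Meta on the stack for each feature and lends it to every callback; the handler copies out only what it needs. The geometry_type value 2 follows the WKB convention for linestrings.

```rust
/// Hypothetical, simplified stand-in for wk_meta_t.
#[derive(Debug, Clone, Copy)]
pub struct Meta {
    pub geometry_type: u32, // WKB-style code: 2 = linestring
    pub size: u32,          // number of coordinates in this geometry
}

/// Hypothetical handler interface; in C these would be function pointers.
pub trait Handler {
    fn geometry_start(&mut self, meta: &Meta);
    fn coord(&mut self, meta: &Meta, xy: [f64; 2], coord_id: u32);
    fn geometry_end(&mut self, meta: &Meta);
}

/// A reader owns the loop: it fills in a stack-allocated Meta per feature
/// and passes a reference (a pointer, in C) to each callback.
pub fn read_linestrings(features: &[Vec<[f64; 2]>], handler: &mut dyn Handler) {
    for coords in features {
        let meta = Meta { geometry_type: 2, size: coords.len() as u32 };
        handler.geometry_start(&meta);
        for (i, xy) in coords.iter().enumerate() {
            handler.coord(&meta, *xy, i as u32);
        }
        handler.geometry_end(&meta);
    }
}

/// A handler that inspects the borrowed Meta and saves what it cares about.
#[derive(Default)]
pub struct SizeRecorder {
    pub sizes: Vec<u32>,
    pub n_coords: usize,
}

impl Handler for SizeRecorder {
    fn geometry_start(&mut self, meta: &Meta) {
        // The Meta is only borrowed, so copy the value out.
        self.sizes.push(meta.size);
    }
    fn coord(&mut self, _meta: &Meta, _xy: [f64; 2], _coord_id: u32) {
        self.n_coords += 1;
    }
    fn geometry_end(&mut self, _meta: &Meta) {}
}
```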

Is handler_data a vector of length 4 doubles (x, y, z, m)?

Definitely not! For you this should be a pointer to a heap-allocated Rust object.

I hope that helps!


2 participants