-
-
Notifications
You must be signed in to change notification settings - Fork 251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite Datum as a newtyped pointer to an opaque type #545
Conversation
Also localizes the sptr polyfill to pg_sys
|
||
#[repr(C)] | ||
#[derive(Debug, Copy, Clone)] | ||
pub struct NullableDatum { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see this used anywhere. I can intuit from the name/definition what it is, but what's your plan for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's one of the types that Postgres exposes through the bindgen. I wanted to bring both directly "into" our control, because it's important that NullableDatum actually uses the Datum type we define, otherwise it's too hard to use from user code, and I didn't want to leave it to chance such that it might use the wrong type as a field. If there are genuinely no other use sites in PGX and we expect that no one externally relies on this type, however, it would be fine to not define it on the condition we also block it from being exported by the bindgen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few things...
NullableDatum
was introduced in Postgres 12. It may not matter that we're now defining it for pg10/11 (as it wouldn't be used), but maybe we shouldn't pollute the namespace for them (also, we'll be dropping pg10 support in November of this year)- There are some usages of it in
fcinfo.rs
. Do they need to be double-checked, or did you and found them to be okay?- EDIT: when I said above I didn't see any uses, I just meant in this PR.
- Some doccomments on the type definition itself would be cool
And I'm sure you know, but Postgres' definition for it is below. I don't see any discrepancies tho.
/*
* A NullableDatum is used in places where both a Datum and its nullness needs
* to be stored. This can be more efficient than storing datums and nullness
* in separate arrays, due to better spatial locality, even if more space may
* be wasted due to padding.
*/
typedef struct NullableDatum
{
#define FIELDNO_NULLABLE_DATUM_DATUM 0
Datum value;
#define FIELDNO_NULLABLE_DATUM_ISNULL 1
bool isnull;
/* due to alignment padding this could be used for flags for free */
} NullableDatum;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, we can make it feature dependent.
I was thinking it would eventually also be an obvious choice to do conversion impls for Datum and NullableDatum and it's "more obvious" if it's right there, so in a way it's there for "please remember to care about me some day, eventually this will matter".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be neat if our "nullable datum" could be Option<Datum>
, but that ain't quite compatible, is it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rust's ADT enums aren't compatible over FFI except in very specific "blessed" cases, but we could probably implement enough conversions and maybe with enough pixie dust it would feel almost transparent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Postgres' NullableDatum
is really only used in the function call stuff. Maybe it's also used elsewhere, but that's the only place we use it. Since all that is abstracted away from the programmer, it doesn't matter.
I was just sayin'. C sux
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then probably we can manage to banish it from external use behind a layer of shims and such. I'll look into that as a followup.
.blocklist_type("Datum") // manually wrapping datum types for correctness | ||
.blocklist_type("NullableDatum") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if this was gonna be necessary. Cool.
@workingjubilee, this is really nice. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job. :)
In #545 we replaced `Datum`'s autogenerated bindings with an opaque pointer. Because bindgen was not generating the type, it could not know which traits `Datum` implements. This changed how containers of `Datum` were emitted. In particular containers of `Datum` no longer derived `Debug`, `Copy`, or `Clone`. This also affected how unions containing these containers were emitted: bindgen only uses Rust unions if all members are `Copy`, falling back to a struct of `__BindgenUnionField<_>` fields otherwise. This change tells bindgen which traits our `Datum` implementation implements, fixing all of the above issues.
Datum is a de facto "union of all types", but in Postgres's C code is a de jure
uintptr_t
, on the assumption that integers can hold any bit pattern and ad hoc conversions are fine. However, Rust disagrees with that when it comes to pointers: borrows mean pointers can have invisible qualities like "is aliased or not" or "is mutable or not", and also "where it came from". Collectively these qualities are known as "provenance", and Rust asserts integers do not have provenance. It's unclear if they do or do not in C, though either answer can be... problematic for C code. So, in Rust,*mut T
is preferred for such an "ad hoc union" type, as creating an integer from a pointer preserves more of the desired information than creating a pointer from an integer.Thus, this makes Datum into a thin
#[repr(transparent)]
wrapper around a pointer to an "opaque type", as it is preferable for the pointer to be known "as a pointer" for every step of the way through Rust (it is fine if Postgres does anything funny: the FFI is black-boxed from Rust). In spite of being nominally a type mismatch, thanks to thetransparent
annotation and the way ABIs are defined it actually works on almost every machine, including some cutting-edge ones which aren't in popular usage yet and have "interesting" rules about pointers, but do run Postgres!It is hypothetically possible this may break PGX compilation on a "Motorola 68000" CPU. It's not clear to me Postgres or Rust work on those, however, as the only supported target for Rust is the tier 3
m68k-unknown-linux-gnu
.This does break simple
as
casts but allows picking up some more type safety along the way in general and implementing convenience methods and traits likeFrom
orPartialEq
.I haven't quite gone as far as dismantling
FromDatum
orIntoDatum
yet because those need to be more progressively eliminated and any changes necessary in this PR need to be shaken out first, but this will eliminate a lot of overhead as currentlyIntoDatum::into_datum
usesOption<Datum>
and this can generally just be a direct conversion.