Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

overhaul our Postgres Range support #1050

Merged
merged 8 commits into from
Feb 22, 2023
Merged

overhaul our Postgres Range support #1050

merged 8 commits into from
Feb 22, 2023

Conversation

eeeebbbbrrrr
Copy link
Contributor

@eeeebbbbrrrr eeeebbbbrrrr commented Feb 19, 2023

  • We now go ahead and deconstruct the range value in the FromDatum implementation. Immediately, this fixes issues with pg_sys::RangeBound.val datums potentially being used after free.

  • RangeData is gone and replaced with a more idiomatic RangeBound enum, to which the internal Postgres pg_sys::RangeBound struct can be mapped.

  • We provide From<std::ops::Range*> impls for all of Rust's built-in Range types.

  • Our pgx::Range can be constructed from any type that can be converted into a RangeBound and we provide direct impls for the set of supported RangeSubType types along with Option<T: RangeSubType> which maps Some to ::Inclusive and None to ::Infinite.

  • The idea that a Postgres range can be "empty" is properly relayed through Range's public API

  • add an example (albeit puny)

@eeeebbbbrrrr
Copy link
Contributor Author

Apologizes to @mhov... this isn't very faithful to your original impl from #663, but it has greatly improved ergonomics from the user-facing API side along with better interoperability with Rust's std::ops::Range* types.

I'd appreciate it if you'd look it over and let me know what you think. My intention is to merge this in the next day or two.

@mhov
Copy link
Contributor

mhov commented Feb 19, 2023

I wasn't attached to my implementation, no worries. As a serious RangeType end user, I'll try it out and review soon. Thanks @eeeebbbbrrrr

@mhov
Copy link
Contributor

mhov commented Feb 20, 2023

Ok, 100% more ergonomic and doesn't look like it was someone first year doing Rust. My main concern with my implementation was Range->Range was an implicit From<,> and so it wasn't obvious how you get the data without reading the code, so it's way easier now.
My concern with this implementation is having non-trivial conversion work in the FromDatum/IntoDatum

@workingjubilee might be able to elaborate better than me, but in the 4.5.0->0.5.0 period there was an effort to move non-trivial conversion logic out of the Into/FromDatum to the user code so it wouldn't cost anything until you actually needed it.
ex:
#504
#643
#622

Corner-case as it may be if we had a pg_extern with an input param of array of 10,000 dates/timestamp/daterange/intrange/etc there would 10,000 conversion overheads before line 1 of your actual fn logic. A few of us were hitting these bumps in our extension migration.

With *RangeType there's the deserialization/serialization step PLUS the TypeCache lookup for each datum into/from. In C-land if i had an array of 10,000 daterange (which i do quite often in my work) I'd get the typecache once and re-use it for 10k calls to range_deserialize and/or make_range. I had meant to add that as an option to pass to my original Range<T>/RangeData<T> methods.

I can think of a bunch of other contrived cases which may, understandably, lose out to ergonomics. Please tell me if I'm over-optimizing. As a C extension developer of a very performance-sensitive project, I'm in the zero-copy pgx gang. I'm afraid of losing any time to pgx wrappers/overhead. I understand I'm a different user from those who have never done C and their first pg extension is in pgx (which should be a target audience).

The great part about the SqlTranslatable change a while back, is I can easily add my ugly/optimized version of Range to my extension project without needing pgx to include it. So if none of the overhead is a concern to you, I have easy options to work around it.

Side question, would we ever want a contrib section of pgx for unfriendly, but possibly faster, Type implementations? :)

@eeeebbbbrrrr
Copy link
Contributor Author

Corner-case as it may be if we had a pg_extern with an input param of array of 10,000 dates/timestamp/daterange/intrange/etc there would 10,000 conversion overheads before line 1 of your actual fn logic. A few of us were hitting these bumps in our extension migration.

Are you using pgx::Array<T> which is lazily evaluated, or Vec<T> which is not?

Yes, this PR does the FromDatum conversion on both bounds when the Range itself is deconstructed, but these conversions only now only happen once per Range, whereas the current implementation would do that conversion every time the user asked for one of the bound values.

There was also a problem where the Datums behind the lower/upper bounds could have been pfree'd out from underneath the user without them knowing (that's a problem with all "zero copy" type impls, especially the FromDatum impl for &str), leading to UAF. And this could be a problem for any pass-by-reference range type -- specifically numrange.

As it is now, for pass-by-value ranges such as int4/8range and date/timerange types, I'd expect the compiler to basically optimize all the machinery away such that the bounds become the backing Datum value, as-is. However, numrange requires that we get the Datum's detoasted as soon as possible.


Circling back to large arrays of values... the point is well received, and I think it underscores that we need to do more work on making array support fully zero-copy and lazily evaluated where we can. Although, pgx::Array<T> is basically there, Rather than optimizing individual types that then rely on postgres-allocated memory to which we don't have a lifetime attached.

@eeeebbbbrrrr
Copy link
Contributor Author

Thinking about the TypeCache thing...

I'm not sure how we can mitigate that right now. It would require some thought and changes to pgx::Array in order to reduce it to one call per Array<Range<T>> (or even Vec<Range<T>>). But it's probably worth investigating how we could push it down.

- We now go ahead and deconstruct the range value in the `FromDatum` implementation.  Immediately, this fixes issues with `pg_sys::RangeBound.val` datums potentially being able to be used after free.

- `RangeData` is gone and replaced with a more idiomatic `RangeBound` enum, to which the internal Postgres `pg_sys::RangeBound` struct can be mapped.

- We provide `From<std::ops::Range*>` impls for all of Rust's built-in Range types.

- Our `pgx::Range` can be constructed from any type that can be converted into a `RangeBound` and we provide direct impls for the set of supported `RangeSubType` types along with `Option<T: RangeSubType>` which maps `Some` to `::Inclusive` and `None` to `::Infinite`.

- The idea that a Postgres range can be "empty" is properly relayed through `Range`'s public API

- add an example (albeit puny)
@eeeebbbbrrrr
Copy link
Contributor Author

I had to recreate this branch with cherry-picks and do a force push because I have no idea.

@workingjubilee
Copy link
Member

git be like that sometimes.

Copy link
Member

@workingjubilee workingjubilee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry it took me a bit to get through things on top of the squirreling on other topics. I probably need to glance over it again still.

README.md Show resolved Hide resolved
README.md Show resolved Hide resolved
README.md Show resolved Hide resolved
pgx/src/datum/range.rs Show resolved Hide resolved
pgx/src/datum/range.rs Outdated Show resolved Hide resolved
pgx/src/datum/range.rs Show resolved Hide resolved
pgx/src/datum/range.rs Outdated Show resolved Hide resolved
- add a few help methods to `RangeBound`
- `impl Display for Range<T> where T: Display`
@eeeebbbbrrrr
Copy link
Contributor Author

@workingjubilee I've addressed all your concerns and for fun, added a few RangeBound::is_XXX() functions, along with implementing Display for Range.

Copy link
Member

@workingjubilee workingjubilee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Yes, this looks good now.

@eeeebbbbrrrr eeeebbbbrrrr merged commit 98c6e7d into develop Feb 22, 2023
@eeeebbbbrrrr eeeebbbbrrrr deleted the range-support-2.0 branch June 20, 2023 18:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants