Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

document pg_guard to the best of my abilities #1729

Merged
merged 10 commits into from
Jun 3, 2024
1 change: 1 addition & 0 deletions docs/src/SUMMARY.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
- [Memory Contexts](./pg-internal/memory-context.md)
- [Varlena Types](./pg-internal/varlena.md)
- [`sigsetjmp` & `siglongjmp`](./pg-internal/setjmp-longjmp.md)
- [FFI Error Handling](./ffi-error-handling.md)
- [Contributing](./contributing.md)
- [PGRX Internals](./contributing/pgrx-internal.md)
- [Releases](./contributing/release.md)
Expand Down
249 changes: 249 additions & 0 deletions docs/src/ffi-error-handling.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,249 @@
Postgres is written in C. pgrx is written in Rust. Between them is a boundary where each blindly believes the other
behaves in an expected way. Our primary concern with this boundary is error handling.

There are many other concerns across this boundary such as function call ABIs and pointer ownership, but these are generally
"obvious" concerns to anyone that's done any FFI development and won't be discussed here in detail.


# High-level Postgres Error Handling Overview
eeeebbbbrrrr marked this conversation as resolved.
Show resolved Hide resolved

Most Postgres internal functions (those accessible via the `pg_sys` module) are capable of raising an `ERROR`. This "error"
comes into existence when, internally, Postgres calls `errstart()`, which does the work instantiate the error.
eeeebbbbrrrr marked this conversation as resolved.
Show resolved Hide resolved

Code execution then finds its way to `errfinish()` where, finally, `siglongjmp()` is called to instantly move the stack
back to the frame where Postgres began the current transaction (where it previously created a `sigsetjmp()` point).
Comment on lines +14 to +15
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. This should probably clarify that it moves the state to that exact point-in-time of the sigsetjmp, "right after" it was called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand what you mean, but I'm not quite sure of the words you'd want to explain it. I'll let you add this point. ;)


From here the code detects that it's a second return from `sigsetjmp()` and performs the necessary actions to ROLLBACK
the current transaction. Finally, Postgres is again ready and waiting to begin a new transaction.

This is an elegant solution to error handling as it allows Postgres to cleanly rollback the current transaction, free
used memory, release locks, and whatever else might be necessary.


# High-level Rust Error Handling Overview

Rust, on the other hand, aborts the current process whenever `panic!()` is called. This is, quite clearly, incompatible
with Postgres' error handling approach.
eeeebbbbrrrr marked this conversation as resolved.
Show resolved Hide resolved

It's technically incompatible, it's spiritually incompatible, and it robs Postgres of the opportunity to cleanly rollback
the current transaction (Postgres is supposed to be tolerant of such situations, but who wants to test Postgres'
recoverability in production?).

Conversely, Postgres' `sigsetjmp/siglongjmp` approach is as egregiously incompatible with Rust. `siglongjmp` will blindly
eeeebbbbrrrr marked this conversation as resolved.
Show resolved Hide resolved
jump over Rust stack frames, leaking Rust-allocated memory, ignoring `trait Drop` implementations, and denying Rust code
any opportunity to participate in error handling.


# A Wolf, a goat, and some cabbage

pgrx uses two different approaches to protect these FFI boundaries. While both are implemented in Rust, one protects
Rust from Postgres `setlongjmp` ERRORs and the other protects Postgres from Rust `panic()!`s. To make things confusing
they're both called `#[pg_guard]`.

Essentially, pgrx needs to guard two styles of `extern "C"` functions. One style is the `extern "C" {}` block that
eeeebbbbrrrr marked this conversation as resolved.
Show resolved Hide resolved
declares a function lives "somewhere else" (in our case, the Postgres process in which the pgrx extension is loaded).
The other style is `extern "C" fn foo() { ... }` functions that are written in Rust and might be passed to Postgres (for
eeeebbbbrrrr marked this conversation as resolved.
Show resolved Hide resolved
it to later call) via a standard function pointer.

## Guarding Postgres Internal Functions

pgrx uses the `bindgen` tool to generate "bindings" for exported Postgres symbols. Postgres' source header files (`*.h`)
eeeebbbbrrrr marked this conversation as resolved.
Show resolved Hide resolved
are read, parsed, and transformed, as much as bindgen knows how, into Rust declarations. In the case of exported functions,
bindgen generates blocks similar to:

```rust
extern "C" {
pub fn palloc(size: Size) -> *mut ::std::os::raw::c_void;
// ... many more internal Postgres function definitions here ...
}
```

Then, pgrx' `build.rs` process rewrites these functions into something similar to this:

```rust
#[pg_guard]
extern "C" {
pub fn palloc(size: Size) -> *mut ::std::os::raw::c_void;
// ... many more internal Postgres function definitions here ...
}
```

This form of the `#[pg_guard]` macro then walks the `extern "C" {}` block items and writes new function declarations for
each. This expansion looks similar to:

```rust
pub extern "C" fn palloc(size: Size) -> *mut ::std::os::raw::c_void {
extern "C" {
pub fn palloc(size: Size) -> *mut ::std::os::raw::c_void;
// ... many other function definitions here ...
}

unsafe {
crate::ffi::pg_guard_ffi_boundary(|| palloc(size))
}
}
```

Essentially, in this usage, `#[pg_guard]` generates standalone wrapper functions that delegate to pgrx' `pg_guard_ffi_boundary(|| ...)`
function. This function sets up pgrx' own `sigsetjmp` point, lies to Postgres' exception handling stack about where it's
eeeebbbbrrrr marked this conversation as resolved.
Show resolved Hide resolved
going to jump to in case of an ERROR, calls the function via the closure argument, then restores Postgres' exception handling
stack.

As it relates to this document, the specific workings of this process is more of an implementation detail, but the gist of
the process is that we set up our own `sigsetjmp` point so that we can trap, in Rust, any ERROR Postgres might raise while
calling the internal Postgres function. This then allows us to convert that error into a Rust panic and have it propagated
through the call stack so that Rust's stack properly unwinds and type destructors are called.

While Rust doesn't guarantee that `drop()` will get called for any instantiated type, we do our best to encourage it.
eeeebbbbrrrr marked this conversation as resolved.
Show resolved Hide resolved

Ultimately, at the top of the Rust callstack, the panic raised from a Postgres ERROR is then converted back into a normal
Postgres ERROR using its internal facility for raising errors.

Generally speaking, pgrx extension developers don't need to worry about this, as this is all machine-generated at compile-time.
They can, however, manually create `extern "C" {}` blocks with a `#[pg_guard]` annotation if they wish to write their own
wrappers for specific internal Postgres functions that aren't yet exposed by pgrx. The project, of course, would prefer
pull requests to expose such functions through header inclusion.

## Guarding User Functions

The other way `#[pg_guard]` is used is for Rust functions that are `extern "C"`. These would be functions in the Rust
shared library that Postgres calls. The intent here is that such functions guard against Rust panics, so that they may
be properly converted into Postgres ERRORs. It's the opposite direction of the above.

Examples of these types of functions are any that are annotated with `#[pg_extern]`, in which the macro properly expands
to the necessary code, and other functions where it's necessary to give Postgres a pointer to that function -- the various
planner/executor hooks is an example of this.

In this case, `#[pg_guard]` is used as follows:

```rust
#[pg_guard]
extern "C" fn foo() -> bool {
// ... user-written Rust code here ...
return true;
}
```

During compilation, the macro will expand to something similar to:

```rust
extern "C" fn foo() -> bool {
pgrx::pg_sys::submodules::panic::pgrx_extern_c_guard(move || {
// ... user-written Rust code here ...
return true;
})
}
```

Behind the scenes, `pgrx_extern_c_guard(|| ...)` executes the closure argument inside a rust `std::panic::catch_unwind(|| ...)`
block. Doing so allows pgrx to capture any Rust `panic!()` and contain it its stack unwinding to the `catch_unwind()` block
eeeebbbbrrrr marked this conversation as resolved.
Show resolved Hide resolved
which allows for Rust destructors to be run, Rust to free memory, and for pgrx to ensure we don't end up aborting the
backend process.

When control is returned to `pgrx_extern_c_guard()`, the captured panic is converted into a Postgres ERROR and raised.
Ultimately, this will ROLLBACK the current database transaction. It will not abort the backend process.

Catching Rust panics and converting to Postgres ERRORs ensures that user code (in Rust) doesn't try to unwind the stack
back into Postgres' stack, which is managed by the C runtime. Failure to use `#[pg_guard]` on a Rust `extern "C" fn`
that `panic!()`s will absolutely cause a segfault.


## Getting Across the Bridge

The most common scenario where all this is wired together is in exposing Rust functions as SQL functions with `#[pg_extern]`.
Imagine you've created a function called `strlen()` that, given a `String` returns its length...

```rust
fn strlen(input: String) -> i64 {
input.len() as i64 // postgres doesn't support unsigned ints -- irrelevant implementation detail
}
```

... and you want this to be exposed as a SQL function. To do so you simply add the `#[pg_extern]` annotation:

```rust
#[pg_extern]
fn strlen(input: String) -> i64 {
input.len() as i64 // postgres doesn't support unsigned ints -- irrelevant implementation detail
}
```

Now, you've got a function you can use via sql:

```sql
[postgres] # SELECT strlen('hello, world');
```

At compile time, pgrx has rewritten this `strlen` function to look more like the below. It's not *exactly* this, but the
exact code is an implementation detail subject to change...

```rust
extern "C" fn strlen(input: String) -> i64 {
pgrx_extern_c_guard(|| input.len() as i64)
}
```

Lets say you have another function that, for some unknown reason, wants to open and then close a relation (table):

```rust
use std::time::Duration;

#[pg_extern]
fn rel_open_close(oid: pg_sys::Oid) {
struct Foo;
impl Drop for Foo {
fn drop(&mut self) {
eprintln!("Foo got dropped");
}
}

unsafe {
let _foo = Foo;
let rel = pg_sys::relation_open(oid, pg_sys::AccessShareLock);

std::thread::sleep(Duration::from_secs(10)); // this is just an example

pg_sys::relation_close(rel, pg_sys::AccessShareLock);

// `_foo` should drop() here
}
}
```

You can imagine that the above gets "expanded", at compile time, into something similar to:

```rust
use std::time::Duration;

extern "C" fn rel_open_close(oid: pg_sys::Oid) {
pgrx_extern_c_guard(|| {
struct Foo;
impl Drop for Foo {
fn drop(&mut self) {
eprintln!("Foo got dropped");
}
}

unsafe {
extern "C" {
fn relation_open(oid: pg_sys::Oid, lmode: pg_sys::LOCKMODE) -> *mut pg_sys::RelationData;
}
let rel = pg_guard_ffi_boundary(|| relation_open(oid, pg_sys::AccessShareLock));

std::thread::sleep(Duration::from_secs(10)); // this is just an example

extern "C" {
fn relation_close(rel: pg_sys::RelationData, lmode: pg_sys::LOCKMODE);
}
pg_guard_ffi_boundary(|| relation_close(rel, pg_sys::AccessShareLock));

// `_foo` should drop() here
}
})
}
```

Combined, we're ensuring that if any of the Postgres functions (`relation_open`/`relation_close`) raise an ERROR (say, due
to an invalid Oid value), `pg_guard_ffi_boundary` will catch that and convert into a Rust panic. Then ultimately, the top-level
`pgrx_extern_c_guard` call will convert it back into a Postgres ERROR once the Rust stack has properly unwound and drop
impls have been called.
eeeebbbbrrrr marked this conversation as resolved.
Show resolved Hide resolved
Loading