document pg_guard to the best of my abilities (#1729)
Closes #1406

---------

Co-authored-by: Jubilee <[email protected]>
eeeebbbbrrrr and workingjubilee authored Jun 3, 2024
1 parent 1d855a7 commit d8288d9
Showing 2 changed files with 268 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/src/SUMMARY.md
@@ -15,6 +15,7 @@
- [Memory Contexts](./pg-internal/memory-context.md)
- [Varlena Types](./pg-internal/varlena.md)
- [`sigsetjmp` & `siglongjmp`](./pg-internal/setjmp-longjmp.md)
- [FFI Error Handling](./ffi-error-handling.md)
- [Contributing](./contributing.md)
- [PGRX Internals](./contributing/pgrx-internal.md)
- [Releases](./contributing/release.md)
267 changes: 267 additions & 0 deletions docs/src/ffi-error-handling.md
@@ -0,0 +1,267 @@
Postgres is written in C. pgrx is written in Rust. Between them is a boundary where each blindly believes the other
behaves in an expected way. Our primary concern with this boundary is error handling.

There are many other concerns across this boundary such as function call ABIs and pointer ownership, but these are generally
"obvious" concerns to anyone that's done any FFI development and won't be discussed here in detail.


# High-level Postgres Error Handling Overview

Most Postgres internal functions (those accessible via the `pg_sys` module) are capable of raising an `ERROR`. This "error"
comes into existence when, internally, Postgres code calls the `ereport()` (or `elog()`) macro. `ereport()` does the work
to instantiate the error by first calling the `errstart()` function.

Code execution then finds its way to `errfinish()` where, finally, `siglongjmp()` is called to instantly move the stack
back to the frame where Postgres began the current transaction (where it previously created a `sigsetjmp()` point).

From here the code detects that it's a second return from `sigsetjmp()` and performs the necessary actions to ROLLBACK
the current transaction. Finally, Postgres is again ready and waiting to begin a new transaction.

This is an elegant solution to error handling as it allows Postgres to cleanly roll back the current transaction, free
used memory, release locks, and whatever else might be necessary.


# High-level Rust Error Handling Overview

Rust, on the other hand, will call its panic hook and then run its panic handler when `panic!()` is called.
The panic handler itself will either *unwind* the stack or *abort* the current process.
This is, quite clearly, incompatible with Postgres' error handling approach:
unwinding destroys `sigsetjmp` checkpoints, and aborting shuts down the entire database!

It's technically incompatible, it's spiritually incompatible, and it robs Postgres of the opportunity to cleanly roll back
the current transaction (Postgres is supposed to be tolerant of such situations, but who wants to test Postgres'
recoverability in production?).

Conversely, Postgres' `sigsetjmp`/`siglongjmp` approach is just as egregiously incompatible with Rust. `siglongjmp` will blindly
jump over Rust stack frames, leaking Rust-allocated memory, ignoring `Drop` implementations, and denying Rust code
any opportunity to participate in error handling.
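
To make concrete what unwinding buys us (and what a `siglongjmp` over Rust frames would skip), here is a minimal sketch using only the standard library. The names `SetOnDrop` and `drop_runs_during_unwind` are made up for illustration and are not part of pgrx.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::atomic::{AtomicBool, Ordering};

/// Illustrative type: flips a flag when its destructor runs.
struct SetOnDrop<'a>(&'a AtomicBool);

impl<'a> Drop for SetOnDrop<'a> {
    fn drop(&mut self) {
        self.0.store(true, Ordering::SeqCst);
    }
}

/// Panics while a `SetOnDrop` is alive, catches the panic, and reports
/// whether the destructor ran while the stack unwound.
fn drop_runs_during_unwind() -> bool {
    let dropped = AtomicBool::new(false);
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        let _guard = SetOnDrop(&dropped);
        panic!("something went wrong");
    }));
    // The panic was caught *and* the destructor ran on the way out.
    result.is_err() && dropped.load(Ordering::SeqCst)
}
```

Under the default `panic = "unwind"` setting this returns `true`. Had control left the closure via `siglongjmp` instead, nothing would have run `SetOnDrop::drop`.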


# A Wolf, a Goat, and Some Cabbage

pgrx uses two different approaches to protect these FFI boundaries. While both are implemented in Rust, one protects
Rust from Postgres `siglongjmp` ERRORs and the other protects Postgres from Rust `panic!()`s. To make things confusing,
they're both called `#[pg_guard]`.

Essentially, pgrx needs to guard two styles of `extern "C"` functions. One style is the [`extern "C" {}` block][extern-blocks] that
declares a function lives "somewhere else" (in our case, the Postgres process in which the pgrx extension is loaded).
The other style is [`extern "C" fn foo() { ... }` functions][extern-fn] that are written in Rust and might be passed to Postgres (for
it to later call) via a standard function pointer.
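
For readers who haven't met both styles side by side, the sketch below uses the C standard library's `qsort` as a stand-in for Postgres: the `extern "C" {}` block declares a function that lives elsewhere, and the `extern "C" fn` is Rust code handed to C as a function pointer. It assumes a platform where the C library is linked by default, and `compare_i32` and `sort_with_c` are illustrative names. Under the Rust 2024 edition the block would have to be written `unsafe extern "C" { ... }`.

```rust
use std::mem::size_of;
use std::os::raw::{c_int, c_void};

// Style 1: declare a function that lives "somewhere else" (here, libc).
extern "C" {
    fn qsort(
        base: *mut c_void,
        nmemb: usize,
        size: usize,
        compar: extern "C" fn(*const c_void, *const c_void) -> c_int,
    );
}

// Style 2: a Rust function with the C ABI, called back by C code.
extern "C" fn compare_i32(a: *const c_void, b: *const c_void) -> c_int {
    // SAFETY: `qsort` only hands us pointers into the `i32` buffer passed below.
    let (a, b) = unsafe { (*(a as *const i32), *(b as *const i32)) };
    // `Ordering` casts to -1, 0, or 1, which is what `qsort` expects.
    a.cmp(&b) as c_int
}

/// Sorts the values by calling into C, which in turn calls back into Rust.
fn sort_with_c(mut values: Vec<i32>) -> Vec<i32> {
    unsafe {
        qsort(
            values.as_mut_ptr() as *mut c_void,
            values.len(),
            size_of::<i32>(),
            compare_i32,
        );
    }
    values
}
```

Nothing here is guarded: a panic inside `compare_i32` would try to leave through `qsort`'s C frames, which is exactly the situation the rest of this document is about.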

## Guarding Postgres Internal Functions

pgrx uses the [`bindgen`] tool to generate "bindings" for exported Postgres symbols. Postgres' source header files (`*.h`)
are read, parsed, and transformed, as much as bindgen knows how, into Rust declarations. In the case of exported functions,
bindgen generates blocks similar to:

```rust
extern "C" {
    pub fn palloc(size: Size) -> *mut ::std::os::raw::c_void;
    // ... many more internal Postgres function definitions here ...
}
```

Then, pgrx' `build.rs` process rewrites these functions into something similar to this:

```rust
#[pg_guard]
extern "C" {
    pub fn palloc(size: Size) -> *mut ::std::os::raw::c_void;
    // ... many more internal Postgres function definitions here ...
}
```

This form of the `#[pg_guard]` macro then walks the `extern "C" {}` block items and writes new function declarations for
each. This expansion looks similar to:

```rust
pub extern "C" fn palloc(size: Size) -> *mut ::std::os::raw::c_void {
    extern "C" {
        pub fn palloc(size: Size) -> *mut ::std::os::raw::c_void;
        // ... many other function definitions here ...
    }

    unsafe {
        crate::ffi::pg_guard_ffi_boundary(|| palloc(size))
    }
}
```

Essentially, in this usage, `#[pg_guard]` generates standalone wrapper functions that delegate to pgrx' `pg_guard_ffi_boundary(|| ...)`
function. This function sets up pgrx' own `sigsetjmp` point, lies to Postgres' exception handling stack about where it's
going to jump to in case of an ERROR, calls the function via the closure argument, then restores Postgres' exception handling
stack.

"Lies to Postgres" is a little curious here, but Postgres doesn't expect that whenever it raises an ERROR it'll be jumping
into a Rust-created stack frame. In fact, it doesn't have *any* expectations about where it's jumping other than, eventually,
the raised ERROR will either roll back the current transaction, be re-thrown, or, in some cases, simply be ignored.

As it relates to this document, the specific workings of this process are more of an implementation detail, but the gist of
the process is that we set up our own `sigsetjmp` point so that we can trap, in Rust, any ERROR Postgres might raise while
calling the internal Postgres function. This then allows us to convert that error into a Rust panic and have it propagated
through the call stack so that Rust's stack properly unwinds and type destructors are called.

We make sure to limit the amount of `sigsetjmp`-protected code to only be the internal Postgres function being called.
Doing so ensures we're doing the minimal amount of work necessary to properly protect from a Postgres ERROR and not also
accidentally defeating Rust's stack unwinding.

While Rust doesn't guarantee that `drop()` will get called for any instantiated type, we do our best to encourage it.
Of course, a value passed to [`mem::forget`] will never have its drop implementation called.
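
As a small illustration of that caveat, the sketch below counts destructor calls for a value that goes out of scope normally and for one handed to `mem::forget`. `Tracker` and `drops_observed` are names invented for this example.

```rust
use std::cell::Cell;
use std::mem;

/// Illustrative type: bumps a counter each time its destructor runs.
struct Tracker<'a>(&'a Cell<u32>);

impl<'a> Drop for Tracker<'a> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Returns the drop count after a normal scope exit, and again after
/// forgetting a second value.
fn drops_observed() -> (u32, u32) {
    let dropped = Cell::new(0);
    {
        let _t = Tracker(&dropped);
        // `_t` is dropped here, at the end of the scope.
    }
    let after_scope = dropped.get();

    // The destructor of a forgotten value is never run.
    mem::forget(Tracker(&dropped));
    (after_scope, dropped.get())
}
```

Both counts are `1`: the forgotten `Tracker` never contributes a second drop, and no amount of guarding at the FFI boundary changes that.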

Ultimately, at the top of the Rust callstack, the panic raised from a Postgres ERROR is then converted back into a normal
Postgres ERROR using its internal facility for raising errors.

Generally speaking, pgrx extension developers don't need to worry about this, as this is all machine-generated at compile-time.
They can, however, manually create `extern "C" {}` blocks with a `#[pg_guard]` annotation if they wish to write their own
wrappers for specific internal Postgres functions that aren't yet exposed by pgrx. The project, of course, would prefer
pull requests to expose such functions through header inclusion.

## Guarding User Functions

The other way `#[pg_guard]` is used is for Rust functions that are `extern "C"`. These would be functions in the Rust
shared library that Postgres calls. The intent here is that such functions guard against Rust panics, so that they may
be properly converted into Postgres ERRORs. It's the opposite direction of the above.

Examples of these types of functions are any that are annotated with `#[pg_extern]`, in which the macro properly expands
to the necessary code, and other functions where it's necessary to give Postgres a pointer to that function -- the various
planner/executor hooks are examples of this.

In this case, `#[pg_guard]` is used as follows:

```rust
#[pg_guard]
extern "C" fn foo() -> bool {
    // ... user-written Rust code here ...
    return true;
}
```

During compilation, the macro will expand to something similar to:

```rust
extern "C" fn foo() -> bool {
    pgrx::pg_sys::submodules::panic::pgrx_extern_c_guard(move || {
        // ... user-written Rust code here ...
        return true;
    })
}
```

Behind the scenes, `pgrx_extern_c_guard(|| ...)` executes the closure argument inside a Rust `std::panic::catch_unwind(|| ...)`
block. Doing so allows pgrx to capture any Rust `panic!()` and contain its stack unwinding to within the [`catch_unwind`]
which allows for Rust destructors to be run, Rust to free memory, and for pgrx to ensure we don't end up aborting the
backend process.

When control is returned to `pgrx_extern_c_guard()`, the captured panic is converted into a Postgres ERROR and raised.
Ultimately, this will ROLLBACK the current database transaction. It will not abort the backend process.
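
The catch-and-convert half of this can be sketched with the standard library alone. This is not pgrx's implementation: `ErrorReport`, `payload_to_message`, and `guard` are hypothetical names, and where pgrx would raise a Postgres ERROR the sketch simply returns an `Err`.

```rust
use std::any::Any;
use std::panic::{self, UnwindSafe};

/// Hypothetical stand-in for the ERROR pgrx would raise; not a pgrx type.
#[derive(Debug, PartialEq)]
struct ErrorReport {
    message: String,
}

/// Extracts a readable message from a panic payload.
/// `panic!("literal")` carries a `&'static str`; formatted panics carry a `String`.
fn payload_to_message(payload: Box<dyn Any + Send>) -> String {
    if let Some(s) = payload.downcast_ref::<&str>() {
        s.to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "unknown panic payload".to_string()
    }
}

/// Runs `f`, containing any panic and converting it into an `ErrorReport`.
fn guard<F, R>(f: F) -> Result<R, ErrorReport>
where
    F: FnOnce() -> R + UnwindSafe,
{
    panic::catch_unwind(f).map_err(|payload| ErrorReport {
        message: payload_to_message(payload),
    })
}
```

By the time `guard` returns, the Rust stack below it has fully unwound, so turning the `ErrorReport` into a real ERROR at that point would no longer jump over any live Rust frames from the closure.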

Catching Rust panics and converting to Postgres ERRORs ensures that user code (in Rust) doesn't try to unwind the stack
back into Postgres' stack, which is managed by the C runtime. Failure to use `#[pg_guard]` on a Rust `extern "C" fn`
that `panic!()`s will take down the backend process, typically with a segfault or an abort.


## Getting Across the Bridge

The most common scenario where all this is wired together is in exposing Rust functions as SQL functions with `#[pg_extern]`.
Imagine you've created a function called `strlen()` that, given a `String`, returns its length...

```rust
fn strlen(input: String) -> i64 {
    input.len() as i64 // Postgres doesn't support unsigned ints -- irrelevant implementation detail
}
```

... and you want this to be exposed as a SQL function. To do so you simply add the `#[pg_extern]` annotation:

```rust
#[pg_extern]
fn strlen(input: String) -> i64 {
    input.len() as i64 // Postgres doesn't support unsigned ints -- irrelevant implementation detail
}
```

Now, you've got a function you can use via SQL:

```sql
[postgres] # SELECT strlen('hello, world');
```

At compile time, pgrx has rewritten this `strlen` function to look more like the below. It's not *exactly* this, but the
exact code is an implementation detail subject to change...

```rust
extern "C" fn strlen(input: String) -> i64 {
    pgrx_extern_c_guard(|| input.len() as i64)
}
```

Let's say you have another function that, for some unknown reason, wants to open and then close a relation (table):

```rust
use std::time::Duration;

#[pg_extern]
fn rel_open_close(oid: pg_sys::Oid) {
    struct Foo;
    impl Drop for Foo {
        fn drop(&mut self) {
            eprintln!("Foo got dropped");
        }
    }

    unsafe {
        let _foo = Foo;
        // the lock mode constants are plain integers, so cast to the parameter type
        let rel = pg_sys::relation_open(oid, pg_sys::AccessShareLock as pg_sys::LOCKMODE);

        std::thread::sleep(Duration::from_secs(10)); // this is just an example

        pg_sys::relation_close(rel, pg_sys::AccessShareLock as pg_sys::LOCKMODE);

        // `_foo` should drop() here
    }
}
```

You can imagine that the above gets "expanded", at compile time, into something similar to:

```rust
use std::time::Duration;

extern "C" fn rel_open_close(oid: pg_sys::Oid) {
    pgrx_extern_c_guard(|| {
        struct Foo;
        impl Drop for Foo {
            fn drop(&mut self) {
                eprintln!("Foo got dropped");
            }
        }

        unsafe {
            let _foo = Foo;

            extern "C" {
                fn relation_open(oid: pg_sys::Oid, lmode: pg_sys::LOCKMODE) -> *mut pg_sys::RelationData;
            }
            let rel = pg_guard_ffi_boundary(|| relation_open(oid, pg_sys::AccessShareLock as pg_sys::LOCKMODE));

            std::thread::sleep(Duration::from_secs(10)); // this is just an example

            extern "C" {
                // takes the same pointer type that `relation_open` returns
                fn relation_close(rel: *mut pg_sys::RelationData, lmode: pg_sys::LOCKMODE);
            }
            pg_guard_ffi_boundary(|| relation_close(rel, pg_sys::AccessShareLock as pg_sys::LOCKMODE));

            // `_foo` should drop() here
        }
    })
}
```

Combined, we're ensuring that if any of the Postgres functions (`relation_open`/`relation_close`) raise an ERROR (say, due
to an invalid Oid value), `pg_guard_ffi_boundary` will catch that and convert into a Rust panic. Then ultimately, the top-level
`pgrx_extern_c_guard` call will convert it back into a Postgres ERROR once the Rust stack has properly unwound and drop
impls have been called.
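
The round trip can be imitated without Postgres at all. The sketch below is a simulation under loud assumptions: a real `siglongjmp` can't be reproduced in plain Rust, so the pretend Postgres function "raises" by returning `Err`, and every name in it (`SimulatedError`, `fake_relation_open`, `ffi_boundary`, `extern_c_guard`, `rel_open`) is hypothetical rather than a pgrx API.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical stand-in for a Postgres ERROR; not a pgrx type.
#[derive(Debug, Clone, PartialEq)]
struct SimulatedError(String);

/// Sets a flag when dropped so the caller can see that destructors ran.
struct Foo<'a>(&'a AtomicBool);

impl<'a> Drop for Foo<'a> {
    fn drop(&mut self) {
        self.0.store(true, Ordering::SeqCst);
    }
}

/// Pretend Postgres function: an Oid of 0 "raises an ERROR".
fn fake_relation_open(oid: u32) -> Result<u32, SimulatedError> {
    if oid == 0 {
        Err(SimulatedError("invalid oid".to_string()))
    } else {
        Ok(oid)
    }
}

/// Inner guard: converts the simulated ERROR into a Rust panic.
fn ffi_boundary<T>(call: impl FnOnce() -> Result<T, SimulatedError>) -> T {
    match call() {
        Ok(value) => value,
        Err(error) => panic::panic_any(error),
    }
}

/// Outer guard: catches the panic and converts it back into the simulated ERROR.
fn extern_c_guard<T>(body: impl FnOnce() -> T) -> Result<T, SimulatedError> {
    panic::catch_unwind(AssertUnwindSafe(body)).map_err(|payload| {
        match payload.downcast::<SimulatedError>() {
            Ok(error) => *error,
            Err(_) => SimulatedError("non-ERROR panic".to_string()),
        }
    })
}

/// Returns the outcome plus whether `Foo`'s destructor ran.
fn rel_open(oid: u32) -> (Result<u32, SimulatedError>, bool) {
    let dropped = AtomicBool::new(false);
    let result = extern_c_guard(|| {
        let _foo = Foo(&dropped);
        ffi_boundary(|| fake_relation_open(oid))
    });
    (result, dropped.load(Ordering::SeqCst))
}
```

Whether the call succeeds or fails, `Foo` is dropped before `extern_c_guard` returns, and on the failure path the original error comes back out intact.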

[`mem::forget`]: https://doc.rust-lang.org/std/mem/fn.forget.html
[`catch_unwind`]: https://doc.rust-lang.org/std/panic/fn.catch_unwind.html
[`bindgen`]: https://rust-lang.github.io/rust-bindgen/
[extern-blocks]: https://doc.rust-lang.org/reference/items/external-blocks.html
[extern-fn]: https://doc.rust-lang.org/reference/items/functions.html#extern-function-qualifier
