Can this be used for FDW? #7

Closed
bbigras opened this issue Jul 14, 2020 · 9 comments

@bbigras
Contributor

bbigras commented Jul 14, 2020

No description provided.

@eeeebbbbrrrr
Contributor

eeeebbbbrrrr commented Jul 14, 2020

You absolutely could implement a FDW with this.

I think we’ll need to get those Postgres headers included in the pgx-pg-sys crate, tho.

I’ll take a look at doing that this week.

@eeeebbbbrrrr
Contributor

Okay, this will be released as part of v0.0.7 -- which is coming in an hour or so!

@eeeebbbbrrrr
Contributor

released to crates.io

@bbigras
Contributor Author

bbigras commented Jul 19, 2020

Awesome ❤️

Any example that could be used with CREATE EXTENSION <something>; ... CREATE SERVER <something> FOREIGN DATA WRAPPER <something>?

@eeeebbbbrrrr
Contributor

Oh goodness. Not yet! ;)

Right now, that's left as an exercise for the programmer. The FDW api is fairly complex, and ultimately I'd want a safe Rust wrapper around it.

The API is only exposed as a low-level, unsafe Rust interface in the pgx::pg_sys module. You'd have to start here and work your way through implementing everything.

I have written a few FDWs (in C) in years past, so I can provide some guidance if you go down this road. And I'd most definitely accept a PR that exposes a safe FDW interface.
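
For orientation, the scan phase of the Postgres FDW API is driven by callbacks in the `FdwRoutine` struct (`BeginForeignScan`, `IterateForeignScan`, `ReScanForeignScan`, `EndForeignScan`). A safe wrapper might model that lifecycle as a trait. The sketch below is plain Rust with no pgx dependency: the `ForeignScan` trait, `VecSource`, and `run_scan` are hypothetical names made up for illustration and are not part of pgx or Postgres. It leaves out planning callbacks, memory contexts, and tuple conversion entirely.

```rust
/// Hypothetical safe trait mirroring the scan-phase callbacks of Postgres'
/// FdwRoutine. Not an actual pgx API; an illustration of the shape only.
trait ForeignScan {
    type Row;
    /// Corresponds to BeginForeignScan: open the source, reset state.
    fn begin_scan(&mut self);
    /// Corresponds to IterateForeignScan: next row, or None when exhausted.
    fn iterate_scan(&mut self) -> Option<Self::Row>;
    /// Corresponds to ReScanForeignScan: restart from the first row.
    fn rescan(&mut self);
    /// Corresponds to EndForeignScan: release resources.
    fn end_scan(&mut self);
}

/// Toy in-memory "foreign table" standing in for a remote data source.
struct VecSource {
    rows: Vec<(i32, String)>,
    cursor: usize,
    open: bool,
}

impl VecSource {
    fn new(rows: Vec<(i32, String)>) -> Self {
        VecSource { rows, cursor: 0, open: false }
    }
}

impl ForeignScan for VecSource {
    type Row = (i32, String);

    fn begin_scan(&mut self) {
        self.open = true;
        self.cursor = 0;
    }

    fn iterate_scan(&mut self) -> Option<Self::Row> {
        if !self.open {
            return None; // scanning before begin_scan yields nothing
        }
        let row = self.rows.get(self.cursor).cloned();
        if row.is_some() {
            self.cursor += 1;
        }
        row
    }

    fn rescan(&mut self) {
        self.cursor = 0;
    }

    fn end_scan(&mut self) {
        self.open = false;
    }
}

/// Drives one full scan in the order the executor would call the callbacks.
fn run_scan<S: ForeignScan>(source: &mut S) -> Vec<S::Row> {
    source.begin_scan();
    let mut out = Vec::new();
    while let Some(row) = source.iterate_scan() {
        out.push(row);
    }
    source.end_scan();
    out
}

fn main() {
    let mut source = VecSource::new(vec![(1, "a".to_string()), (2, "b".to_string())]);
    for (id, name) in run_scan(&mut source) {
        println!("{id}: {name}");
    }
}
```

A real implementation would also need the planner callbacks (`GetForeignRelSize`, `GetForeignPaths`, `GetForeignPlan`) and would have to convert each row into a Postgres tuple, which is where most of the unsafe code would live.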

@bbigras
Contributor Author

bbigras commented Jul 21, 2020

Gotcha. Thanks! 👍

Sasasu added a commit to Sasasu/pgrx that referenced this issue Aug 17, 2023
this is because pgrx overwrites the .so file in place.

dlopen uses mmap(2) with MAP_PRIVATE to map the file into read-only memory.
usually this memory is copy-on-write, so if another process is modifying
the file, the previously mapped memory should not change.

but this behavior is unspecified. from man mmap(2):

```
MAP_PRIVATE: It is unspecified whether changes made to the file after the mmap()
call are visible in the mapped region.
```

what actually happens is that the read-only memory mapped into postgresql is modified,
and all pointers in the .text segment are mangled. the call stack looks like:

```
 #0  0x0000000000731510 in ?? ()
 #1  0x00007f7a94515132 in core::sync::atomic::AtomicUsize::store (self=0x7f7a95110318 <pgrx_pg_sys::submodules::thread_check::ACTIVE_THREAD::h60448dcb81097e92>, val=0, order=core::sync::atomic::Ordering::Relaxed) at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/sync/atomic.rs:2291
 #2  0x00007f7a944f574b in pgrx_pg_sys::submodules::thread_check::init_active_thread::clear_in_child () at src/submodules/thread_check.rs:39
 #3  0x00007f7a962f8a88 in __run_postfork_handlers (who=who@entry=atfork_run_child, do_locking=do_locking@entry=false, lastrun=lastrun@entry=2) at register-atfork.c:187
 #4  0x00007f7a962df773 in __libc_fork () at fork.c:109
 #5  0x00005555e66ad948 in fork_process () at fork_process.c:61
 #6  0x00005555e66a5d48 in StartAutoVacWorker () at autovacuum.c:1543
 #7  0x00005555e66c43f7 in StartAutovacuumWorker () at postmaster.c:6155
 #8  0x00005555e66c3d21 in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5820
 #9  <signal handler called>
 #10 0x00007f7a9630eb84 in __GI___select (nfds=6, readfds=0x7ffc61c2fa40, writefds=0x0, exceptfds=0x0, timeout=0x7ffc61c2f9b0) at ../sysdeps/unix/sysv/linux/select.c:69
 #11 0x00005555e66be343 in ServerLoop () at postmaster.c:1950
 #12 0x00005555e66bdb0f in PostmasterMain (argc=5, argv=0x5555e8fb5490) at postmaster.c:1631
 #13 0x00005555e6560e41 in main (argc=5, argv=0x5555e8fb5490) at main.c:240
```

here `pgrx_pg_sys` tries to run its post-fork hook, but the variable `ACTIVE_THREAD`
and the code are no longer at their previous locations.
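
One common way to avoid rewriting a mapped library in place is to write the new build to a temporary file in the same directory and then rename it over the old path. On Unix-like systems the rename swaps the directory entry, so processes that already mapped the old file keep the old inode untouched. The sketch below illustrates that general technique with the standard library only; it is not necessarily what the referenced commit does, and `install_atomically` and the file names are hypothetical. It assumes a Unix-like filesystem (renaming over an open file may fail on other platforms).

```rust
use std::fs;
use std::io::{self, Read, Write};
use std::path::Path;

/// Hypothetical helper: replace `dest` with `new_contents` without ever
/// writing into the file that existing processes may have mapped.
fn install_atomically(dest: &Path, new_contents: &[u8]) -> io::Result<()> {
    // Build a sibling temp path so the rename stays on one filesystem.
    let mut tmp_name = dest
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "dest has no file name"))?
        .to_os_string();
    tmp_name.push(".tmp");
    let tmp = dest.with_file_name(tmp_name);

    // Write the complete new file first.
    let mut file = fs::File::create(&tmp)?;
    file.write_all(new_contents)?;
    file.sync_all()?;
    drop(file);

    // Swap the directory entry; the old inode is left unmodified.
    fs::rename(&tmp, dest)
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join(format!("so_swap_demo_{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    let lib = dir.join("my_extension.so");
    fs::write(&lib, b"old build")?;

    // Stand-in for a process that opened the library before the upgrade.
    let mut old_handle = fs::File::open(&lib)?;

    install_atomically(&lib, b"new build")?;

    let mut seen_by_old_handle = String::new();
    old_handle.read_to_string(&mut seen_by_old_handle)?;
    println!("old handle sees: {seen_by_old_handle}");
    println!("path now holds:  {}", String::from_utf8_lossy(&fs::read(&lib)?));

    fs::remove_dir_all(&dir)
}
```

The open file handle here only stands in for an existing mapping: both refer to the old inode, which the rename leaves alone, whereas truncating and rewriting the same path would change the data underneath them.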
@bjmc

bjmc commented Oct 18, 2024

Hi, sorry to bump a closed thread, but I'm looking into implementing a foreign data wrapper (ideally using pgrx, though I'm also considering alternatives).

Has there been any movement on this topic since 2020? Any examples I could follow?

I'm thinking if I wind up writing a bunch of Rust wrappers and glue code myself, I might as well try to do it in a generic way that could be merged upstream. @eeeebbbbrrrr if you're interested, I'd very much appreciate any guidance you can offer.

From reading the docs it seems like the bulk of the effort will be the callback functions. I was wondering if maybe cargo pgrx new could be extended to generate a skeleton template for a FDW with all the functions and their signatures stubbed out and ready to be implemented.

Is it worth opening another thread for discussion? Or wait until I'm further along and submit a PR then?

@eeeebbbbrrrr
Contributor

hi @bjmc, check out https://github.com/supabase/wrappers. They've done some really solid work using pgrx to make a nice abstraction over the FDW API. And it includes wrappers for quite a few disparate data sources.

Even as a core pgrx developer, if I were to implement a FDW today, that's where I'd start.

@bjmc

bjmc commented Oct 18, 2024

Thank you! This looks extremely helpful.
