Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: extend sql generation via to_sql attribute #410

Merged

Conversation

bitwalker
Copy link
Contributor

/cc @Hoverbear @JamesGuthrie

This PR was spawned from some needs we have over at Timescale in our Promscale extension. I looked into #377 as an option, but found that due to expectations the ToSql trait has about the entity graph, trying to manipulate entities at that stage is delicate at best. For example, you can't conditionally elide certain entities from SQL generation by removing them from the graph, because things like types are expected to exist in the graph during lowering. To make it work, you'd essentially need to reconstruct the graph within the hook to join the incoming and outgoing neighbors of those entities so that the dependency ordering is preserved without those entities being present in the output. It feels like that is exposing too much of the pgx internals implicitly, in a way that can break non-obviously and will always be fragile to internal changes.

So as an alternative, this PR introduces an approach that I think you'll like, and is less invasive while still providing quite a bit of flexibility around how entities get lowered to SQL. It works great for our needs, and others that are in a similar boat, so I'm hoping it is something you all are open to merging!


This commit introduces a new #[to_sql] attribute that allows for
customizing the behavior of SQL generation on certain entities in two
different ways:

  1. #[to_sql(false)] disables generation of the decorated item's SQL
  2. #[to_sql(path::to::function)] delegates responsibility for
    generating SQL to the named function

In the latter case, the function has almost the same signature as the
to_sql function of the built-in ToSql trait, except the function
receives a reference to a SqlGraphEntity in place of &self. This
allows for extending any of the SqlGraphEntity variants using a single
function, as trying to use specific typed functions for each entity type
was deemed overly complex. This also works well with the way ToSql is
invoked today, which starts by calling the implementation on a value of
type SqlGraphEntity, so we can perform all the checks in a single
place.

As an aside, these custom callbacks can still delegate to the built-in
ToSql trait if desired. For example, if you wish to write the
generated SQL for specific entities to a different file. One thing that might make
this even more powerful is if the lowering is by value (and also receives a mutable
reference to the PgxSql context), as this would allow custom functions to
modify certain attributes of the entities during lowering (as a simple exercise,
consider how one would modify the naming scheme used by the generator without
breaking dependents of the lowered entity - currently that isn't possible without taking
over almost all of the lowering yourself). Not super important I don't think, but something
to mull over.

The motivation for this is that in some edge cases, it is desirable to
elide or modify the SQL being generated. In our case specifically, we
need to chop up the generated SQL so that we can split out non-idempotent
operations separately from idempotent operations in order to properly
manage extension upgrades. Rather than lose the facilities pgx provides,
the #[to_sql] attribute allows us to make ad-hoc adjustments to what,
when and where things get generated without needing any upstream support
for those details.

@Hoverbear
Copy link
Contributor

Generally a fan of this idea, like the serde Derive like strategy here.

@eeeebbbbrrrr and I have just introduced a #[pgx] attribute for the aggregates in #230. I think it'd be more consistent to use that. (Thinking like serde)

I'll give this a spin. :)

@bitwalker
Copy link
Contributor Author

@Hoverbear Thanks! I can certainly rework this to use the #[pgx] attribute instead, makes sense to me. Are you thinking the end result would then be #[pgx(to_sql(false))]/#[pgx(to_sql(path::to::function))]?

As an aside, I'm not really sure why the one test is failing, but will try to reproduce and track that down today. As far as I can tell it shouldn't be affected by the changes, but it might just be something subtle with the change to the ToSql implementation for SqlGraphEntity that isn't obvious at first glance.

@Hoverbear
Copy link
Contributor

I'm really excited about the direction this PR takes us, as well as the PR. :)

@Hoverbear Hoverbear mentioned this pull request Jan 31, 2022
@Hoverbear
Copy link
Contributor

Hoverbear commented Jan 31, 2022

I was thinking more about this PR and we have a pgx_sql functionality associated with pg_extern. I think we could converge it's functionality with this.

/// ```pgx_sql
/// CREATE FUNCTION ...
/// ```
#[pg_extern]
function dooper() {}

Could become

#[pg_extern(sql = "
CREATE FUNCTION ...
")]
function dooper() {}

So I wonder if we should change to_sql to sql in prep for that... I have asked @eeeebbbbrrrr

@bitwalker
Copy link
Contributor Author

@Hoverbear I've made your recommended changes, and added a WIP commit that merges the pgxsql behavior under the #[pgx(sql)] handling as #[pgx(sql = "SQL HERE")] just to see what that looks like. We can modify that or remove it as you see fit, but once we're happy with it, I'll squash it into the the other changes.

The only thing with the merge that changes behavior, is that previously, pgxsql would result in the SQL generator rendering what would have been generated as a comment, followed by the overriding SQL code. Now, that no longer happens, and instead the overridding SQL is assumed to take over complete manual control of SQL generation for the decorated entity, same as the callback form. I'm not sure to what degree that is an issue for backwards compatibility, but we can probably special case the use of #[pgx(sql = string)] to behave exactly like pgxsql if it is essential to preserve that behavior. In any case, this was just a first stab at it, so I'm expecting some back and forth :)

pgx-macros/src/lib.rs Outdated Show resolved Hide resolved
pgx-macros/src/lib.rs Outdated Show resolved Hide resolved
pub content: Option<&'static str>,
}
impl ToSqlConfigEntity {
/// Given a SqlGraphEntity, this function converts it to SQL based on the current configuration.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for doc comments. :)

@eeeebbbbrrrr
Copy link
Contributor

I was thinking more about this PR and we have a pgx_sql functionality associated with pg_extern. I think we could converge it's functionality with this.

/// ```pgx_sql
/// CREATE FUNCTION ...
/// ```
#[pg_extern]
function dooper() {}

Could become

#[pg_extern(sql = "
CREATE FUNCTION ...
")]
function dooper() {}

So I wonder if we should change to_sql to sql in prep for that... I have asked @eeeebbbbrrrr

I'm not jazzed about it b/c then I'll have to make changes to ZomboDB, but I suppose it's better and more consistent with how Rust would want us to do things.

@Hoverbear
Copy link
Contributor

@eeeebbbbrrrr I will personally make the changes to ZDB for you if that helps. :)

@bitwalker bitwalker force-pushed the bitwalker/to_sql-attribute branch 2 times, most recently from 571d3b0 to 69acff1 Compare February 3, 2022 04:24
@bitwalker
Copy link
Contributor Author

@Hoverbear Any recommendations on getting to the bottom of the test failures? I can't seem to get any kind of useful output explaining what's going on either locally or in the CI runs, and I'm not sure why. As best I can tell, something in the test framework is swallowing output, but I haven't been able to find where yet. Have you run into this before?

My latest local run looks like this:

❯ RUST_BACKTRACE=1 PATH=$HOME/.pgx/14.1/pgx-install/bin:$PATH cargo test --all --features "pg14 pg_test" --no-default-features
warning: pgx-parent v0.2.6 (/Users/paulschoenfelder/src/github.com/timescale/pgx) ignoring invalid dependency `cargo-pgx` which is missing a lib target
   Compiling libc v0.2.112
<snip>
   Compiling pgx-pg-sys v0.2.6 (/Users/paulschoenfelder/src/github.com/timescale/pgx/pgx-pg-sys)
   Compiling pgx-macros v0.2.6 (/Users/paulschoenfelder/src/github.com/timescale/pgx/pgx-macros)
   Compiling cargo-pgx v0.2.6 (/Users/paulschoenfelder/src/github.com/timescale/pgx/cargo-pgx)
warning: field is never read: `name_tab`
   --> pgx-pg-sys/build.rs:339:5
    |
339 |     name_tab: HashMap<String, usize>,
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: `#[warn(dead_code)]` on by default

warning: field is never read: `item_offset_tab`
   --> pgx-pg-sys/build.rs:341:5
    |
341 |     item_offset_tab: Vec<Option<usize>>,
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

warning: `pgx-pg-sys` (build script) generated 2 warnings
   Compiling pgx v0.2.6 (/Users/paulschoenfelder/src/github.com/timescale/pgx/pgx)
warning: field is never read: `status_code`
  --> pgx/src/spi.rs:61:5
   |
61 |     status_code: SpiOk,
   |     ^^^^^^^^^^^^^^^^^^
   |
   = note: `#[warn(dead_code)]` on by default

warning: `pgx` (lib) generated 1 warning
   Compiling pgx-tests v0.2.6 (/Users/paulschoenfelder/src/github.com/timescale/pgx/pgx-tests)
warning: `pgx` (lib test) generated 1 warning (1 duplicate)
   Compiling pgx-parent v0.2.6 (/Users/paulschoenfelder/src/github.com/timescale/pgx)
    Finished test [unoptimized + debuginfo] target(s) in 1m 01s
     Running unittests (target/debug/deps/cargo_pgx-9bf22e45fc25ef4e)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests (target/debug/deps/pgx-896f2e0d717f0a47)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests (target/debug/deps/pgx_macros-58ea9063254925f7)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests (target/debug/deps/pgx_parent-78081190ec63d461)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests (target/debug/deps/pgx_pg_sys-02a95c8bfbc2d18f)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests (target/debug/deps/pgx_tests-1aa90d7500c05cf6)

running 158 tests
building extension with features `pg14 pg_test`
"cargo" "build" "--features" "pg14 pg_test" "--no-default-features"
 <snip>
   Compiling pgx-utils v0.2.6 (/Users/paulschoenfelder/src/github.com/timescale/pgx/pgx-utils)
   Compiling proc-macro-crate v1.1.0
   Compiling pgx-pg-sys v0.2.6 (/Users/paulschoenfelder/src/github.com/timescale/pgx/pgx-pg-sys)
   Compiling pgx-macros v0.2.6 (/Users/paulschoenfelder/src/github.com/timescale/pgx/pgx-macros)
warning: field is never read: `name_tab`
   --> pgx-pg-sys/build.rs:339:5
    |
339 |     name_tab: HashMap<String, usize>,
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: `#[warn(dead_code)]` on by default

warning: field is never read: `item_offset_tab`
   --> pgx-pg-sys/build.rs:341:5
    |
341 |     item_offset_tab: Vec<Option<usize>>,
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

warning: `pgx-pg-sys` (build script) generated 2 warnings
   Compiling pgx v0.2.6 (/Users/paulschoenfelder/src/github.com/timescale/pgx/pgx)
warning: field is never read: `status_code`
  --> pgx/src/spi.rs:61:5
   |
61 |     status_code: SpiOk,
   |     ^^^^^^^^^^^^^^^^^^
   |
   = note: `#[warn(dead_code)]` on by default

warning: `pgx` (lib) generated 1 warning
   Compiling pgx-tests v0.2.6 (/Users/paulschoenfelder/src/github.com/timescale/pgx/pgx-tests)
    Finished dev [unoptimized + debuginfo] target(s) in 58.74s

installing extension
     Copying control file to `/Users/paulschoenfelder/.pgx/14.1/pgx-install/share/postgresql/extension/pgx_tests.control`
     Copying shared library to `/Users/paulschoenfelder/.pgx/14.1/pgx-install/lib/postgresql/pgx_tests.so`
 Discovering SQL entities
  Discovered 289 SQL entities: 29 schemas (2 unique), 248 functions, 9 types, 1 enums, 2 sqls, 0 ords, 0 hashes, 0 aggregates
running SQL generator
"/Users/paulschoenfelder/src/github.com/timescale/pgx/target/debug/sql-generator" "--sql" "/Users/paulschoenfelder/.pgx/14.1/pgx-install/share/postgresql/extension/pgx_tests--1.0.sql"
     Copying extension schema file to `/Users/paulschoenfelder/.pgx/14.1/pgx-install/share/postgresql/extension/pgx_tests--1.0.sql`
    Finished installing pgx_tests
The files belonging to this database system will be owned by user "paulschoenfelder".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory /Users/paulschoenfelder/src/github.com/timescale/pgx/target/pgx-test-data-14 ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... America/New_York
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... test tests::anyarray_tests::tests::pg_test_anyarray_arg has been running for over 60 seconds
test tests::array_tests::tests::pg_test_array_deny_nulls has been running for over 60 seconds
test tests::array_tests::tests::pg_test_count_nulls has been running for over 60 seconds
test tests::array_tests::tests::pg_test_count_true has been running for over 60 seconds
test tests::array_tests::tests::pg_test_count_true_sliced has been running for over 60 seconds
test tests::array_tests::tests::pg_test_optional_array has been running for over 60 seconds
test tests::array_tests::tests::pg_test_optional_array_with_default has been running for over 60 seconds
test tests::array_tests::tests::pg_test_return_text_array has been running for over 60 seconds
test tests::array_tests::tests::pg_test_return_zero_length_vec has been running for over 60 seconds
test tests::array_tests::tests::pg_test_serde_serialize_array has been running for over 60 seconds
ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    /Users/paulschoenfelder/.pgx/14.1/pgx-install/bin/pg_ctl -D /Users/paulschoenfelder/src/github.com/timescale/pgx/target/pgx-test-data-14 -l logfile start

Stopping Postgres

error: test failed, to rerun pass '-p pgx-tests --lib'

@Hoverbear
Copy link
Contributor

@bitwalker I have encountered this too, try cargo pgx run and then create extension $EXT and see if it gives you a better error?

@bitwalker
Copy link
Contributor Author

@Hoverbear I think once this PR gets merged it'll stop hassling you, I hope haha. At least it looks like at least this run is finally going to be green 🤞

@Hoverbear
Copy link
Contributor

Yessssss!!! I'll give it a final lookover and get it merged. Thanks. :)

#[pg_extern(sql = false)]
fn func_elided_from_schema() {}

#[pg_extern(sql = "generate_function")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should actually be valid to do like #[pg_extern(sql = generate_function)] which would remove any ambiguity about if it's SQL or a function...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I think that makes more sense, I'll adjust it accordingly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Hoverbear I just realized that the reason I used a string literal here, is that MetaNameValue expects the value part of key = value to be a syn::Lit. Your suggestion isn't supported AFAIK, unless you choose to avoid Attribute::parse_meta for parsing the attribute contents

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess part of my concern (and want for that distinction) is I could very well see a bug where this code:

https://github.com/zombodb/pgx/blob/a9d18c59cf0e3932f7cf163cc2ec36318f271ad2/pgx-utils/src/sql_entity_graph/to_sql.rs#L78-L92

Parses some SQL as a Rust Path for some reason, and creates weird behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah in that case, they are abandoning Attribute::parse_meta in favor of Attribute::parse_args, which allows them to use their own grammer for the attribute contents by manually parsing them. It's more effort but worth it when you need to support something that doesn't fit into the standard structured meta arguments approach. We can do something similar if you think it's worth it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would definitely prefer a lack of ambiguity that if you have time, I know I've already asked a lot of you so if you are pressed on time I can wrap it up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have a commit that implements this ready shortly, just FYI

pgx-macros/src/lib.rs Outdated Show resolved Hide resolved
pgx-macros/src/lib.rs Outdated Show resolved Hide resolved
pgx-macros/src/lib.rs Outdated Show resolved Hide resolved
@bitwalker
Copy link
Contributor Author

Once you're happy with this, just let me know and I'll squash commits, just leaving them broken up while we iterate on things

@Hoverbear
Copy link
Contributor

Newest commit 344f685 looks good to me. :)

@bitwalker
Copy link
Contributor Author

Awesome! Once the tests pass, I'll go ahead and squash then :)

This commit extends the `#[pgx]` and `#[pg_extern]` attributes with a new
`sql` argument type that allows for customizing the behavior of SQL generation
on certain entities in three different ways:

1. `#[pgx(sql = false)]` disables generation of the decorated item's SQL
2. `#[pgx(sql = path::to::function)]` delegates responsibility for
   generating SQL to the named function
3. `#[pgx(sql = "RAW SQL CODE;")]` uses the provided SQL string in place
   of the SQL that would have been generated by default

In the 2nd case, the function has almost the same signature as the
`to_sql` function of the built-in `ToSql` trait, except the function
receives a reference to a `SqlGraphEntity` in place of `&self`. This
allows for extending any of the `SqlGraphEntity` variants using a single
function, as trying to use specific typed functions for each entity type
was deemed overly complex. This also works well with the way `ToSql` is
invoked today, which starts by calling the implementation on a value of
type `SqlGraphEntity`, so we can perform all the checks in a single
place.

As an aside, these custom callbacks can still delegate to the built-in
`ToSql` trait if desired. For example, if you wish to write the
generated SQL for specific entities to a different file.

The motivation for this is that in some edge cases, it is desirable to
elide or modify the SQL being generated. In our case specifically, we
need to chop up the generated SQL so that we can split out non-idempotent
operations separately from idempotent operations in order to properly
manage extension upgrades. Rather than lose the facilities pgx provides,
the `#[pgx(sql)]` attribute allows us to make ad-hoc adjustments to what,
when and where things get generated without needing any upstream support
for those details.
LANGUAGE c /* Rust */\n\
AS 'MODULE_PATHNAME', '{unaliased_name}_wrapper';\
",
unaliased_name = func.unaliased_name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😅 I'm gonna need to clean up this API a bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a similar thought when I was thinking through how to expose stuff to those functions haha. In my perfect world, users of this would get access to a hypothetical SqlBuilder API that let you replicate this using the builder pattern:

fn generate_function(entity: &SqlGraphEntity, builder: &mut SqlBuilder, _context: &PgxSql) -> Result<(), Box<dyn std::error::Error + Send + Sync + 'static>> {
    if let SqlGraphEntity::Function(ref f) = entity {
        // get_or_create lets you modify existing entities after initial construction, but before the builder is consumed
        // returns a entity builder that holds a mutable reference to the parent builder
        // when the entity is finalized, only then is the entity added to the parent builder
        let mut func = builder.get_or_create_function(f.name);
        func
          .args(&[]) // defaults to no arguments if not set
          .returns("void") // defaults to 'void' if not set
          .replace(true) // indicates the builder should use CREATE OR REPLACE
          .lang(FunctionLanguage::Rust) // lowers to LANGUAGE c /* Rust */
          .as_nif("MODULE_PATHNAME", format!("{}_wrapper", func.unaliased_name)) // indicates the function is a Natively Implemented Function from the given module
          .build()?; // adds the built function to the parent builder
        Ok(())
    } else {
        panic!("expected extern function entity, got {:?}", entity);
    }
}

It'd be a big undertaking to cover all possible entities, but could probably start by just covering the cases that pgx itself requires, expose some kind of escape hatch akin to extension_sql!, and expand the API with specific entities as needed. Not sure if it is worth it or not, but would be a lot nicer and less error prone to work with!

@Hoverbear
Copy link
Contributor

I think my only remaining nit about this PR is we don't leave an comment anchor in the SQL noting that, for example, the content was overridden or skipped. I think I'll make a follow up PR for this since I've asked enough of you. :)

@bitwalker
Copy link
Contributor Author

@Hoverbear Yeah I think that would be fairly easy to add. I also think it'd be nice to ensure some easy mistakes are caught by pgx when using SQL strings returned from the callback or given as a literal, e.g. that any non-comment lines are terminated with a semicolon. I didn't add any such checks, since I wasn't sure what degree of validation we would want (though I did make an effort to ensure newlines are prepended when rendering the strings to help readability of the final output), but it is almost certainly something that will help avoid unnecessary bug reports from users.

I agree though that it probably makes sense to merge this and make some of those improvements in follow-on PRs just to keep things tighter for review, just let me know what you need from me, and I'll try to make some time :)

@Hoverbear
Copy link
Contributor

Let's merge this as it is now, I'm just prepping a follow up for the anchor comments as I had a particular way I wanted to do it. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants