Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: generate "queries" by stored pg function #25

Closed
aight8 opened this issue Apr 10, 2021 · 9 comments · Fixed by #64
Closed

proposal: generate "queries" by stored pg function #25

aight8 opened this issue Apr 10, 2021 · 9 comments · Fixed by #64

Comments

@aight8
Copy link

aight8 commented Apr 10, 2021

Motivation

Generally allow any complex nested queries. Currently it's unclear what is possible and when pggen cannot handle it
(as a result this issue is in contract to the promising project description)

In contrast the limitation caused on pg side ("optional arguments" because no reliable argument & result type reflection is possible) are clear.

example schema

create type list_item as (
   name text,
   color text
);

create type list_stats as (
   val1 hstore,
   val2 int[]
);

create or replace function fn1(
    param1 bigint,
    page int = 1,
    item_count int = 10,
    out res_items list_item[],
    out res_stats list_stats
) language plpgsql as $$begin
    -- single or many query which fills res_items and res_stats 
    -- ex: with x as (...subquery...) select array_agg(...), array_agg(...) into res_items, res_stats
end$$;

keep in mind the same return values could be also created by a single statement used directly as pggen query.
this example just demonstrate the way trough stored functions since this was my initial situation I was confronted with this idea.

example query

-- name: Fn1 :one
select fn1(
    pggen.arg('x')
);

My intention is to create plpgsql stored functions which takes all needed arguments once and has at least two different datasets as return values.
There could be an arbitrary number of return values. They can be arrays, single values and of course nested.

advantages

  • one function with one parameter list to create multiple datasets
  • pggen (future): optional function parameters could be recognized (when create go code directly from a function signature)

disadvantages

  • increased complexity

challenges

  • OID's of the composit types or array types must be known? when i understand correctly this is one of the main issue, right?

questions

  • what are the current issues to support this? I saw that pggen queries all the information about the result schema like FindDescendantOIDs. When some OID's must be preprocessed in pgx, maybe a preparation function (like PrepareAll in How to enable statement caching? #26 ) could be a solution (since the pg binary msg format don't support more info on query time, a preparation of pgx is anyway indispensable)

workaround

  • batch execute multiple standalone queries

unclear

  • performance
    • regarding the scan process of the nested composit type arrays
    • call plpgsql function vs run batched prepared statements in general
@aight8 aight8 changed the title queries which query stored function (returns ct record directly) Complex nested results (ex: stored function call with multiple return arguments) Apr 13, 2021
@jschaf
Copy link
Owner

jschaf commented Apr 16, 2021

Thank you for the outstanding report!

I think what you're asking is:

Given a function with complex OUT types, pggen should generate a strongly typed struct for the output.

The problem is that fn1 has an implicit return type of record with an OID of 2249. I think what's happening is that Postgres uses the dynamic record type for any functions that have OUT parameters. A record is a dynamic type not known at runtime so there's nothing pggen can do here. If Postgres doesn't know the type then pggen won't know the type. See https://stackoverflow.com/questions/30072787.

Here's the full information from pg_type for the record OID 2249. This info is hard-coded in the Postgres catalog tables.

OID for record type

[
  {
    "oid": 2249,
    "typname": "record",
    "typnamespace": 11,
    "typowner": 10,
    "typlen": -1,
    "typbyval": false,
    "typtype": "p",
    "typcategory": "P",
    "typispreferred": false,
    "typisdefined": true,
    "typdelim": ",",
    "typrelid": 0,
    "typelem": 0,
    "typarray": 2287,
    "typinput": "record_in",
    "typoutput": "record_out",
    "typreceive": "record_recv",
    "typsend": "record_send",
    "typmodin": "-",
    "typmodout": "-",
    "typanalyze": "-",
    "typalign": "d",
    "typstorage": "x",
    "typnotnull": false,
    "typbasetype": 0,
    "typtypmod": -1,
    "typndims": 0,
    "typcollation": 0,
    "typdefaultbin": null,
    "typdefault": null,
    "typacl": null
  }
]

pg_proc details for func - Note the return type of 2249, the record type.

[
  {
    "oid": 16519,
    "proname": "composite_func_2",
    "pronamespace": 16385,
    "proowner": 10,
    "prolang": 13385,
    "procost": 100,
    "prorows": 0,
    "provariadic": 0,
    "prosupport": "-",
    "prokind": "f",
    "prosecdef": false,
    "proleakproof": false,
    "proisstrict": false,
    "proretset": false,
    "provolatile": "i",
    "proparallel": "s",
    "pronargs": 3,
    "pronargdefaults": 2,
    "prorettype": 2249,
    "proargtypes": "20 23 23",
    "proallargtypes": [20, 23, 23, 16514, 16518],
    "proargmodes": ["i", "i", "i", "o", "o"],
    "proargnames": ["param1", "page", "item_count", "res_items", "res_stats"],
    "proargdefaults": "({CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location 238 :constvalue 4 [ 1 0 0 0 0 0 0 0 ]} {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location 263 :constvalue 4 [ 10 0 0 0 0 0 0 0 ]})",
    "protrftypes": null,
    "prosrc": "\nBEGIN\n  res_items := ARRAY [('foo', 'bar')::list_item];\n  res_stats := ('a=>1,a=>2'::hstore, ARRAY [param1, page, item_count])::list_stats;\nEND\n",
    "probin": null,
    "proconfig": null,
    "proacl": null
  }
]

Attempt 1: Use returns table

To get type information, maybe we can add a RETURNS TABLE(items list_item[], stats list_stats). That doesn't work because you can't mix table functions with IN/OUT parameters: https://stackoverflow.com/questions/47790940.

I tried removing the out params and modifying the query to use RETURNS TABLE so we can try to get a better return type than record.

CREATE FUNCTION composite_func_3(
  param1        bigint,
  page          int = 1,
  item_count    int = 10
)
RETURNS TABLE(items list_item[], stats list_stats)
  LANGUAGE plpgsql AS $$
BEGIN
  SELECT ARRAY [('foo', 'bar')::list_item], ('a=>1,a=>2'::hstore, ARRAY [param1, page, item_count])::list_stats;
END
$$ IMMUTABLE PARALLEL SAFE;

-- Query file

-- name: CompositeFunc3 :one
SELECT composite_func_3(pggen.arg('arg1'));

This didn't work, the OID return type was 2249, the record type, so same problem as above.


Attempt 2: Use returns setof new_type

I got the proper type by declaring a type that matched the return value and using setof NEW_TYPE, like so:

CREATE TYPE CF3_type AS (
  items list_item[],
  stats list_stats
);

CREATE FUNCTION composite_func_3(
  param1        bigint,
  page          int = 1,
  item_count    int = 10
)
  RETURNS setof cf3_type
  LANGUAGE plpgsql AS $$
BEGIN
  SELECT ARRAY [('foo', 'bar')::list_item], ('a=>1,a=>2'::hstore, ARRAY [param1, page, item_count])::list_stats;
END
$$ IMMUTABLE PARALLEL SAFE;

However, that errors with:

infer typed named query CompositeFunc3: infer output types for query: fetch oid types: find composite types: find type for composite column items oid=16547

Action items

  1. Fix the attempt 2 query.

  2. I don't think pggen can do anything for queries with OUT params that have a return type of record. To get better type information, we'd have to parse the query, figure out the method that was called and then use the postgres catalog tables to figure out the types that correspond to OUT parameters. Overall, it seems a bit silly that Postgres doesn't know the types.

@aight8
Copy link
Author

aight8 commented Apr 16, 2021

An idea were if it's not possible to reflect the return types in certain cases the following:

define the function names as pggen argument.
after the db is setup with the given schema, the defined functions are reflected and a go query is created for it.
polymorphic functions (when a given function name is defined multiple times with different signature) must be defined explicitly with it's signature. SQL query statements which only calls a function were anyway just stub queries.

pggen gen go \
   --schema-glob "db/schema/**/*.sql" \
   --pg-func "fn1" \
   --pg-func "fn1_name_in_go=fn1" \ # or Fn1NameInGo ... explicit naming should be always possible
   --pg-func "poly_fn2_x=polyfn2(int, text)" \ # or PolyFn2X ... but is required for ambiguous function names
   --pg-func "poly_fn2_y=polyfn2(int, int)" # or PolyFn2Y

based on function info from pgcli and then \df+ command to list function details. it uses sql for the listed function information so I think this is a good starting point:

Schema could be defined by passing function with the schema "public.fn1", default is "public"
Name is the function name itself without any args. It creates the fundament of the naming of the call function in go.
Result data type contains the signature like:

  • text
  • TABLE(type text, oid oid, schema text, name text, params text)
  • integer[]
  • void
  • SETOF text
  • record or SETOF record
    • however Argument data types still contains the OUT args
    • from my undestanding record is the result because multiple OUT args results in a unnamed result type. there is literally an unknown type which is an union of multiple types - so this results in record. although the composition of the result are very well known.
    • TODO: what if no inout/out exists? if they exists, we can ensure that they represents the result types because you cannot mix returns and out/inout in pg

Argument data types contains the signature like:

  • hstore, text[]
  • arg1 text
  • arg1 integer, arg2 text, arg3 integer, arg4 integer DEFAULT NULL::integer, OUT arg5 integer, OUT arg6 integer

Type should be "normal" // other were among others "trigger", "agg"
Language should be "sql" or "plpgsql" // other were "c" (contains arg types like 'internal' etc. - so early error with explanation)
Source code contains the source code of the function (could be used as comment, or not)
Description could be used in the go function comment

What is interesting here is:

  • the Argument data types contains the full signature
    • Incl. arg name (if provided) -> as replacement for naming with pggen.arg('NAME') - if not provided the user don't care about the name so the go function can receive nameless args no matter how much it were (arg1, arg2, arg3... or a, b, c, ...)
    • the optionality is present (DEFAULT incl. it's value)

The generated go source code, calls the postgres function, and passes optional args only if they are defined. Otherwise the arg should be omitted so that pg can use the default value (it only use it if the arg not present in the call at all). Just for completeness: arguments with a default functions must be always after all other mandatory args in pg.

Also nice: If the function changes in the future my go code fails on build time, which is similar to static type checking - or just calling a function which not exists. When the full function signature is defined in "pg-func", of course not - then it fails even earlier by pggen while source code generation.


I try to investigate more about the "record" type issue, or when this happens.
My described example however shows, that a creation of go code could be possible by directly intepreting defined functions instead of based on prepared statements.
It's also not using extended parsing for this scenario. (side note: the most advanced pg parser is https://github.com/pganalyze/pg_query_go - it received now an update to pg13, it requires pg headers)

Note:
What also could be a solution to create a result type composit type for a complex result.

@jschaf
Copy link
Owner

jschaf commented Apr 18, 2021

Question 1: For this query, what's the signature for generated Go function?

-- name: Fn1 :one
select fn1(
    pggen.arg('x')
);

I agree that this proposal is possible, at least for simple cases, but I have 2 concerns:

  1. It's a step towards parsing the query which I've avoided so far. Postgres maintains information about the function out parameters but pggen would still have to figure out where in the query the function was called.

  2. I think you can rewrite the function to have better type information so that pggen works out of the box. Something like below:

CREATE TYPE my_custom_type ( res_items list_item[], res_stats lists_stats );

CREATE FUNCTION fn1(
    param1 bigint,
    page int = 1,
    item_count int = 10
) RETURNS my_custom_type AS '';

I haven't tried, so if it's not possible, let me know.

@aight8
Copy link
Author

aight8 commented Apr 18, 2021

My idea is to skip the whole query definition part for a defined function.
The defined function itself is the query, and a very accurate one. (regarding optional arguments, multiple out arguments)

It not solves the reflection problem. Your last example is exactly how I use it today.

The downsides are:

  • I must use a single return type for my query which should call only one function.
    • Multiple OUT arguments are not supported (they always ends in a record return type by prepared statement analysis)
    • Optional arguments are not possible without extra annotiation
  • I must provide a pggen query stub which calls the function
  • Optional arguments are not possible (this will always be a hard to solve problem currently (without extra annotations), with defined function all information is there for free)

But that's totally fine, because pggen try it's best to figure out the types. Querying functions or a queries which returns record (this includes functions with multi OUT) is not possible, and if the an error message explain this to the user I think this is totally arguable.


I try to create a full example of my proposal:

schema

CREATE FUNCTION fn1(
    param1 bigint,
    page int = 1,
    item_count int = 10,
    OUT res_items list_items[],
    OUT res_stats lists_stats
) AS '';

query

// nothing

generation

pggen --pg-func=fn1

go source

...
type Fn1OptParams struct {
   Page *int
   ItemCount *int
}

func (Querier) Fn1(param1 int, Fn1OptParams) (resItems ListItem[], resStats ListsStats, err error) {
   // ...
}
...

how the function parameters in return value is built up is not defined. Ideas were:

  • go options design pattern
  • optional arguments as separate struct with nullable fields (as above)
  • with similar strategy as today (up to 2 arguments are inlined, greater are all in out Param struct). However arguments with default value are pointers.

effectively called sql

select fn1(param) -- when you call in go -> querier.Fn1(999)
select fn1(param, page => 1) -- when you call in go -> querier.Fn1(999, Fn1OptParams{Page: &1})
select fn1(param, page => 1, item_count => 10) -- when you call in go -> querier.Fn1(999, Fn1OptParams{Page: &1, ItemCount: &10})
select fn1(param, item_count => 10) -- when you call in go -> querier.Fn1(999, Fn1OptParams{ItemCount: &10})

Because internally the query is built up by a defined function, pggen can find out the exact result signature by check for the OUT arguments, and then just join them together (because in the result section it only appears as record or setof record).
If no OUT arguments are present check for the result type.
This logic don't use prepared statement to analyze anything.
The rest of the generation logic is similar as for all other pggen queries.

@aight8
Copy link
Author

aight8 commented Apr 19, 2021

create or replace function format_types(oid[])
    returns text[] as $$
select array(select format_type(unnest($1), null))
$$ language sql immutable;

select
	 l.lanname lang,
	 n.nspname "schema",
	 p.oid,
	 p.prokind kind,         -- ! char // f = normal fn, p = procedure, a = aggregate fn, or w = window fn
	 p.proname "name",       -- ! name // name of the function (ex: get_user_info)
   format_types(p.proargtypes[:array_length(p.proargtypes,1)-p.pronargdefaults-1]) in_required_args,  -- ! oidvector (references pg_type.oid) // An array of the data types of the function arguments. This includes only input arguments (including INOUT and VARIADIC arguments), and thus represents the call signature of the function.
	 format_types(p.proargtypes[:p.pronargdefaults-1]) as in_optional_args,
	 p.pronargdefaults default_arg_count, -- ! int2 // Number of arguments that have defaults
	 p.proisstrict no_null_args,  -- ! bool // Function returns null if any call argument is null. In that case the function won't actually be called at all. Functions that are not “strict” must be prepared to handle null inputs.
	 format_type(p.provariadic, null) variadic_arg_type, -- ! oid (references pg_type.oid) // Data type of the variadic array parameter's elements, or zero if the function does not have a variadic parameter
	 format_type(p.prorettype, null) "return_type", -- ! oid (references pg_type.oid) // Data type of the return value
	 p.proretset returns_many, -- ! bool // Function returns a set (i.e., multiple values of the specified data type)
	 format_types(p.proallargtypes[array_length(p.proargtypes,1)+1:]) out_arg_types,
   p.proargmodes arg_types,    -- ! array of: 'i' for IN, 'o' for OUT, 'b' for INOUT, 'v' for VARIADIC, 't' for TABLE -- if ALL arguments are 'i' this field will be NULL
	 p.proargnames arg_names     -- ! text[] // An array of the names of the function arguments. Arguments without a name are set to empty strings in the array. If none of the arguments have a name, this field will be null. Note that subscripts correspond to positions of proallargtypes not proargtypes.
from pg_catalog.pg_proc p
   left join pg_catalog.pg_namespace n on n.oid = p.pronamespace
	 left join pg_catalog.pg_language l ON l.oid = p.prolang
where p.proname like 'my_%';
-- where p.oid='myfn()'::regprocedure::oid;

Here's another sql snippet to create many kind of functions:

drop type if exists user_data cascade;
create type user_data as (
	firstname text,
	age int
);
drop type if exists user_stat cascade;
create type user_stat as (
	stat1 int,
	stat2 int
);

-- returns: MANY(user_data[], user_stat)
create or replace function my_fn_1(
    page int = 1,
    item_count int = 10,
    out user_info_list user_data[],
    out user_stat user_stat
) returns setof record language plpgsql as $$begin

end$$;

-- 2x OUT, polymorph function #1
-- returns: ONE(user_data[], user_stat)
create or replace function my_fn_2(
    page int = 1,
    item_count int = 10,
    out user_info_list user_data[],
    out user_stat user_stat
) returns record language plpgsql as $$begin

end$$;

-- 2x OUT, STRICT, polymorph function #2
-- returns: ONE(user_data[], user_stat)
create or replace function my_fn_2(
    page text,
    out user_info_list user_data[],
    out user_stat user_stat
) returns setof record strict language plpgsql as $$begin

end$$;

-- VARIADIC arg
create or replace function my_fn_4(
    user_id int,
    digits variadic int[]
) returns void language plpgsql as $$begin

end$$;

@aight8 aight8 changed the title Complex nested results (ex: stored function call with multiple return arguments) proposal: generate "queries" by stored pg function Apr 20, 2021
@aight8
Copy link
Author

aight8 commented Apr 21, 2021

I came up with another idea I want to mention (should be self-described)
The query entry has no body content.

advantages:

  • Conflicts with polymorphic functions are more difficult because of the explicit naming like all other queries.
  • Queries and function queries are all in the same place.

disadvantages:

  • could be less flexible (by arg passing it would have been possible to define functions by wildcard, however that's implicit)

query

-- name: WhateverFn1 :pg-func fn_name_in_pg()
-- name: WhateverFn2 :pg-func fn_name_in_pg(text)
-- name: WhateverFn3 :pg-func another_pg_fn()

the function is created in the schema file.
the function signature in the query file must in the format how postgres can uniquely identify it. (I think pg_get_function_identity_arguments)

@jschaf
Copy link
Owner

jschaf commented Apr 26, 2021

My preferred approach so far is to use a normal query file

-- name: WhateverFn
SELECT fn_name_in_pg();

If the function return type is a record type and the function has output parameters we can infer what the results should be. I think we can get this from the explain output without too much trouble. For parsing, I think we limit it to top-level columns that match a regex like [a-z0-9]+\(.

It's not clear how it should interact with other parameters:

SELECT fn_name_in_pg(), 2 as two

I think I'd want all the parameters at the top level instead of nesting the function output parameters.

@aight8
Copy link
Author

aight8 commented May 7, 2021

I'm afrait a little when the parsing is done manually, it can end in a patchwork quilt of supported statements what pggen can intrepet correctly - and what ends in hard to understand parsing errors.
This in turn leads to great code complexity and many following issues.
The scope is unclear.

It would be hoped that the regex does not generate false positives.
In this case it would be important to define the scope - e.g. a sufficiently large list of statements that define the scope as a whole. Then possible statements that just do not work, but still recognizes - the error action is already little meaningful, the parsing aggravates the situation again.

Open questions:

  • Is this really the way to go?
  • How can the SQL be parsed, and which variant would definitely be sufficient considering the defined scope? (ex: by pg itself, go pg query lib, regex)

But wouldn't the inline function anaysing build on my proposal anyway? So this proposal could be the first step (query Function informations), and the inlining mechanism in a second step (parsing statements for function calls and apply it).

However this proposal still have a following big advantages:

  1. If I have e.g. 50 pg functions that I want to use in go, I have to create and maintain 50 pggen statments which only task is to call the function, including the function definition itself of course
  2. I can use optional input arguments with default values defined in the create function statement

jschaf added a commit that referenced this issue May 30, 2022
The main trick is to SELECT FROM the function which converts the record
type into proper postgres columns.

Fixes #25.
jschaf added a commit that referenced this issue May 30, 2022
The main trick is to SELECT FROM the function which converts the record
type into proper postgres columns.

Fixes #25.
jschaf added a commit that referenced this issue May 30, 2022
The main trick is to SELECT FROM the function which converts the record
type into proper postgres columns.

Fixes #25.
@jschaf
Copy link
Owner

jschaf commented May 30, 2022

Stumbled upon a much simpler solution. The out columns of fn will appear like normal postgres columns.

SELECT * FROM fn1;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants