
[PostgreSQL] Improve parsing support for postgresql data types #73

Open
wants to merge 10 commits into base: master
Conversation


@ntma ntma commented Oct 18, 2021

This PR will add:

  • a new property default_value_raw to types/column.ts that will contain the raw default value inspected from schema definitions;
  • the dependency pg-types, which contains a good collection of parsers for PostgreSQL data types.

New types parsed:

  • timestamps/dates;
  • arrays in the format '{}';

Caveat

The module pg-types uses the data type oid to find the appropriate parser for the column type. In contrast, knex-schema-inspector uses information_schema.data_type. To bridge the pg-types approach to knex-schema-inspector, this PR uses this map, which may need to be updated in the future.
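The bridge between the two naming schemes could look like the following sketch. The oid values are PostgreSQL's well-known built-in type oids; the variable name and the exact set of entries are illustrative, not the PR's actual map:

```typescript
// Hypothetical excerpt of the bridge map: information_schema.data_type
// names on the left, pg_type oids (as consumed by pg-types) on the right.
const dataTypeToOid: Record<string, number> = {
  boolean: 16,
  smallint: 21,
  integer: 23,
  bigint: 20,
  real: 700,
  'double precision': 701,
  text: 25,
  date: 1082,
  'timestamp without time zone': 1114,
  'timestamp with time zone': 1184,
  json: 114,
  jsonb: 3802,
};
```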

Tasks

  • add the new property default_value_raw;
  • update all dialects to take default_value_raw into account;
  • integrate pg-types to support native PostgreSQL data types;
  • add additional logic to parse arrays;
  • update unit tests;

Closes #72

@ntma ntma changed the title [PostgreSQL] Completed parsing support for postgresql data types WIP [PostgreSQL] Completed parsing support for postgresql data types Oct 18, 2021
@rijkvanzanten
Collaborator

This is great @ntma! Way stabler approach to casting those default values 👍🏻

@ntma
Author

ntma commented Oct 19, 2021

Glad you liked it @rijkvanzanten, but there are some caveats unfortunately. I'll try to post them as soon as possible and perhaps we can discuss a solution for them.

@rijkvanzanten
Collaborator

Sounds good 👍🏻

@ntma
Author

ntma commented Oct 19, 2021

The biggest caveat is that this PR works for the postgres built-in types, BUT arrays are not considered built-ins. Hence their default values will be parsed as plain strings...

Still, I believe we can make this PR work with a few adjustments. I hope the explanation below is clear.

Intro for the solution

When retrieving column information, knex-schema-inspector queries the view information_schema.columns and parses the column types from the data_type (pointer to code). But the postgres documentation states that the data_type translates into:

Data type of the array elements, if it is a built-in type, else USER-DEFINED (in that case, the type is identified in udt_name and associated columns).

Which means that data_type alone is unable to accurately identify all the data types in a schema (built-in and user-defined types).

To test this, run the following query for a table with a column of type array:

SELECT
   table_name,
   column_name,
   data_type, -- this will return ARRAY for arrays
   udt_name   -- this will return the actual array type
FROM
   information_schema.columns
WHERE
   table_name = 'YOUR_TABLE_HERE';

Possible solution

The idea is to start using the udt_name to identify column types (or keep both, but use udt_name for the parsing).

The first step would be adding the udt_name to the query on information_schema.columns and joining it with the following query to retrieve the data type id:

SELECT
   oid, -- data type id
   typname -- udt_name in the information_schema
FROM pg_type

Then change the signature of parseDefaultValue from (type: string) to (columnDefault: string | null, type: number), where:

  • columnDefault - the column default value (e.g. "ARRAY[]::json[]");
  • type - the column type oid (e.g. 199);

And finally use the oid to retrieve the appropriate parser.
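A minimal sketch of that proposed signature, assuming an oid-keyed parser lookup. The registry below is a tiny stand-in for pg-types' parsers; every name here is illustrative, not the PR's actual code:

```typescript
// Stand-in for pg-types' oid-keyed text parsers.
type Parser = (value: string) => unknown;

const parsersByOid: Record<number, Parser> = {
  16: (v) => v === 't' || v === 'true', // boolean
  23: (v) => parseInt(v, 10),           // integer
};

function parseDefaultValue(columnDefault: string | null, typeOid: number): unknown {
  if (columnDefault === null) return null;
  // Strip a trailing SQL cast ("'t'::boolean" -> "'t'") and outer quotes.
  const [value] = columnDefault.split('::');
  const unquoted = value.replace(/^'(.*)'$/, '$1');
  const parser = parsersByOid[typeOid];
  if (!parser) return null; // unknown oid: give up rather than guess
  try {
    return parser(unquoted);
  } catch {
    return null; // unparseable input resolves to null
  }
}
```

In the real PR, pg-types would supply the parser for each oid instead of the two-entry registry above.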

@rijkvanzanten
Collaborator

Woa, yeah that got way more complicated than I thought haha. Time to call in the cavalry. @Oreilles any thoughts on the matter?

@Oreilles
Contributor

let [value, cast] = type.split('::');

This is still fundamentally wrong anyway, because it cannot handle expressions, e.g. NOW() + '2 hours'::interval, or anything involving functions or operators. Since there is no error handling, I assume that return parser(value) will crash on an expression.

Another thing I think of is that since Postgres has built-in type casting, the actual column type might differ from the default value expression's return type, which could again crash the parser.

So I think you'd still need to add something like

try {
    return parser(value);
} catch (e) {
    return type;
}

But I think it is an anti-pattern to make the defaultValue property hold either a proper JavaScript object or the original SQL expression. You'd need to check everywhere downstream whether that defaultValue is usable, and if the column is already of a string-like type, you might not even be able to tell the difference.

@ntma
Author

ntma commented Oct 20, 2021

This is still fundamentally wrong anyway, because it cannot handle expressions, e.g. NOW() + '2 hours'::interval, or anything involving functions or operators. Since there is no error handling, I assume that return parser(value) will crash on an expression.

I didn't test it, but it will probably not crash because interval belongs to the set of built-ins in pg-types:
https://github.com/brianc/node-pg-types/blob/8594bc6befca3523e265022f303f1376f679b5dc/lib/builtins.js#L49

Which means we could get the proper parser for it:
https://github.com/brianc/node-pg-types/blob/8594bc6befca3523e265022f303f1376f679b5dc/lib/textParsers.js#L174

But yes, definitely needs a try/catch for some edge cases.

Another thing I think of is that since Postgres has built-in type casting, the actual column type might differ from the default value expression's return type, which could again crash the parser.

That's true... but I think you will need the column type in some cases. If we define a column of type boolean with a default of true, the default value will come through as a plain string because there is no cast.

Regarding the defaultValue @Oreilles, the point is to always hold a proper JavaScript object, right? If so, shouldn't we simply return null when the parsing fails?
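The boolean example can be seen directly (a sketch, assuming the column_default text postgres reports for such a column):

```typescript
// For `active boolean DEFAULT true`, information_schema.column_default
// is just the text "true": there is no '::boolean' suffix, so
// cast-based parsing has nothing to key off.
const columnDefault = 'true';
const [value, cast] = columnDefault.split('::');
// value is the plain string "true", cast is undefined; only the
// column's own type (boolean, oid 16) tells us this should become
// a JS boolean rather than a string.
```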

@Oreilles
Contributor

Oreilles commented Oct 20, 2021

I didn't test it, but it will probably not crash because interval belongs to the set of built-ins in pg-types:

It would not crash because the value is an interval; it would crash because NOW() + '2 hours' would be parsed as an interval when it is actually an expression, not a string representation of a postgres interval.

Regarding the defaultValue @Oreilles, the point is to always hold a proper JavaScript object, right? If so, shouldn't we simply return null when the parsing fails?

This is misleading, because you then couldn't tell the difference between a field that actually has no default value and a field whose default value expression you just couldn't resolve to a JavaScript object.

@rijkvanzanten
Collaborator

Regarding the defaultValue @Oreilles, the point is to always hold a proper JavaScript object, right? If so, shouldn't we simply return null when the parsing fails?

This is misleading, because you then couldn't tell the difference between a field that actually has no default value and a field whose default value expression you just couldn't resolve to a JavaScript object.

Wondering if we should return a "defaultValueRaw" that doesn't do any parsing magic next to the "defaultValue" that does a best effort to parse it into a workable JS value 🤔
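That split could look like this on the Column interface (a sketch; property names are assumed from the thread, and the real interface lives in types/column.ts):

```typescript
// Two-property split: raw SQL expression alongside a best-effort JS value.
interface Column {
  name: string;
  data_type: string;
  default_value: unknown;           // best-effort parsed JS value, null on failure
  default_value_raw: string | null; // untouched SQL default expression
}

// Example: a jsonb[] column whose default couldn't be parsed keeps its raw form.
const col: Column = {
  name: 'tags',
  data_type: 'ARRAY',
  default_value: null,
  default_value_raw: "ARRAY[]::jsonb[]",
};
```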

@Oreilles
Contributor

Wondering if we should return a "defaultValueRaw" that doesn't do any parsing magic next to the "defaultValue" that does a best effort to parse it into a workable JS value 🤔

I think that's the best solution

@ntma
Author

ntma commented Oct 20, 2021

Wondering if we should return a "defaultValueRaw" that doesn't do any parsing magic next to the "defaultValue" that does a best effort to parse it into a workable JS value 🤔

I think that's the best solution

And should we focus only on built-in types for now?

@rijkvanzanten
Collaborator

And should we focus only on built-in types for now?

I think so 🤔 I'm having some trouble imagining how we'd even be able to support non-built-ins, as there's a theoretical infinite amount of them

@ntma
Author

ntma commented Oct 21, 2021

I think so 🤔 I'm having some trouble imagining how we'd even be able to support non-built-ins, as there's a theoretical infinite amount of them

Alright, I'll make it simple then 👍

… types map for information_schema.data_type | updated unit tests
@ntma
Author

ntma commented Nov 2, 2021

Hello guys,

Sorry for the delay in development, but I've been extremely busy over the last few weeks.

So, with this last commit I added:

  • the default_value_raw property to the Column interface, which will contain the raw default value when parsing the schema information (code here);
  • a map to complement the built-in postgres data types, to increase the parsing coverage of knex (mainly because of aliases) (code here);
  • the multiple_defaults table in the unit tests (code here);

If everyone agrees with these changes I will update the remaining database dialects/unit tests to include the default_value_raw.

@rijkvanzanten
Collaborator

Sorry for the delay in development, but I've been extremely busy over the last few weeks.

No worries, same here!

If everyone agrees with these changes I will update the remaining database dialects/unit tests to include the default_value_raw.

Looks good to me! @Oreilles thoughts?

@Oreilles
Contributor

Oreilles commented Nov 3, 2021

Looks good!
Would be nice to have a test for #72 too since this was the original reason for this PR.

@ntma
Author

ntma commented Nov 3, 2021

@Oreilles Agreed! But parsing arrays was still not working, because postgres normalizes the default value when it is an array. A value declared like this:

ARRAY['{"text":"Lorem Ipsum"}', '{"text":"Lorem Ipsum"}']::json[]

will look like the following in the information_schema table:

ARRAY['{"text":"Lorem Ipsum"}'::json, '{"text":"Lorem Ipsum"}'::json]

And so, to account for arrays I added an extra case to the parsing logic (code here). It feels a bit naive, considering that any , that doesn't act as the array separator will split the parts incorrectly (in which case null is returned). But it passes these unit tests.
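The extra case described above can be sketched as follows (an illustrative reimplementation, not the PR's code):

```typescript
// Naive '{...}' array-literal split: split the inner text on commas,
// bailing out with null when quoting makes the commas ambiguous.
function parsePgArrayLiteral(raw: string): string[] | null {
  const match = raw.match(/^\{(.*)\}$/);
  if (!match) return null;        // not a '{...}' literal (e.g. ARRAY[...])
  if (match[1] === '') return []; // empty array: '{}'
  if (match[1].includes('"')) return null; // quoted elements may embed commas
  return match[1].split(',');
}
```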

@ntma ntma changed the title WIP [PostgreSQL] Completed parsing support for postgresql data types [PostgreSQL] Improve parsing support for postgresql data types Dec 21, 2021
@ntma
Author

ntma commented Dec 22, 2021

Hi guys,

I hate leaving unfinished business behind, so I gave this PR another round. With the latest changes:

  • all dialects are using the new default_value_raw, which contains the unparsed column default;
  • arrays in the format '{el1,el2,el3}' are parsable;
  • arrays in the format ARRAY[el1,el2,el3] are not parsable, as in some cases the information_schema.data_type is not enough to identify the array element types (empty arrays will still return their JavaScript version, though);
  • any unparsed value will return null.

For now, I think this is as far as this PR will go. A more complete version will probably require considerable changes to the code.

Any questions or additional fixes that are vital to merge this PR, please let me know 👍


- let [value, cast] = type.split('::');
+ let [value, cast] = column_default.split('::');


If the value itself contains "::", I think you must split on the last "::".
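The suggestion could be sketched like this (a hypothetical helper, not the PR's code):

```typescript
// Split on the LAST '::' so a value that itself contains '::'
// keeps its full text, e.g. "'a::b'::text" -> ["'a::b'", 'text'].
function splitCast(columnDefault: string): [string, string | null] {
  const i = columnDefault.lastIndexOf('::');
  if (i === -1) return [columnDefault, null]; // no cast present
  return [columnDefault.slice(0, i), columnDefault.slice(i + 2)];
}
```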

Successfully merging this pull request may close these issues.

Uncaught error when retrieving columns info with jsonb[] with default values
4 participants