Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

inconsistent data / type when using json subquery #1876

Open
veriKami opened this issue Apr 12, 2019 · 7 comments
Open

inconsistent data / type when using json subquery #1876

veriKami opened this issue Apr 12, 2019 · 7 comments

Comments

@veriKami
Copy link

veriKami commented Apr 12, 2019

It looks like pg-types (parsers) are not used inside json objects, resulting in inconsistent data. It could be considered a bug, or at least it's worth some note in the documentation. Sample code and result below:

SELECT
    now(),
    random()::numeric,
    (SELECT row_to_json(r) FROM (
        SELECT now(),
        random()::numeric
    ) r );
{
    "now": "2019-04-12T10:42:17.767Z",
    "random": "0.554001417476684",
    "row_to_json": {
        "now": "2019-04-12T12:42:17.767661+02:00",
        "random": 0.580252414103597
    }
}
@sehrope
Copy link
Contributor

sehrope commented Apr 12, 2019

It's not a bug as the data type is JSON and it's overall type parser handles that as the data type. If you have more specific requirements for parsing the nested attributes with a more specific data type you would need to do that in your application code or add your own custom parser.

A generalized "parse JSON as native types" is not possible as JSON does not include type information for the original source data. There's no way to tell differentiate between now()as a timestamp cast as a string, vs a text column as a string, vs any other data type cast as a string. Logic for that would be specific to each application to know that "now" is a Date column, "random" is a number, etc.

@veriKami
Copy link
Author

veriKami commented Apr 12, 2019

Yes, "generally" I agree with you. I am currently writing my own application logic to connect (coherently) databases in different time zones and I discovered "this thing", looking in pg-types, by the way. That's why I wrote that (maybe) it could be, if not considered as a bug, but, however, described somewhere...?... Intuitively, one assumes that formats are consistent throughout the library and all types of queries. That's the point. And (maybe) implementation of the parser inside a parser is possible ...?... I do not know the specifics of the library so much, that I could stated, so this is a kind of rhetorical question.

@boromisp
Copy link
Contributor

As @sehrope explained for the timestamp: when you call row_to_json, PostgreSQL converts the timestamp to string, because timestamps cannot be natively represented in JSON. There is no way for the driver to reliably know, that the value was originally a timestamp.

So what you are comparing is a string generated by the database, and a JavaScript Date object stringified by node.js. And of course, JavaScript's Date.toJSON and PostgreSQL's to_json generate slightly different string representations. JavaScript always uses the zulu time zone, while PostgreSQL uses the local time zone and a numerical time zone offset. Compare the output of these two commands:
node -e 'console.log(JSON.stringify(new Date()))'
psql -Atc "set session time zone 'UTC'; select to_json(now())"

So in regards to timestamps, I believe the driver does the right thing, and I'm not sure what should be documented, as it doesn't do anything unexpected.

There are two ways to make the result uniform: either convert the timestamp columns to text in the query: trim(both '"' from to_json(now())::text), or register a custom parser for the json type, that can (based on pattern matching or known key names) convert strings to Date. Neither option is ideal since in the first case you lose type information, and in the second case, there would be a significant performance penalty, but in you use-case one of them might be worth it.

The json type's numeric value does have a precision issue, and there is an argument to be made for documenting it somewhere more visible, than a closed issue.

@veriKami
Copy link
Author

veriKami commented Apr 12, 2019

Thanks @boromisp for such a detailed explanation !!! I started with problems with the date, but now I see that this one can be quite simply solved - excluding the parser in my case. Of course, this is not a bug, but rather my ignorance of the "specifics". By default, I assumed that parsers work on all data, and after a long time I only came to the json subqueries that were stitched somewhere inside the application.

But the matter of representation of numerical values is more serious (from what I have read in node-pg-types issues), and it is not without a reason that the text format is the default. This may be The Argument, that the case of json subquery (without parsers) is potentially buggy and risky. It's just a voice in the discussion, but maybe someone will save some time in the future...

The idea behind this issue - just like in the title - is just to point out that the data may be inconsistent with the rest in this case and it is worth bearing in mind.

Sample with number 2**53 + 1

SELECT
    '9007199254740993'::numeric,
    (SELECT row_to_json(r) FROM (
        SELECT 
            '9007199254740993'::numeric
    ) r );
{
    "numeric": "9007199254740993",
    "row_to_json": {
        "numeric": 9007199254740992
    }
}

@veriKami
Copy link
Author

UPDATE – expanding Title – & when working with Timezones

and expanding Experiment

with (postgresql.conf) timezone = 'Europe/Warsaw'
or (sql) SET TIMEZONE = 'Europe/Warsaw'
and (node) $ env TZ='Europe/Warsaw' node server.js

and query:

SELECT
    current_setting('TIMEZONE')        AS "TZ_current",
    now()                              AS "now_______",
    now()::timestamptz                 AS "now_tstz__",
    now()::timestamp                   AS "now_ts____",
    now() AT TIME ZONE 'Europe/Warsaw' AS "now_WAW___",
    now() AT TIME ZONE 'UTC'           AS "now_UTC___",
    (SELECT row_to_json(r) FROM (
        SELECT
            now()                              AS "now_______",
            now()::timestamptz                 AS "now_tstz__",
            now()::timestamp                   AS "now_ts____",
            now() AT TIME ZONE 'Europe/Warsaw' AS "now_WAW___",
            now() AT TIME ZONE 'UTC'           AS "now_UTC___"
    ) r );

the results look like this:

console.log(process.env.TZ); //: (ok) Europe/Warsaw
console.log(new Date()); //: 2019-04-14T12:38:47.301Z (ok) UTC
console.log(new Date().getTimezoneOffset()/60): //: (ok) -2

{
  "TZ_current": "Europe/Warsaw",
  "now_______": "2019-04-14T12:38:47.301Z",
  "now_tstz__": "2019-04-14T12:38:47.301Z",
  "now_ts____": "2019-04-14T12:38:47.301Z",
  "now_WAW___": "2019-04-14T12:38:47.301Z",
  "now_UTC___": "2019-04-14T10:38:47.301Z",
  "row_to_json": {
    "now_______": "2019-04-14T14:38:47.301076+02:00",
    "now_tstz__": "2019-04-14T14:38:47.301076+02:00",
    "now_ts____": "2019-04-14T14:38:47.301076",
    "now_WAW___": "2019-04-14T14:38:47.301076",
    "now_UTC___": "2019-04-14T12:38:47.301076"
  }
}

types.setTypeParser(1114, (s) => s); //: timestamp
types.setTypeParser(1184, (s) => s); //: timestamptz

{
  "TZ_current": "Europe/Warsaw",
  "now_______": "2019-04-14 14:39:35.911219+02",
  "now_tstz__": "2019-04-14 14:39:35.911219+02",
  "now_ts____": "2019-04-14 14:39:35.911219",
  "now_WAW___": "2019-04-14 14:39:35.911219",
  "now_UTC___": "2019-04-14 12:39:35.911219",
  "row_to_json": {
    "now_______": "2019-04-14T14:39:35.911219+02:00",
    "now_tstz__": "2019-04-14T14:39:35.911219+02:00",
    "now_ts____": "2019-04-14T14:39:35.911219",
    "now_WAW___": "2019-04-14T14:39:35.911219",
    "now_UTC___": "2019-04-14T12:39:35.911219"
  }
}

and for me this is a Problem@boromisp – what do you think?

@boromisp
Copy link
Contributor

The way I see it, the local time zone setting of both postgresql and nodejs is only supposed to make the string representation of utc timestamps more easily readable by humans.
For computers the time zone offset shouldn't matter.
I usually transform the timestamps to the user's time zone based on their local environment or a stored user setting. In JavaScript the Intl.DateTimeFormat or the time zone extension of moment.js can be useful for this purpose.

Of course there could be use cases I haven't considered. Would you mind expanding on yours?

@mikicho
Copy link

mikicho commented Apr 14, 2021

Thanks @boromisp for such a detailed explanation !!! I started with problems with the date, but now I see that this one can be quite simply solved - excluding the parser in my case.

Can you please explain how did you solve it?
Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants