Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix(Postgres): Support UNNEST #1761

Merged
merged 5 commits into from
Jun 13, 2023
Merged

Conversation

vegarsti
Copy link
Contributor

Fixes #1760.

sqlglot/transforms.py Outdated Show resolved Hide resolved
sqlglot/transforms.py Outdated Show resolved Hide resolved
@georgesittas
Copy link
Collaborator

georgesittas commented Jun 13, 2023

Is it possible to just map UNNEST to exp.Explode for postgres and make sure to rename it @ generation time? I think this would remove the need to add all this custom logic to also handle exp.Unnest.

@vegarsti
Copy link
Contributor Author

Can't we just map UNNEST to exp.Explode for postgres and make sure to rename it @ generation time? I think this would remove the need to add all this custom logic to also handle exp.Unnest

Oh, interesting, will try that!

@vegarsti
Copy link
Contributor Author

exp.Explode takes a this while exp.Unnest takes an expressions, though, does that matter? How does the mapping/renaming work?

@georgesittas
Copy link
Collaborator

How does the mapping/renaming work?

You have to map "UNNEST" to exp.Explode.from_arg_list in Postgres.Parser.FUNCTIONS and then add an entry in Postgres.Generator.TRANSFORMS to map exp.Explode to rename_func("UNNEST").

@vegarsti
Copy link
Contributor Author

How does the mapping/renaming work?

You have to map "UNNEST" to exp.Explode.from_arg_list in Postgres.Parser.FUNCTIONS and then add an entry in Postgres.Generator.TRANSFORMS to map exp.Explode to rename_func("UNNEST").

Thanks!

@vegarsti
Copy link
Contributor Author

@georgesittas That was neat! Thanks for the explanation.

Copy link
Collaborator

@georgesittas georgesittas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks way cleaner :)

tests/dialects/test_postgres.py Show resolved Hide resolved
@georgesittas
Copy link
Collaborator

georgesittas commented Jun 13, 2023

Ah I just noticed something in postgres's docs: UNNEST can actually receive >1 arguments. With this approach we'll crash if we get the following query (from the doc page):

SELECT * FROM UNNEST(ARRAY[1,2], ARRAY['foo','bar','baz']) AS x(a,b)

So this is actually breaking, because before we'd just parse this into an exp.Anonymous without problems. Maybe we can simply allow >1 args in exp.Explode, in order to also keep the solution here simple?

cc: @tobymao

@tobymao
Copy link
Owner

tobymao commented Jun 13, 2023

sure, please fix

@vegarsti
Copy link
Contributor Author

vegarsti commented Jun 13, 2023

Ah I just noticed something in postgres's docs: UNNEST can actually receive >1 arguments. With this approach we'll crash if we get the following query (from the doc page):

SELECT * FROM UNNEST(ARRAY[1,2], ARRAY['foo','bar','baz']) AS x(a,b)

So this is actually breaking, because before we'd just parse this into an exp.Anonymous without problems. Maybe we can simply allow >1 args in exp.Explode, in order to also keep the solution here simple?

cc: @tobymao

Ah, good catch! What does it entail to let it take more than one argument? Changing explode from having "this" to "expressions"?

I won't have time to do that tonight, so feel free to do it if you have time. Otherwise happy to try tomorrow!

@georgesittas
Copy link
Collaborator

No worries, I'll play around with it locally and let you know when I have something.

@georgesittas
Copy link
Collaborator

georgesittas commented Jun 13, 2023

Ha, this is actually already taken care of -- there might be no problem with this approach!

  • Postgres docs mention that the multi-argument form is only allowed in a query's FROM clause.
  • In the definition of _parse_table we try to parse an UNNEST expression before hitting the function parser:
    def _parse_table(
        self, schema: bool = False, alias_tokens: t.Optional[t.Collection[TokenType]] = None
    ) -> t.Optional[exp.Expression]:
        lateral = self._parse_lateral()
        if lateral:
            return lateral

        unnest = self._parse_unnest()
        if unnest:
            return unnest
        ...

This means that with this PR the parsing of multi-argument UNNEST expressions doesn't break for Postgres:

>>> import sqlglot
>>> sqlglot.parse_one("SELECT * FROM UNNEST(ARRAY[1, 2], ARRAY['foo', 'bar', 'baz']) AS x(a, b)")
(SELECT expressions:
  (STAR ), from:
  (FROM this:
    (UNNEST expressions:
      (ARRAY expressions:
        (LITERAL this: 1, is_string: False),
        (LITERAL this: 2, is_string: False)),
      (ARRAY expressions:
        (LITERAL this: foo, is_string: True),
        (LITERAL this: bar, is_string: True),
        (LITERAL this: baz, is_string: True)), alias:
      (TABLEALIAS this:
        (IDENTIFIER this: x, quoted: False), columns:
        (IDENTIFIER this: a, quoted: False),
        (IDENTIFIER this: b, quoted: False)))))
>>> sqlglot.parse_one("SELECT UNNEST(c) FROM t", "postgres")
(SELECT expressions:
  (EXPLODE this:
    (COLUMN this:
      (IDENTIFIER this: c, quoted: False))), from:
  (FROM this:
    (TABLE this:
      (IDENTIFIER this: t, quoted: False))))

There's only a (minor) issue, where UNNESTs in the FROM clause are represented as exp.Unnest nodes, whereas single-argument UNNESTs anywhere else are represented as exp.Explode nodes. If we don't mind having this disparity (in return for simplifying the transformation proposed here), then this PR is good to go.

Note: added a couple more tests.

@tobymao tobymao merged commit 0a9cecb into tobymao:main Jun 13, 2023
vegarsti added a commit to duneanalytics/harmonizer that referenced this pull request Jun 14, 2023
This was solved by tobymao/sqlglot#1761, which
is included in SQLGlot 16.1.
@vegarsti vegarsti deleted the postgres-unnest branch June 14, 2023 07:53
adrianisk pushed a commit to adrianisk/sqlglot that referenced this pull request Jun 21, 2023
* Add failing test

* Add support for Postgres' unnest

* Instead, map unnest to explode and generate appropriately

* Remove outdated

* Add more tests

---------

Co-authored-by: Jo <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Postgres UNNEST
3 participants