Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v0.6.1 Can't parse Clickhouse argMaxIf function #53

Closed
smoothml opened this issue Apr 9, 2024 · 2 comments · Fixed by #56
Closed

v0.6.1 Can't parse Clickhouse argMaxIf function #53

smoothml opened this issue Apr 9, 2024 · 2 comments · Fixed by #56
Assignees
Labels
bug Something isn't working

Comments

@smoothml
Copy link
Contributor

smoothml commented Apr 9, 2024

Since v0.6.1 SQLMock is unable to parse Clickhouse's argMaxIf function.

Consider the following example:

from datetime import datetime

from sql_mock.clickhouse import column_mocks as col
from sql_mock.clickhouse.table_mocks import ClickHouseTableMock
from sql_mock.table_mocks import table_meta

query = """SELECT
    user_id,
    count() AS num_sessions,
    countIf(valid = 1) AS num_valid_sessions,
    argMaxIf(in_trial, dt, isNotNull(in_trial)) AS in_trial
FROM sessions
GROUP BY user_id
"""


@table_meta(table_ref="sessions")
class SessionsMock(ClickHouseTableMock):
    dt = col.Datetime(default=datetime(2024, 1, 1 ,10, 30))
    valid = col.Boolean(default=True)
    user_id = col.String(default="foo")
    in_trial = col.Int(default=0, nullable=True)


@table_meta(query=query)
class ResultMock(ClickHouseTableMock):
    user_id = col.String(default="foo")
    num_sessions = col.Int(default=0)
    num_valid_sessions = col.Int(default=0)
    in_trial = col.Int(default=0)


def test_example() -> None:
    sessions_mock = SessionsMock.from_dicts(
        [
            dict(
                dt=datetime(2024, 1, 1, 10),
                valid=1,
                user_id="foo",
                in_trial=1,
            ),
            dict(
                dt=datetime(2024, 1, 2, 10),
                valid=0,
                user_id="foo",
                in_trial=1,
            ),
            dict(
                dt=datetime(2024, 1, 3, 10),
                valid=1,
                user_id="foo",
                in_trial=1,
            ),
            dict(
                dt=datetime(2024, 1, 4, 10),
                valid=1,
                user_id="foo",
                in_trial=None,
            ),
            dict(
                dt=datetime(2024, 1, 5, 10),
                valid=1,
                user_id="foo",
                in_trial=0,
            ),
        ]
    )

    result = ResultMock.from_mocks(input_data=[sessions_mock])

    expected = [
        dict(
            user_id="foo",
            num_sessions=5,
            num_valid_sessions=4,
            in_trial=0,
        )
    ]

    result.assert_equal(expected)

Running test_example results in the following exception:

self = <sqlglot.generator.Generator object at 0x107920d60>
expression = ('argMax', 'If'), key = None, comment = True

    def sql(
        self,
        expression: t.Optional[str | exp.Expression],
        key: t.Optional[str] = None,
        comment: bool = True,
    ) -> str:
        if not expression:
            return ""
    
        if isinstance(expression, str):
            return expression
    
        if key:
            value = expression.args.get(key)
            if value:
                return self.sql(value)
            return ""
    
        transform = self.TRANSFORMS.get(expression.__class__)
    
        if callable(transform):
            sql = transform(self, expression)
        elif transform:
            sql = transform
        elif isinstance(expression, exp.Expression):
            exp_handler_name = f"{expression.key}_sql"
    
            if hasattr(self, exp_handler_name):
                sql = getattr(self, exp_handler_name)(expression)
            elif isinstance(expression, exp.Func):
                sql = self.function_fallback_sql(expression)
            elif isinstance(expression, exp.Property):
                sql = self.property_sql(expression)
            else:
                raise ValueError(f"Unsupported expression type {expression.__class__.__name__}")
        else:
>           raise ValueError(f"Expected an Expression. Received {type(expression)}: {expression}")
E           ValueError: Expected an Expression. Received <class 'tuple'>: ('argMax', 'If')

We do not see this error for other xIf functions (e.g. the countIf in the example query) or for functions without it If (e.g. plain argMax). My initial thought was a change in sqlglot, but the version specified in poetry.lock has not changed in this release.

@smoothml smoothml added the bug Something isn't working label Apr 9, 2024
@smoothml smoothml self-assigned this Apr 9, 2024
@Somtom
Copy link
Collaborator

Somtom commented Apr 10, 2024

@smoothml Just to confirm: With previous versions it was working?

My first thought was that we might have forgotten to add a dialect to some sqlglot call somewhere - but I cannot spot anything (apart from that we probably want to add the dialect to those test parsings as well)

What I found though is that we forgot to remove a print statement.

smoothml added a commit that referenced this issue Apr 10, 2024
There has been a couple of recent issues in which test execution failed
in certain scenarios (#48 and #53). These weren't caught by the
libraries test as it would require executing actual tests against actual
SQL queries. This commit will add a framework for running these tests
against a running database engine (Clickhouse in the first instance) to
better enable these kinds of bugs to be caught.
smoothml added a commit that referenced this issue Apr 10, 2024
There has been a couple of recent issues in which test execution failed
in certain scenarios (#48 and #53). These weren't caught by the
libraries test as it would require executing actual tests against actual
SQL queries. This commit will add a framework for running these tests
against a running database engine (Clickhouse in the first instance) to
better enable these kinds of bugs to be caught.
smoothml added a commit that referenced this issue Apr 11, 2024
There has been a couple of recent issues in which test execution failed
in certain scenarios (#48 and #53). These weren't caught by the
libraries test as it would require executing actual tests against actual
SQL queries. This commit will add a framework for running these tests
against a running database engine (Clickhouse in the first instance) to
better enable these kinds of bugs to be caught.
smoothml added a commit that referenced this issue Apr 12, 2024
There has been a couple of recent issues in which test execution failed
in certain scenarios (#48 and #53). These weren't caught by the
libraries test as it would require executing actual tests against actual
SQL queries. This commit will add a framework for running these tests
against a running database engine (Clickhouse in the first instance) to
better enable these kinds of bugs to be caught.
@smoothml
Copy link
Contributor Author

Sorry @Somtom , just getting back to this.

Turns out it was indeed the print statement 🤯

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants