Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query non-deterministically returns zero or one row #19477

Open
cpcloud opened this issue Oct 21, 2023 · 3 comments
Open

Query non-deterministically returns zero or one row #19477

cpcloud opened this issue Oct 21, 2023 · 3 comments

Comments

@cpcloud
Copy link
Contributor

cpcloud commented Oct 21, 2023

I'm having some trouble understanding the behavior of a query against tpch.tiny. The query is simplified from a part of TPC-H query 15:

WITH t0 AS (
  SELECT
    suppkey,
    SUM(extendedprice * (1 - discount)) AS total_revenue
  FROM
    tpch.tiny.lineitem
  WHERE
    shipdate >= CAST('1996-01-01' AS date)
    AND shipdate < CAST('1996-04-01' AS date)
  GROUP BY
    1
)

SELECT * FROM t0
WHERE t0.total_revenue = (
  SELECT MAX(total_revenue) FROM t0
)

When I put this query into a file (e.g., /data/query.sql) and then run that file I sometimes get an empty table and I sometimes get one row. I would expect there to always be one row, since there should always be at least one value equal to the max and there are no NULL values of total_revenue.

❯ trino --output-format ALIGNED --catalog tpch --schema tiny < /data/query.sql
 suppkey |   total_revenue
---------+--------------------
      21 | 1161099.4636000001
(1 row)

❯ trino --output-format ALIGNED --catalog tpch --schema tiny < /data/query.sql
 suppkey | total_revenue
---------+---------------
(0 rows)

I see that when I compute only

SELECT MAX(total_revenue) AS mtr
FROM t0

I get slightly different results due to what looks like floating point arithmetic not being associative:

❯ trino --output-format ALIGNED --catalog tpch --schema tiny < /data/query.sql
        mtr
--------------------
 1161099.4636000001
(1 row)

❯ trino --output-format ALIGNED --catalog tpch --schema tiny < /data/query.sql
     mtr
--------------
 1161099.4636
(1 row)

Even without floating point associativity, I would still expect there to always be at least one value of total_revenue equal to the max given that there are no NULLs in the column.

@martint
Copy link
Member

martint commented Oct 21, 2023

What version are you running?

@cpcloud
Copy link
Contributor Author

cpcloud commented Oct 21, 2023

@martint Version 430:

❯ trino --execute 'select version()'
"430"

@martint
Copy link
Member

martint commented Oct 21, 2023

It's because of #10.

The WITH clause t0 is evaluated twice, once in the SELECT * FROM t0 clause and once in the SELECT MAX(total_revenue) FROM t0 clause. If they produce different results due to floating point behavior, the WHERE clause will evaluate to false.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants