Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug Report: aggregate column rewritten incorrectly to use WHERE instead of HAVING #12550

Closed
olyazavr opened this issue Mar 3, 2023 · 3 comments · Fixed by #12668
Closed

Bug Report: aggregate column rewritten incorrectly to use WHERE instead of HAVING #12550

olyazavr opened this issue Mar 3, 2023 · 3 comments · Fixed by #12668

Comments

@olyazavr
Copy link
Contributor

olyazavr commented Mar 3, 2023

Overview of the Issue

Vitess rewrites
select a from (select a, count(*) as count from data where ... group by ... order by ...) t1 where t1.count >= :daysLimit
as
select a from (select a, count(*) as count from data where ... and count(*) >= :v2 group by ... order by ...) as t1
which is incorrect: it doesn't understand that count is an aggregate column and cannot be used in WHERE and must be used in HAVING. This results in the error Invalid use of group function

Workaround is to make the query more explicit so as to prevent that edge case:

select a from (select a, count(*) as count from data where ... group by ... having count(*) >= :daysLimit order by ...)

This seems similar to #11311 but we have that fix pulled in already, so this seems like another variant

Reproduction Steps

select t1.portalId, t1.flowId from (select portalId, flowId, count(*) as count from suspect_high_cost_flow_data where localDate > :v1 group by portalId, flowId order by null) t1 where t1.count >= :v2
CREATE TABLE `suspect_high_cost_flow_data` (
  `portalId` int(10) unsigned NOT NULL,
  `flowId` bigint(20) unsigned NOT NULL,
  `localDate` int(10) unsigned NOT NULL,
  `totalObjects` bigint(20) unsigned NOT NULL,
  `enrollmentsOfTheDay` bigint(20) unsigned NOT NULL,
  `executionsOfTheDay` bigint(20) unsigned NOT NULL,
  `uniqueEnrollments` bigint(20) unsigned NOT NULL,
  `totalEnrollments` bigint(20) unsigned NOT NULL,
  PRIMARY KEY (`portalId`,`flowId`,`localDate`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8
{
  "sharded": true,
  "vindexes": {
    "hash": {
      "type": "hash"
    }
  },
  "tables": {
    "suspect_high_cost_flow_data": {
      "columnVindexes": [
        {
          "column": "portalId",
          "name": "hash"
        }
      ]
    }
  }
}

Binary Version

Vitess 14, up to this commit: https://github.com/vitessio/vitess/commit/01ec41fee8e0f0feca4d8b4857f11076adf6be4e

Operating System and Environment details

Centos8, Linux 5.4.141-hs22.el8.x86_64

Log Fragments

No response

@olyazavr olyazavr added Needs Triage This issue needs to be correctly labelled and triaged Type: Bug labels Mar 3, 2023
@shlomi-noach shlomi-noach added Component: Query Serving and removed Needs Triage This issue needs to be correctly labelled and triaged labels Mar 7, 2023
@systay systay self-assigned this Mar 20, 2023
@systay
Copy link
Collaborator

systay commented Mar 20, 2023

Hi @olyazavr 👋

the query you listed under “Reproduction Steps” already has the count(*) >= :v2 expression in the WHERE clause
I'm guessing that is not the correct query

@olyazavr
Copy link
Contributor Author

Ah sorry about that, here's the actual query: select t1.portalId, t1.flowId from (select portalId, flowId, count(*) as count from suspect_high_cost_flow_data where localDate > :v1 group by portalId, flowId order by null) t1 where t1.count >= :v2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants