-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
exec: support all window functions in vectorized engine #37035
Labels
A-sql-vec
SQL vectorized engine
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-sql-queries
SQL Queries Team
Comments
yuzefovich
added
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
A-sql-execution
Relating to SQL execution.
labels
Apr 23, 2019
yuzefovich
added
A-sql-vec
SQL vectorized engine
and removed
A-sql-execution
Relating to SQL execution.
labels
Apr 23, 2019
6 tasks
craig bot
pushed a commit
that referenced
this issue
May 1, 2019
36926: exec: add a few window function operators in simple cases r=yuzefovich a=yuzefovich Adds support for ROW_NUMBER, RANK, and DENSE_RANK window functions when there is a single window function in the query with no PARTITION BY clause and with no window frame. Addresses: #37035. Release note: None Co-authored-by: Yahor Yuzefovich <[email protected]>
craig bot
pushed a commit
that referenced
this issue
Mar 3, 2020
42137: colexec: add support for percent_rank and cume_dist window functions r=yuzefovich a=yuzefovich **colexec: add window peer grouper** This commit adds an operator that populates a boolean column to signify whether the corresponding tuple is the start of a new peer group. Peers are such tuples that belong to the same partition and are equal on the ordering columns. Some window functions must return the same output for all peers in the peer group. Currently this operator is not used. Release note: None **colexec: minor refactor of rank and denseRank operators** Previously, rank and denseRank operators contained the logic to figure out the boundaries of the peer groups, but now that we have a window peer grouper, it is no longer necessary which simplified the code a little bit. Release note: None **colexec: add support for PERCENT_RANK window function** This commit adds the support for PERCENT_RANK window function. This function differs from two other rank variances in that it needs to know the number of tuples in the partition. If there is no PARTITION BY clause, then we have no other choice but to buffer the input fully. If PARTITION BY clause is present, we need to buffer all tuples that belong to each partition before we can populate the output. However, for simplicity, the current implementation of the operator with PARTITION BY clause also fully buffers the whole input before emitting any output. This commit also adds a couple of "vec-on" configs to 'window' logic test. This will increase the test coverage of window functions supported by the vectorized engine. Addresses: #37035. Release note: None **colexec: add support for cume_dist window function** This commit adds support for CUME_DIST window function. This function is quite similar to PERCENT_RANK, so it reuses the same template. "percent_rank" things have been renamed to "relative_rank" things. This commit also enables running of `row_number`, `rank`, and `dense_rank` with `vectorize=auto`. The reason is that these window functions are streaming and internally they might a sorter which can fallback to disk if necessary. Addresses: #37035. Release note: None Co-authored-by: Yahor Yuzefovich <[email protected]>
DrewKimball
pushed a commit
to DrewKimball/cockroach
that referenced
this issue
May 18, 2021
This patch implements the `ntile` window function in the vectorized engine. `ntile` takes in an integer argument `num_buckets` and then distributes all rows in a partition equally between the buckets, outputting the bucket number for each row. In the vectorized implementation, batches are buffered until the end of a partition is reached, at which point the `ntile` bucket values can be calculated. The batches are emitted in a streaming fashion; as soon as a batch is fully processed, it is returned and work is paused until the next call to `Next()`. See cockroachdb#37035 Release note (sql change): the vectorized engine now supports the ntile window function.
craig bot
pushed a commit
that referenced
this issue
May 18, 2021
64977: colexec: implement vectorized ntile window function r=DrewKimball a=DrewKimball This patch implements the `ntile` window function in the vectorized engine. `ntile` takes in an integer argument `num_buckets` and then distributes all rows in a partition equally between the buckets, outputting the bucket number for each row. In the vectorized implementation, batches are buffered until the end of a partition is reached, at which point the `ntile` bucket values can be calculated. The batches are emitted in a streaming fashion; as soon as a batch is fully processed, it is returned and work is paused until the next call to `Next()`. See #37035 Release note (sql change): the vectorized engine now supports the ntile window function. 65322: jobs: surface the trace_id linked to job execution r=adityamaru a=adityamaru Please see individual commit messages. Informs: #64992 65415: authors: add annezhu98 to authors r=annezhu98 a=annezhu98 Release note: None 65418: authors: add <IsaDash> to authors r=IsaDash a=IsaDash Release note: None Co-authored-by: Drew Kimball <[email protected]> Co-authored-by: Aditya Maru <[email protected]> Co-authored-by: Anne Zhu <[email protected]> Co-authored-by: Isa Dash <[email protected]>
DrewKimball
added a commit
to DrewKimball/cockroach
that referenced
this issue
May 24, 2021
`lag` and `lead` are window functions that return an expression evaluated at some row `offset` rows from the current row, defaulting to `default` if such a row does not exist in the partition. This patch provides vectorized implementations of the `lag` and `lead` window functions. The logic to buffer each partition into a queue is now separate from calculation of the window function itself so that `lag`, `lead`, and `ntile` (and others) can share common logic. As before, each batch is emitted as soon as its output column has been entirely filled. See cockroachdb#37035 Release note (sql change): the vectorized engine now supports the lag and lead window functions.
DrewKimball
added a commit
to DrewKimball/cockroach
that referenced
this issue
May 26, 2021
`lag` and `lead` are window functions that return an expression evaluated at some row `offset` rows from the current row, defaulting to `default` if such a row does not exist in the partition. This patch provides vectorized implementations of the `lag` and `lead` window functions. The logic to buffer each partition into a queue is now separate from calculation of the window function itself so that `lag`, `lead`, and `ntile` (and others) can share common logic. As before, each batch is emitted as soon as its output column has been entirely filled. See cockroachdb#37035 Release note (sql change): the vectorized engine now supports the lag and lead window functions.
DrewKimball
added a commit
to DrewKimball/cockroach
that referenced
this issue
May 27, 2021
`lag` and `lead` are window functions that return an expression evaluated at some row `offset` rows from the current row, defaulting to `default` if such a row does not exist in the partition. This patch provides vectorized implementations of the `lag` and `lead` window functions. The logic to buffer each partition into a queue is now separate from calculation of the window function itself so that `lag`, `lead`, and `ntile` (and others) can share common logic. As before, each batch is emitted as soon as its output column has been entirely filled. See cockroachdb#37035 Release note (sql change): the vectorized engine now supports the lag and lead window functions.
DrewKimball
added a commit
to DrewKimball/cockroach
that referenced
this issue
May 27, 2021
`lag` and `lead` are window functions that return an expression evaluated at some row `offset` rows from the current row, defaulting to `default` if such a row does not exist in the partition. This patch provides vectorized implementations of the `lag` and `lead` window functions. The logic to buffer each partition into a queue is now separate from calculation of the window function itself so that `lag`, `lead`, and `ntile` (and others) can share common logic. As before, each batch is emitted as soon as its output column has been entirely filled. See cockroachdb#37035 Release note (sql change): the vectorized engine now supports the lag and lead window functions.
craig bot
pushed a commit
that referenced
this issue
May 28, 2021
65634: colexec: implement lag and lead window functions in vectorized engine r=DrewKimball a=DrewKimball `lag` and `lead` are window functions that return an expression evaluated at some row `offset` rows from the current row, defaulting to `default` if such a row does not exist in the partition. This patch provides vectorized implementations of the `lag` and `lead` window functions. The logic to buffer each partition into a queue is now separate from calculation of the window function itself so that `lag`, `lead`, and `ntile` (and others) can share common logic. As before, each batch is emitted as soon as its output column has been entirely filled. See #37035 Release note (sql change): the vectorized engine now supports the lag and lead window functions. Co-authored-by: Drew Kimball <[email protected]>
DrewKimball
added a commit
to DrewKimball/cockroach
that referenced
this issue
Jul 19, 2021
This patch adds vectorized implementations of the `first_value`, `last_value`, and `nth_value` window functions. These functions return an expression evaluated at the first, last, and nth positions respectively in the window frame for the current row. Fixes cockroachdb#37035 Release note (sql change): first_value, last_value, and nth_value window functions can now be executed in the vectorized engine. This allows for faster execution time, and also removes the need for conversions to and from row format.
DrewKimball
added a commit
to DrewKimball/cockroach
that referenced
this issue
Jul 20, 2021
This patch adds vectorized implementations of the `first_value`, `last_value`, and `nth_value` window functions. These functions return an expression evaluated at the first, last, and nth positions respectively in the window frame for the current row. Fixes cockroachdb#37035 Release note (sql change): first_value, last_value, and nth_value window functions can now be executed in the vectorized engine. This allows for faster execution time, and also removes the need for conversions to and from row format.
craig bot
pushed a commit
that referenced
this issue
Jul 20, 2021
66002: sql: support range based lookup join spans r=cucaroach a=cucaroach sql: support range based lookup join spans Informs #51576 If filters exist on a lookup join that match columns that we are doing the lookup against add them to the lookupExpr in the join reader spec and build those filters into the multispan generator. If we have inequality conditions we need to be able to lookup of the prefix for keys found against range spans and not just point spans so build a sorted slice of span+inputRowIndices we can binary search on. Issue #51576 also encompasses allowing inequalities on columns from the index to reference columns from the input, that will come in a later commit. Release note (sql change): Improve performance of lookup joins in some cases. If join inequality conditions can be matched to index columns include the conditions in the index lookup spans and remove them from the runtime filters. 67747: vendor: bump Pebble to 8c3bd06debb5 r=bananabrick a=jbowens ``` 8c3bd06d vfs: export ErrUnsupported b8b89851 tool: add manifest summarize command 8ac9797e internal/manifest: Move Atomic field in the FileMetaData struct to the top. b8f537d8 compaction: remove nonzero-seqnum splitting logic 5c39ff26 benchmarks: infrastructure to set up file system benchmarks ``` Release note: None 67764: colexecwindow: implement first_value, last_value, and nth_value r=DrewKimball a=DrewKimball **logictest: add testfile to ensure that window functions are vectorized** This commit adds a logic testfile that executes the window functions currently supported by the vectorized engine with `SET vectorize = experimental_always`. This tests that supported window functions can actually be executed in the vectorized engine. Release note: None **colexecwindow: add first_value, last_value, and nth_value** This patch adds vectorized implementations of the `first_value`, `last_value`, and `nth_value` window functions. These functions return an expression evaluated at the first, last, and nth positions respectively in the window frame for the current row. Release note (sql change): first_value, last_value, and nth_value window functions can now be executed in the vectorized engine. This allows for faster execution time, and also removes the need for conversions to and from row format. **colexecwindow: test window framer with different memory limits** Previously, the `TestWindowFramer` used `math.MaxInt64` as the memory limit for the `SpillingBuffer` used to buffer all tuples in a partition. This patch modifies the test to choose a random memory limit and fixes a bug that could cause `windowFramer` operators to skip a peer group when seeking to frame start and end indexes in `RANGE` or `GROUPS` mode. Release note: None **colexec: add range check for binary operations with timestamps** This commit adds a range check after additions or subtractions that produce a timestamp result. This ensures that the vectorized engine produces the same error as the row engine when the result of the operation is out of range. Release note: None Fixes #37035 67823: cloud: bump orchestrator version to 21.1.6 r=j-low a=j-low Release note: None 67830: roachtest: bump predecessor to v21.1.6 r=j-low a=j-low Release note: None Co-authored-by: Tommy Reilly <[email protected]> Co-authored-by: Jackson Owens <[email protected]> Co-authored-by: Drew Kimball <[email protected]> Co-authored-by: Joseph Lowinske <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-sql-vec
SQL vectorized engine
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-sql-queries
SQL Queries Team
Add support for:
The text was updated successfully, but these errors were encountered: