Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exec: support all window functions in vectorized engine #37035

Closed
9 tasks done
yuzefovich opened this issue Apr 23, 2019 · 0 comments · Fixed by #67764
Closed
9 tasks done

exec: support all window functions in vectorized engine #37035

yuzefovich opened this issue Apr 23, 2019 · 0 comments · Fixed by #67764
Assignees
Labels
A-sql-vec SQL vectorized engine C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team

Comments

@yuzefovich
Copy link
Member

yuzefovich commented Apr 23, 2019

Add support for:

@yuzefovich yuzefovich added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-sql-execution Relating to SQL execution. labels Apr 23, 2019
@yuzefovich yuzefovich self-assigned this Apr 23, 2019
@yuzefovich yuzefovich added A-sql-vec SQL vectorized engine and removed A-sql-execution Relating to SQL execution. labels Apr 23, 2019
craig bot pushed a commit that referenced this issue May 1, 2019
36926: exec: add a few window function operators in simple cases r=yuzefovich a=yuzefovich

Adds support for ROW_NUMBER, RANK, and DENSE_RANK window functions
when there is a single window function in the query with no
PARTITION BY clause and with no window frame.

Addresses: #37035.

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
craig bot pushed a commit that referenced this issue Mar 3, 2020
42137: colexec: add support for percent_rank and cume_dist window functions r=yuzefovich a=yuzefovich

**colexec: add window peer grouper**

This commit adds an operator that populates a boolean column to signify
whether the corresponding tuple is the start of a new peer group. Peers
are such tuples that belong to the same partition and are equal on the
ordering columns. Some window functions must return the same output for
all peers in the peer group. Currently this operator is not used.

Release note: None

**colexec: minor refactor of rank and denseRank operators**

Previously, rank and denseRank operators contained the logic to figure
out the boundaries of the peer groups, but now that we have a window
peer grouper, it is no longer necessary which simplified the code a
little bit.

Release note: None

**colexec: add support for PERCENT_RANK window function**

This commit adds the support for PERCENT_RANK window function. This
function differs from two other rank variances in that it needs to know
the number of tuples in the partition. If there is no PARTITION BY
clause, then we have no other choice but to buffer the input fully. If
PARTITION BY clause is present, we need to buffer all tuples that belong
to each partition before we can populate the output. However, for
simplicity, the current implementation of the operator with PARTITION BY
clause also fully buffers the whole input before emitting any output.

This commit also adds a couple of "vec-on" configs to 'window' logic
test. This will increase the test coverage of window functions supported
by the vectorized engine.

Addresses: #37035.

Release note: None

**colexec: add support for cume_dist window function**

This commit adds support for CUME_DIST window function. This function is
quite similar to PERCENT_RANK, so it reuses the same template.
"percent_rank" things have been renamed to "relative_rank" things.

This commit also enables running of `row_number`, `rank`, and
`dense_rank` with `vectorize=auto`. The reason is that these window
functions are streaming and internally they might a sorter which can
fallback to disk if necessary.

Addresses: #37035.

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
@yuzefovich yuzefovich removed their assignment Mar 14, 2020
DrewKimball pushed a commit to DrewKimball/cockroach that referenced this issue May 18, 2021
This patch implements the `ntile` window function in the vectorized
engine. `ntile` takes in an integer argument `num_buckets` and then
distributes all rows in a partition equally between the buckets,
outputting the bucket number for each row.

In the vectorized implementation, batches are buffered until the end
of a partition is reached, at which point the `ntile` bucket values can
be calculated. The batches are emitted in a streaming fashion; as soon
as a batch is fully processed, it is returned and work is paused until
the next call to `Next()`.

See cockroachdb#37035

Release note (sql change): the vectorized engine now supports the ntile
window function.
craig bot pushed a commit that referenced this issue May 18, 2021
64977: colexec: implement vectorized ntile window function r=DrewKimball a=DrewKimball

This patch implements the `ntile` window function in the vectorized
engine. `ntile` takes in an integer argument `num_buckets` and then
distributes all rows in a partition equally between the buckets,
outputting the bucket number for each row.

In the vectorized implementation, batches are buffered until the end
of a partition is reached, at which point the `ntile` bucket values can
be calculated. The batches are emitted in a streaming fashion; as soon
as a batch is fully processed, it is returned and work is paused until
the next call to `Next()`.

See #37035

Release note (sql change): the vectorized engine now supports the ntile window function.

65322: jobs: surface the trace_id linked to job execution r=adityamaru a=adityamaru

Please see individual commit messages.

Informs: #64992

65415: authors: add annezhu98 to authors r=annezhu98 a=annezhu98

Release note: None

65418: authors: add <IsaDash> to authors r=IsaDash a=IsaDash

Release note: None

Co-authored-by: Drew Kimball <[email protected]>
Co-authored-by: Aditya Maru <[email protected]>
Co-authored-by: Anne Zhu <[email protected]>
Co-authored-by: Isa Dash <[email protected]>
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue May 24, 2021
`lag` and `lead` are window functions that return an expression
evaluated at some row `offset` rows from the current row, defaulting
to `default` if such a row does not exist in the partition.

This patch provides vectorized implementations of the `lag` and `lead`
window functions. The logic to buffer each partition into a queue is
now separate from calculation of the window function itself so that
`lag`, `lead`, and `ntile` (and others) can share common logic. As
before, each batch is emitted as soon as its output column has been
entirely filled.

See cockroachdb#37035

Release note (sql change): the vectorized engine now supports the lag
and lead window functions.
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue May 26, 2021
`lag` and `lead` are window functions that return an expression
evaluated at some row `offset` rows from the current row, defaulting
to `default` if such a row does not exist in the partition.

This patch provides vectorized implementations of the `lag` and `lead`
window functions. The logic to buffer each partition into a queue is
now separate from calculation of the window function itself so that
`lag`, `lead`, and `ntile` (and others) can share common logic. As
before, each batch is emitted as soon as its output column has been
entirely filled.

See cockroachdb#37035

Release note (sql change): the vectorized engine now supports the lag
and lead window functions.
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue May 27, 2021
`lag` and `lead` are window functions that return an expression
evaluated at some row `offset` rows from the current row, defaulting
to `default` if such a row does not exist in the partition.

This patch provides vectorized implementations of the `lag` and `lead`
window functions. The logic to buffer each partition into a queue is
now separate from calculation of the window function itself so that
`lag`, `lead`, and `ntile` (and others) can share common logic. As
before, each batch is emitted as soon as its output column has been
entirely filled.

See cockroachdb#37035

Release note (sql change): the vectorized engine now supports the lag
and lead window functions.
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue May 27, 2021
`lag` and `lead` are window functions that return an expression
evaluated at some row `offset` rows from the current row, defaulting
to `default` if such a row does not exist in the partition.

This patch provides vectorized implementations of the `lag` and `lead`
window functions. The logic to buffer each partition into a queue is
now separate from calculation of the window function itself so that
`lag`, `lead`, and `ntile` (and others) can share common logic. As
before, each batch is emitted as soon as its output column has been
entirely filled.

See cockroachdb#37035

Release note (sql change): the vectorized engine now supports the lag
and lead window functions.
craig bot pushed a commit that referenced this issue May 28, 2021
65634: colexec: implement lag and lead window functions in vectorized engine r=DrewKimball a=DrewKimball

`lag` and `lead` are window functions that return an expression
evaluated at some row `offset` rows from the current row, defaulting
to `default` if such a row does not exist in the partition.

This patch provides vectorized implementations of the `lag` and `lead`
window functions. The logic to buffer each partition into a queue is
now separate from calculation of the window function itself so that
`lag`, `lead`, and `ntile` (and others) can share common logic. As
before, each batch is emitted as soon as its output column has been
entirely filled.

See #37035

Release note (sql change): the vectorized engine now supports the lag
and lead window functions.

Co-authored-by: Drew Kimball <[email protected]>
@jlinder jlinder added the T-sql-queries SQL Queries Team label Jun 16, 2021
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Jul 19, 2021
This patch adds vectorized implementations of the `first_value`,
`last_value`, and `nth_value` window functions. These functions
return an expression evaluated at the first, last, and nth positions
respectively in the window frame for the current row.

Fixes cockroachdb#37035

Release note (sql change): first_value, last_value, and nth_value
window functions can now be executed in the vectorized engine. This
allows for faster execution time, and also removes the need for
conversions to and from row format.
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Jul 20, 2021
This patch adds vectorized implementations of the `first_value`,
`last_value`, and `nth_value` window functions. These functions
return an expression evaluated at the first, last, and nth positions
respectively in the window frame for the current row.

Fixes cockroachdb#37035

Release note (sql change): first_value, last_value, and nth_value
window functions can now be executed in the vectorized engine. This
allows for faster execution time, and also removes the need for
conversions to and from row format.
craig bot pushed a commit that referenced this issue Jul 20, 2021
66002: sql: support range based lookup join spans r=cucaroach a=cucaroach

sql: support range based lookup join spans

Informs #51576

If filters exist on a lookup join that match columns that we are doing
the lookup against add them to the lookupExpr in the join reader spec
and build those filters into the multispan generator.

If we have inequality conditions we need to be able to lookup of the
prefix for keys found against range spans and not just point spans so
build a sorted slice of span+inputRowIndices we can binary search on.

Issue #51576 also encompasses allowing inequalities on columns from the
index to reference columns from the input, that will come in a later
commit.

Release note (sql change): Improve performance of lookup joins in some
cases. If join inequality conditions can be matched to index columns
include the conditions in the index lookup spans and remove them from
the runtime filters.

67747: vendor: bump Pebble to 8c3bd06debb5 r=bananabrick a=jbowens

```
8c3bd06d vfs: export ErrUnsupported
b8b89851 tool: add manifest summarize command
8ac9797e internal/manifest: Move Atomic field in the FileMetaData struct to the top.
b8f537d8 compaction: remove nonzero-seqnum splitting logic
5c39ff26 benchmarks: infrastructure to set up file system benchmarks
```

Release note: None

67764: colexecwindow: implement first_value, last_value, and nth_value r=DrewKimball a=DrewKimball

**logictest: add testfile to ensure that window functions are vectorized**

This commit adds a logic testfile that executes the window functions
currently supported by the vectorized engine with
`SET vectorize = experimental_always`. This tests that supported
window functions can actually be executed in the vectorized engine.

Release note: None

**colexecwindow: add first_value, last_value, and nth_value**

This patch adds vectorized implementations of the `first_value`,
`last_value`, and `nth_value` window functions. These functions
return an expression evaluated at the first, last, and nth positions
respectively in the window frame for the current row.

Release note (sql change): first_value, last_value, and nth_value
window functions can now be executed in the vectorized engine. This
allows for faster execution time, and also removes the need for
conversions to and from row format.

**colexecwindow: test window framer with different memory limits**

Previously, the `TestWindowFramer` used `math.MaxInt64` as the memory
limit for the `SpillingBuffer` used to buffer all tuples in a partition.
This patch modifies the test to choose a random memory limit and fixes
a bug that could cause `windowFramer` operators to skip a peer group
when seeking to frame start and end indexes in `RANGE` or `GROUPS` mode.

Release note: None

**colexec: add range check for binary operations with timestamps**

This commit adds a range check after additions or subtractions that
produce a timestamp result. This ensures that the vectorized engine
produces the same error as the row engine when the result of the
operation is out of range.

Release note: None

Fixes #37035

67823: cloud: bump orchestrator version to 21.1.6 r=j-low a=j-low

Release note: None

67830: roachtest: bump predecessor to v21.1.6 r=j-low a=j-low

Release note: None

Co-authored-by: Tommy Reilly <[email protected]>
Co-authored-by: Jackson Owens <[email protected]>
Co-authored-by: Drew Kimball <[email protected]>
Co-authored-by: Joseph Lowinske <[email protected]>
@craig craig bot closed this as completed in 9268d3c Jul 20, 2021
@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-vec SQL vectorized engine C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants