Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

colexec: implement vectorized version of NotExpr #70713

Closed
yuzefovich opened this issue Sep 24, 2021 · 14 comments · Fixed by #75058
Closed

colexec: implement vectorized version of NotExpr #70713

yuzefovich opened this issue Sep 24, 2021 · 14 comments · Fixed by #75058
Labels
C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. good first issue T-sql-queries SQL Queries Team

Comments

@yuzefovich
Copy link
Member

yuzefovich commented Sep 24, 2021

Currently, tree.NotExpr is not implemented natively in the vectorized engine, so we have to fallback to the older row-by-row engine to evaluate it. We should, instead, vectorize NotExpr.

I think the implementation will be quite similar to how tree.IsNullExpr is implemented, and I think we will want to implement two versions of NotExpr operator:

  • one for projections, which populates a coldata.Vec with values equal to evaluating NOT predicate on each row in another coldata.Vec (while paying attention to nulls in the input vector and the selection vector on the input batch)
  • one for selections, which updates the selection vector on the batch to include only rows for which NOT predicate evaluated to true.

Some code pointers:

  • the planning is done in colbuilder/execplan.go in planProjectionOperators and planSelectionOperators
  • tree.NotExpr lives in sem/tree/expr.go
  • the vectorized implementation of tree.IsNullExpr lives in colexec/is_null_ops.eg.go (which is automatically generated based on colexec/is_nulls_ops_tmpl.go and colexec/execgen/cmd/execgen/is_nulls_ops_gen.go).

Please let me know if more guidance is needed.

Jira issue: CRDB-10187

@yuzefovich yuzefovich added C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. good first issue labels Sep 24, 2021
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Sep 24, 2021
@amitech
Copy link

amitech commented Sep 27, 2021

@yuzefovich I am new to cockroachdb, but I would like to contribute for this issue. Please assign it to me.

@yuzefovich
Copy link
Member Author

Thanks for your interest @amitech! Let me know if you need some guidance / more pointers.

@amitech
Copy link

amitech commented Sep 28, 2021

Sure, thanks @yuzefovich.

@jlevesy
Copy link
Contributor

jlevesy commented Oct 20, 2021

Hello, is this issue being actively worked on? I would be happy to give it a try if it's OK. Thanks!

@yuzefovich
Copy link
Member Author

@jlevesy thanks for your interest! @amitech has volunteered to work on this issue, so I want to give them more time. However, there is another issue of similar complexity that you might be interested in #66015 (it's currently assigned to my colleague, but I'm sure he'll be happy to give it away :) ).

@jlevesy
Copy link
Contributor

jlevesy commented Oct 20, 2021

Awesome, thanks! :)

@RajivTS
Copy link
Contributor

RajivTS commented Jan 5, 2022

Hi @yuzefovich , is this issue still under progress? I can see that it was assigned back in Sept 2021 and is still open. I would like to work on it if its alright to do so.

I have gone through the suggestions that you mentioned and am able to form a rough outline of the changes that might be required.

@yuzefovich
Copy link
Member Author

@RajivTS thanks for your interest! Indeed, I haven't seen much progress from @amitech, so feel free to work on this issue. I'll assign it to you.

@yuzefovich yuzefovich assigned RajivTS and unassigned amitech Jan 5, 2022
@RajivTS
Copy link
Contributor

RajivTS commented Jan 6, 2022

Thanks @yuzefovich!

Before I proceed, I wanted to validate my understanding and clear a few doubts. Given your initial explanation, I think the task involves:

  • Defining new operators NotExprProjOp for projection and NotExprSelOp for selection when evaluating the result of a NotExpr
  • Adding a new case for tree.NotExpr in both planProjectionOperators and planSelectionOperators where the TypedInnerExpr is evaluated/resolved by NotExprProjOp operator for projection and NotExprSel for selection
  • Defining functions NewNotExprProjOp and NewNotExprSelOp that allows for some data prep (e.g. adding a new typed column vector to contain the result in case of projection) before creating the associated operator
  • Defining the Next method for both the projection and selection operators. The Next method evaluates the next batch of input for the given operator where input is captured as column vector from the batch at index colIdx. For projection, the result is reflected in the new column vector added for the output at outputIdx while for selection the output is reflected in the selection slice obtained from the input batch
  • Adding relevant tests where required using IsNullExpr as reference

In case there are potential gaps in my understanding outlined above (likely at this point :) ), please do help me in rectifying them. Assuming the above explanation is correct, I had the following doubts pertaining to the implementation:

  • The Next method for IsNullProjOp and IsNullSelOp coverts the input column vector at colIdx into Nulls which then allows for simple index-based checking of values being null using the NullAt method. Currently, I am unable to determine an equivalent logic for NotExpr, i.e. how do I check if the NOT of the input expression evaluates as true for a value at index i in the input column vector?
  • The implementation of the Next method for IsNullExpr resides in a template file which is then used to auto-generate the final code. The primary intent here seems to be code-reuse between the IsNull and IsTupleNull variants which is achieved by using a single method. Would we need to consider the Tuple variant when working on NotExpr? If yes, would the NotExpr need to be evaluated for every element and output true if all are true else false? If we do not need to consider tuples, then does the implementation need to reside in the template file?
  • The IsNull implementation uses a negate flag to provide support for both IsNullExpr and IsNotNullExpr. Is it safe to assume that we do need anything similar in this case?

@yuzefovich
Copy link
Member Author

@RajivTS I think you're understanding of the task is right on the point!

I'll address each of the questions/doubts below. Let me know if you have other questions.

The Next method for IsNullProjOp and IsNullSelOp coverts the input column vector at colIdx into Nulls which then allows for simple index-based checking of values being null using the NullAt method. Currently, I am unable to determine an equivalent logic for NotExpr, i.e. how do I check if the NOT of the input expression evaluates as true for a value at index i in the input column vector?

Let's think through what IS NULL expr and NOT expr mean.

IS NULL checks whether expr is NULL or not and evaluates to the corresponding boolean, so in order to compute the output vector we only have to extract Nulls bitmap from the input vector (and the input vector can be of any type - INT, FLOAT, etc).

NOT expr negates the non-NULL boolean expression, and if the expression is NULL, then the result of NOT NULL is also NULL. So we have the following logic: for each element of the input vector we first check whether it is NULL or not; if it is, then the output for that element is also NULL, so we mark the Nulls bitmap at the index accordingly; if the element is not NULL, then we have to look at the actual input element value (which must be a boolean), and then we negate that value.

More concretely, you'll have something like this:

  inputVec := batch.ColVec(inputIdx)
  outputVec := batch.ColVec(outputIdx)
  if inputVec.Nulls().NullAt(i) {
    outputVec.Nulls().SetNullAt(i)
  } else {
    val := inputVec.Bool().Get(i)
    outputVec.Bool().Set(i, !val)
  }

Would we need to consider the Tuple variant when working on NotExpr?

No, that thing is specific to IS NULL. I mentioned is_null as an example for the general layout of things. I think for NOT expr we don't need to use the templates, so you can just write regular Go code.

The IsNull implementation uses a negate flag to provide support for both IsNullExpr and IsNotNullExpr. Is it safe to assume that we do need anything similar in this case?

We don't need to do anything like this.

@RajivTS
Copy link
Contributor

RajivTS commented Jan 7, 2022

Thank you for the detailed explanation @yuzefovich!

I think I have sufficient grasp of the requirement at this point to begin the implementation and will let you know as soon as I make some progress. In the meantime, I just wanted to get confirmation on the intended behavior for select:

  • For select, I assume that we need to select only those values from the input where the expression evaluates to False (i.e. NOT of the expression evaluates to True).
  • If the input value is NULL in case of select, then I assume these values must not be selected.

@yuzefovich
Copy link
Member Author

Yes, that sounds right. Selection operators only include values for which the expression evaluates to true, and NOT expr is true iff expr is false. If expr is NULL, then NOT expr is also NULL, so it should not be included.

@RajivTS
Copy link
Contributor

RajivTS commented Jan 8, 2022

Hi @yuzefovich

I have made the initial set of changes based on my understanding and have raised a draft PR in my own fork. Please review it and let me know if any changes need to be made. The PR also contains a commented-out test case which was failing despite no faults in the underlying implementation.
FailingTest

I believe the InjectNull variant of test result verification needs to have an opt-out flag which allows to bypass the check in cases such as this. Please do let me know if I am missing something here or if this test verification logic needs to be updated.

@yuzefovich
Copy link
Member Author

Thanks for working on this! In order for me to review your code, please open up the PR against master branch of cockroachdb/cockroach. The guidelines are here, but the quick summary is that you push to a branch in your fork and then open the PR against cockroachdb/cockroach. (On a quick glance everything seems great, I'll probably have some nit comments, but likely nothing major.)

Re: InjectNulls - I'm guessing that allNullsInjection is the one that is failing, so you'll want to use RunTestsWithoutAllNullsInjection instead of RunTests for that test case. Take a look at and_or_projection_test.go where for some test cases allNullsInjection is also skipped.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. good first issue T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

6 participants
@jlinder @jlevesy @yuzefovich @amitech @RajivTS and others