SQL: Implement IN(value1, value2, ...) expression. #34581
Implement the functionality to translate the `field IN (value1, value2, ...)` expressions to proper Lucene queries or Painless scripts, depending on the use case. The `IN` expression can be used in SELECT, WHERE and HAVING clauses. Closes: elastic#32955
Pinging @elastic/es-search-aggs
I tested this a bit and a combination of a function and
@matriv this one fails and I think it shouldn't:
The above comparison is fixed in #34573 (the underlying null-safe equality does proper widening when comparing
Thanks for catching that. Added validation and tests for a nice error message.
Strange that the one above fails, but the next one doesn't:
Left some comments, otherwise LGTM.
Looks good!
I've left a number of comments but most of them are stylistic.
I wonder if there's an optimization rule we could use to remove from the list the items that are known not to match, in order to shrink the list and thus the number of pipes, etc. that follow.
That would only work if the value (the left side, as you say) is constant, so that all non-matching constants could be removed from the list:
`SELECT 1 IN (2, 3, foo) FROM TABLE;`
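A minimal sketch of that pruning idea, assuming the list items have already been folded to plain Java constants and that anything non-constant (like `foo`) is represented by a hypothetical `FieldRef` marker. Equality here is plain `Objects.equals`, so it ignores the SQL plugin's numeric widening (`1` vs `1L`); a real rule would have to use the plugin's own comparison.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical marker for a non-constant list item, such as the column `foo`.
final class FieldRef {
    final String name;

    FieldRef(String name) {
        this.name = name;
    }
}

// Hypothetical helper, not the plugin's optimizer rule.
final class InPruning {
    private InPruning() {}

    /**
     * Keeps only the list items that could still matter for a constant IN value:
     * non-constant items (value known only at runtime), NULL literals (they turn a
     * non-match into NULL instead of FALSE) and constants equal to the value.
     */
    static List<Object> prune(Object constantValue, List<Object> items) {
        List<Object> kept = new ArrayList<>();
        for (Object item : items) {
            boolean constant = !(item instanceof FieldRef);
            if (!constant || item == null || Objects.equals(constantValue, item)) {
                kept.add(item);
            }
        }
        return kept;
    }
}
```

For `SELECT 1 IN (2, 3, foo)`, `InPruning.prune(1, Arrays.asList(2, 3, new FieldRef("foo")))` keeps only `foo`. If some constant does match, the whole expression could instead fold to `TRUE`.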
if (this == other) {
    return true;
} else if (this.isString() && other.isString()) {
There's no need to specify `this`: `if (isString() && other.isString())`
@@ -188,6 +193,16 @@ private static Failure fail(Node<?> source, String message, Object... args) {

Set<Failure> localFailures = new LinkedHashSet<>();

if (p instanceof Filter) {
To increase readability, please move the two `if`s to a separate method (`checkInExpression`?) similar to `checkGroupBy`.
In in = (In) e;
DataType dt = null;
for (Expression rightValue : in.list()) {
`rightValue` is a bit confusing since "right" also means "correct". How about `inValue` or just `value`?
DataType dt = null;
for (Expression rightValue : in.list()) {
    if (dt == null) {
Maybe I'm misunderstanding the rule, but I think this can be simplified by initializing `dt` with a first value and avoiding the double `if`s (which are very similar).
If the rule checks the values against each other, this can be done by picking the first item and then comparing it with the rest (through an index, which is a bit verbose but fast, or a sublist).
If the rule checks the value against the list (which covers the former case, but can't give as precise a message when the former occurs), `dt` is initialized from `In.value()` and the loop then iterates through the list.
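A sketch of the second variant (each list item checked against the value), assuming a hypothetical `SqlType` enum and `Expr` holder in place of the plugin's `DataType` and `Expression`. The compatibility rule and the failure message are illustrative only, not the plugin's.

```java
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for the plugin's DataType.
enum SqlType {
    INTEGER, LONG, DOUBLE, KEYWORD, TEXT, BOOLEAN;

    boolean isNumeric() {
        return this == INTEGER || this == LONG || this == DOUBLE;
    }

    boolean isString() {
        return this == KEYWORD || this == TEXT;
    }

    // Same type, or both numeric, or both string-like.
    boolean isCompatibleWith(SqlType other) {
        return this == other
            || (isNumeric() && other.isNumeric())
            || (isString() && other.isString());
    }
}

// Hypothetical stand-in for an Expression: its source text and its data type.
final class Expr {
    final String source;
    final SqlType type;

    Expr(String source, SqlType type) {
        this.source = source;
        this.type = type;
    }
}

final class InValidation {
    private InValidation() {}

    // dt starts as the type of the IN value, so a single loop without a null check suffices.
    static void validateIn(Expr value, List<Expr> list, Set<String> failures) {
        SqlType dt = value.type;
        for (Expr item : list) {
            if (!dt.isCompatibleWith(item.type)) {
                failures.add("expected data type [" + dt + "], value provided is of type ["
                    + item.type + "] in [" + item.source + "]");
            }
        }
    }
}
```

Because the failing item is known inside the loop, the message can name it directly.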
private static void validateInExpression(Expression e, Set<Failure> localFailures) {
    if (e instanceof In) {
You can avoid the `instanceof` check by using `plan.forEachExpression(method, In.class)`, so for `Filter` you can do `filterPlan.condition().forEachExpression(validateIn, In.class)` with `validateIn(In in) { ... }`.
The `Set<Failure>` can be passed to the closure directly; see `checkGroupBy` & co. for examples.
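To illustrate the suggestion, here is a sketch of a type-filtered traversal. `Expression`, `Literal`, `And`, `In`, `forEachDown` (standing in for `forEachExpression`) and `VerifierSketch` are hypothetical simplifications, and the rule inside `validateIn` is a placeholder; the point is that the `Class<E>` argument does the filtering and the lambda captures the `Set` of failures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical mini expression tree; the real Node API differs in detail.
abstract class Expression {
    private final List<Expression> children;

    Expression(List<Expression> children) {
        this.children = children;
    }

    // Visits this node and all descendants, calling action only on instances of type.
    <E extends Expression> void forEachDown(Consumer<? super E> action, Class<E> type) {
        if (type.isInstance(this)) {
            action.accept(type.cast(this));
        }
        for (Expression child : children) {
            child.forEachDown(action, type);
        }
    }
}

final class Literal extends Expression {
    final Object value;

    Literal(Object value) {
        super(Collections.emptyList());
        this.value = value;
    }
}

final class And extends Expression {
    And(Expression left, Expression right) {
        super(Arrays.asList(left, right));
    }
}

final class In extends Expression {
    final List<Expression> list;

    In(Expression value, List<Expression> list) {
        super(withValueFirst(value, list));
        this.list = list;
    }

    private static List<Expression> withValueFirst(Expression value, List<Expression> list) {
        List<Expression> all = new ArrayList<>();
        all.add(value);
        all.addAll(list);
        return all;
    }
}

final class VerifierSketch {
    private VerifierSketch() {}

    // The failures set is captured by the lambda: no instanceof check, no extra plumbing.
    static Set<String> checkInExpressions(Expression condition) {
        Set<String> failures = new LinkedHashSet<>();
        condition.forEachDown(in -> validateIn(in, failures), In.class);
        return failures;
    }

    // Placeholder rule for illustration only: every list item must be a literal.
    private static void validateIn(In in, Set<String> failures) {
        for (Expression item : in.list) {
            if (!(item instanceof Literal)) {
                failures.add("IN list items must be literals");
            }
        }
    }
}
```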
Nice! Thx for the suggestion.
for (Pipe p : right) {
    newRight.add(p.resolveAttributes(resolver));
}
return replaceChildren(Stream.concat(Stream.of(newLeft), newRight.stream()).collect(Collectors.toList()));
Again, if there's only one list, there's no need to separate and then concatenate things.
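A sketch of the single-list shape, assuming a hypothetical `Pipe` whose attribute resolver is reduced to a `UnaryOperator<String>` over attribute names. `InPipeSketch` keeps the value at index 0 followed by the list, so resolving attributes is one loop with no `Stream.concat`. The original code goes through `replaceChildren(...)`; this sketch simply constructs a new instance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical minimal Pipe; the real resolver type is simplified to a name-rewriting function.
abstract class Pipe {
    abstract Pipe resolveAttributes(UnaryOperator<String> resolver);
}

final class AttributePipe extends Pipe {
    final String name;

    AttributePipe(String name) {
        this.name = name;
    }

    @Override
    Pipe resolveAttributes(UnaryOperator<String> resolver) {
        return new AttributePipe(resolver.apply(name));
    }
}

// Value at index 0, followed by the IN list: only one list to walk.
final class InPipeSketch extends Pipe {
    final List<Pipe> children;

    InPipeSketch(List<Pipe> children) {
        this.children = children;
    }

    Pipe value() {
        return children.get(0);
    }

    List<Pipe> list() {
        return children.subList(1, children.size());
    }

    @Override
    Pipe resolveAttributes(UnaryOperator<String> resolver) {
        List<Pipe> resolved = new ArrayList<>(children.size());
        for (Pipe p : children) {
            resolved.add(p.resolveAttributes(resolver));
        }
        return new InPipeSketch(resolved);
    }
}
```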
@Override
public boolean resolved() {
    return left().resolved() && right().stream().allMatch(Pipe::resolved);
`Resolvables.resolved(list)`.
Hm, I cannot use that: `Pipe` is not `Resolvable`.
Changing `Pipe` to implement `Resolvable`.
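A sketch of what that change enables, with hypothetical minimal `Resolvable` and `Resolvables` types (the real ones may declare more): once `Pipe` implements `Resolvable`, a `List<Pipe>` can be handed straight to the helper. `ConstantPipe`, `UnresolvedPipe` and the `value`/`list` field names are illustrative.

```java
import java.util.List;

// Hypothetical shapes: the real Resolvable/Resolvables types may differ.
interface Resolvable {
    boolean resolved();
}

final class Resolvables {
    private Resolvables() {}

    // True only if every element reports itself as resolved.
    static boolean resolved(List<? extends Resolvable> items) {
        for (Resolvable r : items) {
            if (!r.resolved()) {
                return false;
            }
        }
        return true;
    }
}

// Once Pipe implements Resolvable, any List<Pipe> works with the helper above.
abstract class Pipe implements Resolvable {}

final class ConstantPipe extends Pipe {
    @Override
    public boolean resolved() {
        return true;
    }
}

final class UnresolvedPipe extends Pipe {
    @Override
    public boolean resolved() {
        return false;
    }
}

final class InPipe extends Pipe {
    final Pipe value;
    final List<Pipe> list;

    InPipe(Pipe value, List<Pipe> list) {
        this.value = value;
        this.list = list;
    }

    @Override
    public boolean resolved() {
        return value.resolved() && Resolvables.resolved(list);
    }
}
```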
public class InPipe extends Pipe {

    private Pipe left;
I think `left` and `right` are a bit confusing; why not use `value` and `list`?
// if the code gets here it's a bug
//
else {
    throw new UnsupportedOperationException("No idea how to translate " + in.value());
Use `SqlIllegalArgumentException` instead.
Copy-pasted from here: https://github.com/elastic/elasticsearch/pull/34581/files/ab7c1502d1b6341b2b3cc3be7eda72b00e606ada#diff-5d7529d13a2ef47d436ea2aa577e0c52R543 :-( Fixing both!
private Analyzer analyzer;

public QueryTranslatorTests() {
public class QueryTranslatorTests extends AbstractBuilderTestCase {
Is `AbstractBuilderTestCase` used anywhere (maybe I'm missing it)?
I used it for the `shardContext` here: https://github.com/elastic/elasticsearch/pull/34581/files/ab7c1502d1b6341b2b3cc3be7eda72b00e606ada#diff-aef2b0ce456b8fdd5cc09d6cfd55f0c2R173.
I tried to mock it manually but ended up with ugly code. `AbstractBuilderTestCase` is the parent class for many tests in the `.search.aggregations` and `.index.query` packages.
By the way, this should go in 6.5 as well.
The setting that reduces the disk space requirement for the forecasting integration tests was accidentally removed in elastic#31757 when files were moved around. This change simply adds back the setting that existed before that.
Applies our standard column wrapping to the `discovery-ec2` and `repository-s3` plugins.
Changes wording in the FIPS 140-2 related documentation. Co-authored-by: derickson <[email protected]>
Adds support for query-time formatting of the date histo keys when executing a rollup search. Closes elastic#34391
ab7c150 to 4cecd50
LGTM.
I've added some minor comments regarding styling.
Also, it's worth adding two unit tests: one to check that the optimizer folds `IN` expressions (see the optimizer tests; something like `1 in (2-1, 2, 3)` should return `true`), and another to check whether `In` removes duplicates, e.g. `1 in (1,2,3,1,2,3,1,2,3)`, which is handled by passing the list through an insertion-order set. From there on it can be treated as a list, knowing there are no duplicates.
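A sketch of the behaviour those two tests would pin down, assuming the list items have already been folded to `Integer` constants (so `2-1` arrives as `1`). `InFolding` is hypothetical and deliberately ignores NULL handling and type widening.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper; the plugin's In expression is not reproduced here.
final class InFolding {
    private InFolding() {}

    // An insertion-order set drops duplicates but keeps the first-seen order.
    static List<Integer> dedup(List<Integer> items) {
        return new ArrayList<>(new LinkedHashSet<>(items));
    }

    // With a constant value and constant items, IN folds to a plain membership test.
    static boolean fold(int value, List<Integer> items) {
        return dedup(items).contains(value);
    }
}
```

`InFolding.fold(1, Arrays.asList(2 - 1, 2, 3))` is `true`, and `InFolding.dedup(Arrays.asList(1, 2, 3, 1, 2, 3, 1, 2, 3))` yields `[1, 2, 3]`.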
@@ -144,7 +150,7 @@
}

static QueryTranslation toQuery(Expression e, boolean onAggs) {
    QueryTranslation translation = null;
    QueryTranslation translation;
Looks like a formatting issue. I'd keep the explicit initialization (to keep introspection tools quiet: yes, `null` is expected).
public TermsQuery(Location location, String term, List<Expression> values) {
    super(location);
    this.term = term;
    this.values = values.stream().map(Expression::fold).collect(Collectors.toList());
The collector could be a set (as opposed to a list) to remove duplicates.
/**
 * Comparison utilities.
 */
abstract class Comparisons {
public final class Comparisons {
Minor nitpick: I tend to explicitly use `valueOf`/`xxxValue` to clarify the use of boxing.
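A small illustration of the nitpick, using a hypothetical `BoxingStyle` class: the implicit and explicit forms box the same way (javac compiles autoboxing to `Integer.valueOf`), but the explicit calls make every conversion visible when reading comparison code.

```java
// Hypothetical helper contrasting implicit autoboxing with explicit valueOf/xxxValue calls.
final class BoxingStyle {
    private BoxingStyle() {}

    // Implicit: the compiler inserts Integer.valueOf(i) here.
    static Integer implicitBox(int i) {
        return i;
    }

    // Explicit: the boxing conversion is visible in the source.
    static Integer explicitBox(int i) {
        return Integer.valueOf(i);
    }

    // Explicit unboxing; like implicit unboxing, throws NullPointerException on null.
    static int explicitUnbox(Integer boxed) {
        return boxed.intValue();
    }

    // Mixed numeric types compared after an explicit widening to double.
    static int compareAsDouble(Number left, Number right) {
        return Double.compare(left.doubleValue(), right.doubleValue());
    }
}
```

Widening through `doubleValue()` keeps the sketch short but loses precision for very large `long` values, so it is not a substitute for proper numeric widening rules.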
}

@Override
public boolean foldable() {
    return foldable;
    return children().stream().allMatch(Expression::foldable);
Can be moved to `Expressions#foldable`, similar to `nullable` and `resolvable`.
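A sketch of that refactoring, assuming a hypothetical minimal `Expression` interface and hypothetical `Constant`, `Column` and `InExpr` classes. The reviewer names `nullable` and `resolvable` as existing neighbours of such a helper; their real signatures are not reproduced here.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical minimal Expression; only foldable() and children() matter here.
interface Expression {
    boolean foldable();

    List<Expression> children();
}

// Static list helper, meant to sit next to similar helpers in an Expressions utility class.
final class Expressions {
    private Expressions() {}

    static boolean foldable(List<? extends Expression> exps) {
        for (Expression e : exps) {
            if (!e.foldable()) {
                return false;
            }
        }
        return true;
    }
}

final class Constant implements Expression {
    @Override
    public boolean foldable() {
        return true;
    }

    @Override
    public List<Expression> children() {
        return Collections.emptyList();
    }
}

final class Column implements Expression {
    @Override
    public boolean foldable() {
        return false;
    }

    @Override
    public List<Expression> children() {
        return Collections.emptyList();
    }
}

// foldable() delegates to the shared helper instead of repeating the stream check.
final class InExpr implements Expression {
    private final List<Expression> children;

    InExpr(List<Expression> children) {
        this.children = children;
    }

    @Override
    public List<Expression> children() {
        return children;
    }

    @Override
    public boolean foldable() {
        return Expressions.foldable(children());
    }
}
```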
@costin Addressed comments. Please take another look. If the whole Making the
public TermsQuery(Location location, String term, List<Expression> values) {
    super(location);
    this.term = term;
    this.values = values.stream().map(Expression::fold).collect(Collectors.toCollection(LinkedHashSet::new));
As an alternative, use `Foldables.*`: `new LinkedHashSet<>(Foldables.valuesOf(values, dataType()))`
LGTM, thanks.
The optimization rule is not critical since the end result is the same, though I'm curious why it doesn't kick in.
21dac9d to d1e1018
@costin As discussed, I was wrong: the optimisation does kick in, it just wasn't tested properly. I now have a test for that. Thank you!
retest this please
Implement the functionality to translate the `field IN (value1, value2,...)` expressions to proper Lucene queries, Painless scripts or local processors, depending on the use case. The `IN` expression can be used in SELECT, WHERE and HAVING clauses. Closes: #32955
Backported to
* master: (24 commits)
  ingest: better support for conditionals with simulate?verbose (elastic#34155)
  [Rollup] Job deletion should be invoked on the allocated task (elastic#34574)
  [DOCS] .Security index is never auto created (elastic#34589)
  CCR: Requires soft-deletes on the follower (elastic#34725)
  re-enable bwc tests (elastic#34743)
  Empty GetAliases authorization fix (elastic#34444)
  INGEST: Document Processor Conditional (elastic#33388)
  [CCR] Add total fetch time leader stat (elastic#34577)
  SQL: Support pattern against compatible indices (elastic#34718)
  [CCR] Auto follow pattern APIs adjustments (elastic#34518)
  [Test] Remove dead code from ExceptionSerializationTests (elastic#34713)
  A small typo in migration-assistance doc (elastic#34704)
  ingest: processor stats (elastic#34724)
  SQL: Implement IN(value1, value2, ...) expression. (elastic#34581)
  Tests: Add checks to GeoDistanceQueryBuilderTests (elastic#34273)
  INGEST: Rename Pipeline Processor Param. (elastic#34733)
  Core: Move IndexNameExpressionResolver to java time (elastic#34507)
  [DOCS] Force Merge: clarify execution and storage requirements (elastic#33882)
  TESTING.asciidoc fix examples using forbidden annotation (elastic#34515)
  SQL: Implement `CONVERT`, an alternative to `CAST` (elastic#34660)
  ...