-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Esql/lookup join grammar #116515
Esql/lookup join grammar #116515
Conversation
rename existing lookup command to look_up (to avoid clashes)
Pinging @elastic/es-analytical-engine (Team:Analytics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor questions.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/join/JoinTypes.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/join/JoinTypes.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/mapper/MapperUtils.java
Show resolved
Hide resolved
Remove USING syntax - use ON <field> for now Fix incorrect serialization
fae8354
to
7f14d61
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a first pass with some questions and thoughts.
x-pack/plugin/esql/qa/testFixtures/src/main/resources/lookup.csv-spec-ignored
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/qa/testFixtures/src/main/resources/lookup-join.csv-spec
Outdated
Show resolved
Hide resolved
...c/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/UnaryScalarFunction.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/join/Join.java
Outdated
Show resolved
Hide resolved
} | ||
for (Attribute a : right) { | ||
// override the existing entry in place | ||
nameToAttribute.compute(a.name(), (name, existing) -> a); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guess we should make a
nullable here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a TODO plus entry in the meta ticket - making the attributes nullable changes their equality which has side effects in the plan.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/join/Join.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/join/JoinTypes.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/CsvTests.java
Outdated
Show resolved
Hide resolved
...gin/esql/src/test/java/org/elasticsearch/xpack/esql/plan/logical/JoinSerializationTests.java
Outdated
Show resolved
Hide resolved
c66ccce
to
9b51cef
Compare
Move writable declarations outside the core classes to avoid errors (such as subClass.getNamedWritable()) and centralize them in a top package class for better management.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, all done now. I think we're not too far away, but I think there's a couple things that needs to be done before merging - and a bunch of things that can be done after.
Here's what I think we should do:
- I believe the refactoring of the named writable entries should become its own PR. Currently, our mixed cluster test suite is muted, which means that making such a large scale refactor to our serialization is quite risky.
- It looks to me like our current planning for lookup joins expects the match fields from the left and right hand side's outputs to have the same name ids. I believe this is wrong and will lead to inconsistent planning; see my comment on the changes to
Analyzer.java
. The fields in the lookup index are different from fields that we join on. Actually, even if we join the lookup index with itself, we should probably still assign different name ids to the respective field attrs. to avoid bugs. There's a comment inLookupJoinExec
that claims we need qualifiers to make this work, but I don't believe so - name ids should disambiguate the attributes just fine. LookupJoin
andLookupJoinExec
have some inconsistencies and hacks which I think we best address to avoid them from lingering and producing secondary issues. E.g. the fact thatLookupJoinExec
doesn't have any right hand side attributes in itsreferences()
, the fact thatLookupJoin.info()
is not quite right etc. See the individual comments.- Test coverage (can be addressed in a follow up, but then we need to put it in the meta issue as well):
- The test for lookup joins on the data nodes doesn't look right to me. But we can consider lookup joins on data nodes a follow-up.
- In general, the test coverage in terms of csv tests for lookup joins even on the coordinator is too small. We can address this in a follow-up, though, but at least should add it to the meta issue.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/join/LookupJoin.java
Show resolved
Hide resolved
// Do not just add the JoinConfig as a whole - this would prevent correctly registering the | ||
// expressions and references. | ||
return NodeInfo.create(this, LookupJoin::new, left(), right(), config(), output); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is missing a TODO
? The JoinConfig is currently still added as a whole, which means that methods like QueryPlan.forEachExpression
will not work as expected.
Maybe this can just be fixed by copying the approach from Join
? There, I solved this by having a constructor that takes the components of the JoinConfig
instead of the whole thing, which is also the longest constructor to satisfy the info()
contract.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment was no longer relevant and was removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we still have to pass the individual components of the join config to the node info instead of the join config object as a whole.
If we want to, say, rewrite the plan and need to replace an attribute in the join config's left hand side, just calling forEachExpression
on the join will be wrong because it ignores the join config.
We had issues because of this with Join
before; let's fix it so we don't run into trouble later.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/join/LookupJoin.java
Outdated
Show resolved
Hide resolved
if (right instanceof FragmentExec fragment | ||
&& fragment.fragment() instanceof EsRelation relation | ||
&& relation.indexMode() == IndexMode.LOOKUP) { | ||
return new LookupJoinExec( | ||
join.source(), | ||
left, | ||
right, | ||
config.matchFields(), | ||
config.leftFields(), | ||
config.rightFields(), | ||
join.output() | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I understand that our plan for the future is to have more general joins where we might push e.g. a filter into the right hand side and then won't perform a LookupJoinExec
but a more general join here. But as that's currently not in place, I believe this approach is complicating the planning for no gain, because we could also map LookupJoin
directly to LookupJoinExec
without first transforming it into a general Join
. I believe this increases the risk of mistakes due to premature abstraction and loses information during logical optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree right now the abstraction is simplistic but the point is to decouple the command from the execution.
LookupJoin is syntactic sugar for special left join with special output - it gets folded into a regular join since I'd like to avoid serializing it (and dealing with versioning) and to have the compute engine pick up the pattern not the class.
That means any syntax that gets converted into a left join could be executed against a lookup join. Enrich for example could be attached onto this one but the dependency between the logical and physical happens through the pattern NOT the Lookup class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair - but we aren't at the point yet where an Enrich, or anything else for that matter, is rewritten into a pattern that is translated into a LookupJoinExec. Currently, we only have the Lookup logical plan node, which doesn't need the additional round trip via surrogate + pattern matching.
I really prefer deferring such generalizations until they have actual use cases - because getting the generalization wrong early on leads to pain :/ and it makes the code more complex than it currently needs.
But that's supposed to be a nit anyway :)
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/physical/LookupJoinExec.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/physical/LookupJoinExec.java
Show resolved
Hide resolved
x-pack/plugin/esql/qa/testFixtures/src/main/resources/lookup-join.csv-spec
Show resolved
Hide resolved
Hi @costin, I've created a changelog YAML for you. |
Thanks @alex-spies for all the comments, see my nested replies. This PR doesn't aim to fully address the lookup join, rather add the basic layers that can later be distributed and worked on as a follow-up from #116208 in which you already commented (thanks for that!). |
Pinging @elastic/kibana-esql (ES|QL-ui) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heya - approving this to unblock the PR. I think it's fine to address most things in follow ups, as you suggested @costin.
Before merging, I think it'd be great to
- merge ESQL: Refactor named writeable entries for expressions and plans #117029 first so that the serialization refactoring can be easily backported,
- carefully consider if this should be backported as well (not doing so can seriously hamper our ability to auto-backport as this PR touches many places, even when factoring out the serialization changes), and
- fix
LookupJoin.info()
as commented before.
0c30cb9
to
ac9427e
Compare
First PR for adding LOOKUP JOIN in ESQL. Introduces grammar and wires the main building blocks to execute a query ; follow-ups are required (see elastic#116208 for more details). Co-authored-by: Nik Everett <[email protected]> (cherry picked from commit bc785f5)
First PR for adding LOOKUP JOIN in ESQL. Introduces grammar and wires the main building blocks to execute a query ; follow-ups are required (see #116208 for more details). Co-authored-by: Nik Everett <[email protected]> (cherry picked from commit bc785f5)
First PR for adding LOOKUP JOIN in ESQL. Introduces grammar and wires the main building blocks to execute a query; follow-ups are required (see elastic#116208 for more details). Co-authored-by: Nik Everett <[email protected]>
First PR for adding LOOKUP JOIN in ESQL. Introduces grammar and wires the main building blocks to execute a query; follow-ups are required (see elastic#116208 for more details). Co-authored-by: Nik Everett <[email protected]>
First PR for adding LOOKUP JOIN in ESQL.
Introduces grammar and wires the main building blocks to execute a query; follow-ups are required (see #116208 for more details).