-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ESQL] Binary Comparison Serialization #107921
[ESQL] Binary Comparison Serialization #107921
Conversation
Pinging @elastic/es-analytical-engine (Team:Analytics) |
} | ||
} | ||
throw new IOException("No BinaryComparisonOperation found for id [" + id + "]"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a more normal way to serialize enums and let's us implement wire compatibility more easily. It's annoying to type it out for every single enum
, but through hard won experience, we found it to be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the advantage of having individual almost identical readFromStream
/writeTo
methods for every enum, as opposed to the current, centralized (in StreamInput/StreamOuput) approach?
I see the serialization tests make use of the transport version (maybe this already exists now in "old" serialization code, not sure) as a novelty, but can't we do the same in ESQL (outside StreamOutput/StreamInput) and, also, have a common Enum serialization/deserialization code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can explain. We used to serialize enums the way that QL does - via the ordinals. It means fairly simple things like changing the order of the enums can break serialization. Or inserting a new enum. And adding wire backwards compatibility because a problem. You see here that we have to skip a number for wire compatibility. That's easy enough with the id
member but requires an extra unused member. We broke enough enum serialization without realizing it that we decided we should be pedantic about it and make a method like this on each enum.
So, partly it's to remove the concerns around sorting enums or inserting new entries. And partly its to make wire backwards compatibility easier in the future. Well, in this case, now. And, partly, it's to make QLs enums look like the other well behaved serializable enums in Elasticsearch.
FWIW, I tend to use a switch
statement instead of the loop. But that's not a big deal either way. I'd have done it like this:
@Override
public void writeTo(StreamOutput out) throws IOException {
out.writeByte(code);
}
public static Role readFrom(StreamInput in) throws IOException {
return switch (in.readByte()) {
case 0 -> DEFAULT;
case 1 -> INDEX_ONLY;
case 2 -> SEARCH_ONLY;
default -> throw new IllegalStateException("unknown role");
};
}
But it doesn't matter.
FWIW there are about 200 calls to StreamOutput#writeEnum
and StreamOutput#writeOptionalEnum
which does the ordinal thing. Spot checking enums in core yields a bunch that delegate to that and a bunch that do the switch
or loop way. I think its not as settled of a thing as I make it out to be. But it really makes me feel good knowing I don't have to worry about changing the order of the constants breaking things over the wire.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @nik9000
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened another issue to talk about removing the enum serialization this way. There's some discussion around doing some other third thing instead. I dunno. I was more sure about this a few hours ago. I mean, I'm sure it's an improvement over what we have, but I'm not sure there aren't other good ways too. OTOH, better is better.
@@ -334,15 +331,15 @@ public void testBinComparisonSimple() throws IOException { | |||
var orig = new Equals(Source.EMPTY, field("foo", DataTypes.DOUBLE), field("bar", DataTypes.DOUBLE)); | |||
BytesStreamOutput bso = new BytesStreamOutput(); | |||
PlanStreamOutput out = new PlanStreamOutput(bso, planNameRegistry); | |||
out.writeNamed(BinaryComparison.class, orig); | |||
var deser = (Equals) planStreamInput(bso).readNamed(BinaryComparison.class); | |||
out.writeNamed(EsqlBinaryComparison.class, orig); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this cause us trouble with the name? I know PlanNamedTypes
uses the class name sometimes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the PlanNameRegistry
only uses the plain class name, so as long as Equals extends BinaryComparison
is replaced by Equals extends EsqlBinaryComparison
- with the same name Equals
, there's a chance this works without issue.
Changing the "category class" EsqlBinaryComparison
under which we register Equals
could in theory break things, but in practice when we read/write expressions, the PlanNamedTypes
only cares that we have a subclass of Expression
(which is unchanged).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I'd wait for a review by Costin/Andrei though since they're more familiar with the QL project.
BinaryComparisonOperation( | ||
int id, | ||
String symbol, | ||
BinaryComparisonProcessor.BinaryComparisonOperation shim, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super-nit: maybe a short comment on why this is needed would help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really hoped it wouldn't live long enough to need such a comment, but you're probably right. I'll add one.
@@ -334,15 +331,15 @@ public void testBinComparisonSimple() throws IOException { | |||
var orig = new Equals(Source.EMPTY, field("foo", DataTypes.DOUBLE), field("bar", DataTypes.DOUBLE)); | |||
BytesStreamOutput bso = new BytesStreamOutput(); | |||
PlanStreamOutput out = new PlanStreamOutput(bso, planNameRegistry); | |||
out.writeNamed(BinaryComparison.class, orig); | |||
var deser = (Equals) planStreamInput(bso).readNamed(BinaryComparison.class); | |||
out.writeNamed(EsqlBinaryComparison.class, orig); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the PlanNameRegistry
only uses the plain class name, so as long as Equals extends BinaryComparison
is replaced by Equals extends EsqlBinaryComparison
- with the same name Equals
, there's a chance this works without issue.
Changing the "category class" EsqlBinaryComparison
under which we register Equals
could in theory break things, but in practice when we read/write expressions, the PlanNamedTypes
only cares that we have a subclass of Expression
(which is unchanged).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the disconnect from QL and I agree with it by creating our own EsqlBinaryComparison, but I fail to see the benefits of duplicating serialization logic for enums.
var left = in.readExpression(); | ||
var right = in.readExpression(); | ||
// TODO: Remove zoneId entirely |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, isn't there the possibility of having date > '2024-01-01T01:00:00` on a specific timezone (see Kibana) sometime in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TODO is because it's a field we don't currently use or test; none of the actual ES|QL implementations actually use it, so it only looks like it does something, while actually doing nothing. I find that confusing, since I tend to not expect to carry around unused fields. In this particular case, you can see I am not even passing this value on to the constructor.
Beyond that, I don't see the reason that binary comparisons, and only binary comparisons, need to track the time zone they are operating in. If a date is time zone aware, that information should be encoded in the date (2024-04-25T09:15:00+06:00
or similar). Carrying the timezone as part of the operator makes that very confusing. What does it mean to say date > 2024-04-25T09:15:00+06:00 in CET
? And if that does have a meaning, why do we not also need it for (e.g.) subtraction of dates?
Ultimately, ES|QL doesn't have timezone support yet, and we don't know what that will look like. Maybe it will be the case that when building that, we decided we need to add back in a timezone here. But until we have a plan for that, I don't see why we should carry around a confusing, unused value.
} | ||
} | ||
throw new IOException("No BinaryComparisonOperation found for id [" + id + "]"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the advantage of having individual almost identical readFromStream
/writeTo
methods for every enum, as opposed to the current, centralized (in StreamInput/StreamOuput) approach?
I see the serialization tests make use of the transport version (maybe this already exists now in "old" serialization code, not sure) as a novelty, but can't we do the same in ESQL (outside StreamOutput/StreamInput) and, also, have a common Enum serialization/deserialization code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@elasticmachine update branch |
…omparison-serialization' into esql-binary-comparison-serialization
Prior to this PR, serializing a binary comparison in ES|QL depended on the enum
BinaryComparisonProcessor.BinaryComparisonOperator
from the QL binary comparison code. That put some distance between the ESQL classes and their serialization logic, while also limiting our ability to make adjustments to that logic (since doing so would have ramifications for SQL and EQL)This PR introduces a new ESQL specific enum for binary comparisons, which has a Writer and a Reader built in, and which implements the standard Writable interface. This enum is constructed in such a way as to be wire-compatible with the existing enum, thus not requiring a transport version change (although any future changes to this probably will require a transport version change).
A side effect of this change is removing Null Equals from ESQL serialization. We never actually implemented Null Equals, and the existing class is a stub. I infer that it was only created to allow use of the QL
BinaryComparisonOperator
enum, which specifies a Null Equals. I did not include it in the ESQL specific enum I just added, and as such removed it from places that reference that enum.There is also a "shim" mapping from the new ESQL specific enum to the general QL enum. This is necessary for passing up to the parent
BinaryOperation
class. Changing the argument for that to use an interface like ArithmeticOperation does would require some non-trivial changes to how QL does serialization, which would dramatically increase the surface area of this PR. Medium term, I would like to changeEsqlBinaryComparison
to inherit directly fromBinaryOperator
, which will remove the need for that shim. Unfortunately, doing so proved non-trivial, and so I'm saving that for follow up work.Follow up work:
- Remove remaining references to Null Equals, and the ESQL Null Equals class.
- Move
PlanNamedTypes.writeBinComparison
andPlanNamedTypes.readBinComparison
intoEsqlBinaryComparison
, and makeEsqlBinaryComparison
Writable
. This will finish putting the serialization logic next to the object being serialized, for binary comparisons.- Remove the "shim" by changing
EsqlBinaryComparison
to inherit directly fromBinaryOperation