-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ES|QL] String literal implicit casting #106932
[ES|QL] String literal implicit casting #106932
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a small round of comments. This is shaping up nicely. Please check its impact on the rest of the code and look at removing to string conversion from various functions and see if casting works.
...ql/src/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/date/DateTrunc.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Outdated
Show resolved
Hide resolved
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Thanks for reviewing! One place that we do string conversation in an EsqlScalarFunction is AutoBucket, the 3rd and 4th inputs are converted from string to datetime inside the function, only if its 1st input is a datetime. If its 1st input is numeric, we don't convert. Since the conversion depends on another argument, we might just leave it inside the AutoBucket function. |
x-pack/plugin/esql/qa/testFixtures/src/main/resources/string.csv-spec
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main comment is around still using string/keyword for AutoBucket - otherwise LGTM.
...l/src/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/math/AutoBucket.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/type/EsqlDataTypeConverter.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/type/EsqlDataTypeConverter.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/type/EsqlDataTypeConverter.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments but, most importantly, this PR changes at least one current behavior into a not-ideal manner.
The following query from employees | EVAL k = hire_date > DATE_FORMAT(\"YYYY-MM-dd\", \"1989-06-02T00:00:00.000Z\") | keep languages, hire_date
(I didn't spend time in refining it) generates a user friendly error message in main
:
"type": "verification_exception",
"reason": "Found 1 problem\nline 1:39: second argument of [DATE_FORMAT(\"YYYY-MM-dd\", \"1989-06-02T00:00:00.000Z\")] must be [datetime], found value [\"1989-06-02T00:00:00.000Z\"] type [keyword]",
"stack_trace": "org.elasticsearch.xpack.esql.VerificationException: Found 1 problem\nline 1:39: second argument of [DATE_FORMAT(\"YYYY-MM-dd\", \"1989-06-02T00:00:00.000Z\")] must be [datetime], found value [\"1989-06-02T00:00:00.000Z\"] type [keyword]\r\n\tat org.elasticsearch.xpack.esql.analysis.Analyzer.verify(Analyzer.java:133)
while with this PR the error is not pretty
path: /_query, params: {format=txt, error_trace=true}, status: 500 java.lang.ClassCastException: class java.lang.String cannot be cast to class org.apache.lucene.util.BytesRef (java.lang.String is in module java.base of loader 'bootstrap'; org.apache.lucene.util.BytesRef is in module [email protected] of loader 'app')
at org.elasticsearch.xpack.esql.expression.function.scalar.date.DateFormat.toFormatter(DateFormat.java:114)
at org.elasticsearch.xpack.esql.expression.function.scalar.date.DateFormat.toEvaluator(DateFormat.java:106)
at org.elasticsearch.xpack.esql.evaluator.mapper.EvaluatorMapper.fold(EvaluatorMapper.java:48)
at org.elasticsearch.xpack.esql.expression.function.scalar.EsqlScalarFunction.fold(EsqlScalarFunction.java:30)
at org.elasticsearch.xpack.esql.analysis.Analyzer$PromoteStringsInDateComparisons.stringToDate(Analyzer.java:817)
at org.elasticsearch.xpack.esql.analysis.Analyzer$PromoteStringsInDateComparisons.promote(Analyzer.java:802)
at org.elasticsearch.xpack.ql.tree.Node.lambda$transformUp$11(Node.java:198)
at org.elasticsearch.xpack.ql.tree.Node.transformUp(Node.java:192)
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/type/EsqlDataTypeConverter.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Outdated
Show resolved
Hide resolved
Thank you for catching this issue! After casting the second inputs to date_format to date with the ImplicitCasting rule, type resolution passed for date_format, and it is foldable now, however a new error is thrown from the PromoteStringsInDateComparisons rule. The left hand side hire_date is a date type, the output from date_format is a string type, this is why it errors out. Currently the auto-casting in ImplicitCasting and PromoteStringsInDateComparisons support string literal only, it does not cast the string output of scalar functions. The scalar functions will be evaluated/fold during logical plan optimizer, so it is too early to cast the output of date_format to date type during resolution phase. The query below errors out on main with the same error at the problem query above on the branch: java.lang.ClassCastException: class java.lang.String cannot be cast to class org.apache.lucene.util.BytesRef (java.lang.String is in module java.base of loader 'bootstrap'; org.apache.lucene.util.BytesRef is in unnamed module of loader 'app') I could add a check in PromoteStringsInDateComparisons to cast only literals, and skip functions. Perhaps I should add the same check in ImplicitCasting for nested functions. |
@fang-xing-esql Now I realized I didn't copy-pasted the right exception for what happens in this PR. Sorry about that and thank you for testing and confirming in your reply the behavior. I've updated my comment above. |
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Outdated
Show resolved
Hide resolved
...sql/src/main/java/org/elasticsearch/xpack/esql/expression/function/EsqlFunctionRegistry.java
Show resolved
Hide resolved
...sql/src/main/java/org/elasticsearch/xpack/esql/expression/function/EsqlFunctionRegistry.java
Outdated
Show resolved
Hide resolved
|
||
return types.stream() | ||
.min((dt1, dt2) -> dataTypeCastingPriority.get(dt1).compareTo(dataTypeCastingPriority.get(dt2))) | ||
.orElse(DataTypes.NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably shouldn't be NULL
- if you leave off orElse
then you'll get an Optional type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's try if UNSUPPORTED makes sense.
...plugin/ql/src/main/java/org/elasticsearch/xpack/ql/expression/function/FunctionRegistry.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Show resolved
Hide resolved
@@ -34,246 +34,6 @@ | |||
"variadic" : false, | |||
"returnType" : "datetime" | |||
}, | |||
{ | |||
"params" : [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@drewdaemon I've just realized you'll need this too. Here's the deal - we have automatic promotion of string literals. For this particular function these always had to be literals anyway so we no longer think of these as valid signatures.
This is an interesting wrinkle. We don't send anything telling you this has to be a literal. We probably should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nik9000 thanks for thinking of us! Yes, we currently have this information hard-coded on our end. If there's a way to surface this in the definitions y'all provide us... all the better!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could generate something like:
{
"name" : "from",
"type" : "keyword",
"optional" : false,
"description" : "",
"only_literals": "date"
},
Or something like that. But in general if something says datetime
now it can take a literal string that is formatted like a date. Same for numbers. But that's quite a lot to add to your editor, especially right now.
There are other cases where we send you "only_literals": "any"
for bits that take any literal. The buckets
parameter here only takes literal integer
s. No fields allowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this kind of thing would be great. I have put elastic/kibana#179634 a bit on hold while I squish editor bugs for the 8.14 release. Once that settles down, I'm going to work on pulling these generated definitions into Kibana "for real."
At that point, I think I'll have a clearer understanding of exactly what structure would work best.
But in general if something says datetime now it can take a literal string that is formatted like a date. Same for numbers.
Noted, thanks!
datetime | integer | keyword | text | datetime | ||
datetime | integer | text | datetime | datetime | ||
datetime | integer | text | keyword | datetime | ||
datetime | integer | text | text | datetime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is interesting. We no longer tell folks they can use strings here, only dates. But maybe a human is ok because the examples will contain string literals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct, though? The function does accept the string types (the foldToLong()
logic remained), but generally, wouldn't we have to rather extend the function infos of scalar/arithmetic functions to include these types, now that we autocast?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently the implicit casting is done at resolution phase. We can do auto casting if the string literals are provided as input to the bucket function directly, e.g.
BUCKET(date, 20, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z"). --> auto casting convert from and to to datetime
However, if the string literals are provided as input to the bucket function through a variable, auto casting does not apply, e.g.
| EVAL bucket_start = "1999-01-01T00:00:00Z". --> auto casting does not apply
| EVAL bucket_end = NOW()
| EVAL bucket = AUTO_BUCKET(hire_date, 5, bucket_start, bucket_end)
This is the main reason that there are still code inside to function to do the conversion, as the foldable eval propagation happens at logical planner. We have some options to make auto casting work better, like:
- apply the implicit casting rule in logical planner after foldable eval propagation
- apply foldable eval propagation in resolution phase (I brought up this idea before, probably not a good one)
There are some follow up work related to the auto casting. A lot(perhaps most) of the rules in analyzer and logical planner are related to query transformation, there are chances that we can rearrange the rules to make resolution and query transformation work more effectively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As to whether this is right or not, I dunno. You certainly can only put date-like-strings in it. And now that we support implicit conversion for these literals everywhere where this list says "date" it means "date or date-like-string". I dunno.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the main reason that there are still code inside to function to do the conversion
Sure, that's fine. I was mostly commenting on adjusting the function info to remove the keyword
and text
types as accepted by the function (and the docs changes that ensued).
* string literal casting for scalar functions and arithmetic operations.
## Summary ## String constants in place of dates As of elastic/elasticsearch#106932 string constants can always be used in place of dates. - `| eval date_extract("DAY_OF_WEEK", "2024-05-07T13:20:40.554Z")` - `| STATS AVG(salary) BY bucket = BUCKET(hire_date, 20, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z") | SORT bucket` - `| eval date_diff("day", "2021-01-01", "2022-01-01")` ### Checklist - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios Co-authored-by: Kibana Machine <[email protected]>
## Summary ## String constants in place of dates As of elastic/elasticsearch#106932 string constants can always be used in place of dates. - `| eval date_extract("DAY_OF_WEEK", "2024-05-07T13:20:40.554Z")` - `| STATS AVG(salary) BY bucket = BUCKET(hire_date, 20, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z") | SORT bucket` - `| eval date_diff("day", "2021-01-01", "2022-01-01")` ### Checklist - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios Co-authored-by: Kibana Machine <[email protected]> (cherry picked from commit 9ddc3c6)
# Backport This will backport the following commits from `main` to `8.14`: - [[ES|QL] accept string constants for date args (#182856)](#182856) <!--- Backport version: 9.4.3 --> ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sqren/backport) <!--BACKPORT [{"author":{"name":"Drew Tate","email":"[email protected]"},"sourceCommit":{"committedDate":"2024-05-08T13:53:47Z","message":"[ES|QL] accept string constants for date args (#182856)\n\n## Summary\r\n\r\n## String constants in place of dates\r\n\r\nAs of elastic/elasticsearch#106932 string\r\nconstants can always be used in place of dates.\r\n\r\n- `| eval date_extract(\"DAY_OF_WEEK\", \"2024-05-07T13:20:40.554Z\")`\r\n- `| STATS AVG(salary) BY bucket = BUCKET(hire_date, 20,\r\n\"1985-01-01T00:00:00Z\", \"1986-01-01T00:00:00Z\")\r\n| SORT bucket`\r\n- `| eval date_diff(\"day\", \"2021-01-01\", \"2022-01-01\")`\r\n\r\n### Checklist\r\n\r\n- [x] [Unit or functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere updated or added to match the most common scenarios\r\n\r\nCo-authored-by: Kibana Machine <[email protected]>","sha":"9ddc3c6aa836e9e5d3bdd30512f7de55d62a16d2","branchLabelMapping":{"^v8.15.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","backport:prev-minor","Feature:ES|QL","v8.14.0","Team:ESQL","v8.15.0"],"title":"[ES|QL] accept string constants for date args","number":182856,"url":"https://github.com/elastic/kibana/pull/182856","mergeCommit":{"message":"[ES|QL] accept string constants for date args (#182856)\n\n## Summary\r\n\r\n## String constants in place of dates\r\n\r\nAs of elastic/elasticsearch#106932 string\r\nconstants can always be used in place of dates.\r\n\r\n- `| eval date_extract(\"DAY_OF_WEEK\", \"2024-05-07T13:20:40.554Z\")`\r\n- `| STATS AVG(salary) BY bucket = BUCKET(hire_date, 20,\r\n\"1985-01-01T00:00:00Z\", \"1986-01-01T00:00:00Z\")\r\n| SORT bucket`\r\n- `| eval date_diff(\"day\", \"2021-01-01\", \"2022-01-01\")`\r\n\r\n### Checklist\r\n\r\n- [x] [Unit or functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere updated or added to match the most common scenarios\r\n\r\nCo-authored-by: Kibana Machine <[email protected]>","sha":"9ddc3c6aa836e9e5d3bdd30512f7de55d62a16d2"}},"sourceBranch":"main","suggestedTargetBranches":["8.14"],"targetPullRequestStates":[{"branch":"8.14","label":"v8.14.0","branchLabelMappingKey":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"},{"branch":"main","label":"v8.15.0","branchLabelMappingKey":"^v8.15.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/182856","number":182856,"mergeCommit":{"message":"[ES|QL] accept string constants for date args (#182856)\n\n## Summary\r\n\r\n## String constants in place of dates\r\n\r\nAs of elastic/elasticsearch#106932 string\r\nconstants can always be used in place of dates.\r\n\r\n- `| eval date_extract(\"DAY_OF_WEEK\", \"2024-05-07T13:20:40.554Z\")`\r\n- `| STATS AVG(salary) BY bucket = BUCKET(hire_date, 20,\r\n\"1985-01-01T00:00:00Z\", \"1986-01-01T00:00:00Z\")\r\n| SORT bucket`\r\n- `| eval date_diff(\"day\", \"2021-01-01\", \"2022-01-01\")`\r\n\r\n### Checklist\r\n\r\n- [x] [Unit or functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere updated or added to match the most common scenarios\r\n\r\nCo-authored-by: Kibana Machine <[email protected]>","sha":"9ddc3c6aa836e9e5d3bdd30512f7de55d62a16d2"}}]}] BACKPORT--> Co-authored-by: Drew Tate <[email protected]>
Cast string literals to expected data types of ESQL scalar functions and arithmetic operations automatically.