Don't return TEXT type for functions that take TEXT #114334

craigtaverner · 2024-10-08T15:52:26Z

Always return KEYWORD for functions that previously returned TEXT, because any change to the value, no matter how small, is enough to render meaningless the original analyzer associated with the TEXT field value. In principle, if the attribute is no longer the original FieldAttribute, it can no longer claim to have the type TEXT.

This has been done for all functions: conversion functions, aggregating functions and multi-value functions. Several already produced KEYWORD for TEXT input (e.g. ToString, FromBase64 and ToBase64, MvZip, ToLower, ToUpper, DateFormat, Concat, Left, Repeat, Replace, Right, Split, Substring), but many others falsely claimed to produce TEXT. This PR makes the rule strict, and changes the functions' unit tests so that they can no longer expect any function's output to be TEXT.
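A minimal sketch of the rule, assuming a simplified DataType enum with only a few constants. The noText() method uses the same logic as the helper added to DataType in this PR; TypePreservingFunction and ReverseLike are hypothetical stand-ins, not the real ES|QL class hierarchy.

```java
// Simplified stand-in for the ES|QL DataType enum (only a few constants modelled).
enum DataType {
    TEXT, KEYWORD, LONG;

    // Same logic as the noText() helper added in this PR:
    // TEXT becomes KEYWORD, every other type is returned unchanged.
    DataType noText() {
        return this == TEXT ? KEYWORD : this;
    }
}

// Hypothetical base class for a function whose output type follows its input type.
abstract class TypePreservingFunction {
    private final DataType inputType;

    TypePreservingFunction(DataType inputType) {
        this.inputType = inputType;
    }

    // The result is a newly computed value, so it cannot keep the TEXT type
    // (and the analyzer that came with the original field).
    DataType dataType() {
        return inputType.noText();
    }
}

// Hypothetical REVERSE-like function that inherits the default.
final class ReverseLike extends TypePreservingFunction {
    ReverseLike(DataType inputType) {
        super(inputType);
    }
}
```

Applied to a TEXT field, ReverseLike reports KEYWORD, while non-string types such as LONG pass through untouched.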

One side effect of this change is that functions taking multiple parameters that must all have the same type now treat TEXT and KEYWORD as the same type. This was already the case for functions like Concat, and is now also the case for Greatest, Least, Case, Coalesce and MvAppend.
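As a hedged illustration of this side effect, here is a minimal sketch of same-type resolution for a GREATEST/LEAST/COALESCE-style function. SameTypeResolver and the simplified DataType enum are hypothetical; the only point is that comparing argument types after noText() lets TEXT and KEYWORD mix, with KEYWORD as the result.

```java
import java.util.List;
import java.util.Optional;

enum DataType {
    TEXT, KEYWORD, LONG;

    DataType noText() {
        return this == TEXT ? KEYWORD : this;
    }
}

final class SameTypeResolver {
    // Hypothetical: all arguments must share one type once TEXT is folded into
    // KEYWORD. Returns that common type, or empty on a mismatch (or no arguments).
    static Optional<DataType> resolve(List<DataType> argTypes) {
        DataType common = null;
        for (DataType t : argTypes) {
            DataType normalized = t.noText();
            if (common == null) {
                common = normalized;
            } else if (common != normalized) {
                return Optional.empty();
            }
        }
        return Optional.ofNullable(common);
    }
}
```

A mix of TEXT and KEYWORD arguments resolves to KEYWORD, while TEXT mixed with LONG is still rejected.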

An associated change is that the type-casting operator ::text has been removed entirely. It used to map onto the ToString function, which returns KEYWORD, so ::text actually produced a KEYWORD: a lie, or at least a bug, which is now fixed. Should we ever wish to produce real TEXT, the operator is now free for future use (although such a function will likely need parameters to specify the analyzer, so it might never be an operator again).
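A minimal sketch of why removing the TEXT entry frees up ::text, assuming a simplified lookup table. The real EsqlDataTypeConverter maps data types to function constructors such as ToString::new; InlineCasts and the string function names used as map values here are hypothetical simplifications to keep the example self-contained.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

enum DataType { KEYWORD, TEXT, LONG, DOUBLE }

final class InlineCasts {
    // Hypothetical, simplified table behind the `::type` operator: target type ->
    // name of the conversion function. KEYWORD maps to TO_STRING; there is no
    // TEXT entry any more, so `::text` no longer resolves.
    private static final Map<DataType, String> CASTS = Map.of(
            DataType.KEYWORD, "TO_STRING",
            DataType.LONG, "TO_LONG",
            DataType.DOUBLE, "TO_DOUBLE");

    // Resolves the name written after `::` to a conversion function, if any.
    static Optional<String> resolve(String castName) {
        DataType target;
        try {
            target = DataType.valueOf(castName.toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // not a known data type
        }
        return Optional.ofNullable(CASTS.get(target));
    }
}
```

In this model `::keyword` still resolves to TO_STRING, while `::text` finds no entry even though TEXT remains a valid data type for fields.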

Backwards compatibility issues:

This change will fail BWC tests, since we have many tests that assert on TEXT output from functions. For this reason we needed to block two scenarios:

* We used the capability functions_never_emit_text to prevent 7 csv-spec tests and 2 yaml tests from running against older versions that still emit text.
* We used skipTest to also block those two yaml tests from running against the latest build with older yaml files downloaded (as far back as 8.14).

In all cases the only change observed in these tests was that result columns now have type keyword instead of text.
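A minimal sketch of the idea behind capability gating, assuming each node reports a set of capability strings. CapabilityGate and shouldRun are hypothetical names, not the actual test-framework API; only the capability name functions_never_emit_text comes from this PR.

```java
import java.util.List;
import java.util.Set;

final class CapabilityGate {
    // Hypothetical: run a test only if every node in the (possibly mixed-version)
    // cluster advertises the required capability. Older nodes that still emit
    // TEXT do not advertise it, so the test is skipped instead of failing.
    static boolean shouldRun(String requiredCapability, List<Set<String>> nodeCapabilities) {
        return nodeCapabilities.stream().allMatch(caps -> caps.contains(requiredCapability));
    }
}
```

A single old node in the cluster is enough to skip the test, which is the behaviour a mixed-version BWC run needs.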

Kibana

Kibana uses some JSON files we produce during test runs that describe the function signatures and the available type-casting operators. These have been edited to remove all TEXT output and the ::text operator.

Fixes

Fixes #114333
Fixes #111537


elasticsearchmachine · 2024-10-08T15:53:06Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine · 2024-10-08T15:53:06Z

Hi @craigtaverner, I've created a changelog YAML for you.


nik9000 · 2024-10-08T16:45:46Z

Change the parent class UnaryScalarFunction to always return KEYWORD for TEXT fields (done in this PR to cover functions not yet fixed)

I think I'd prefer if UnaryScalarFunction didn't have a default return type. That's a bigger change though because everyone's got to fill in their return type.
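A minimal sketch of the alternative being proposed here, assuming simplified hypothetical classes (UnaryFunctionSketch, LengthSketch, ReverseSketch) rather than the real UnaryScalarFunction: with no default return type, every concrete function is forced to state its output type, so TEXT cannot leak through by inheritance.

```java
enum DataType { TEXT, KEYWORD, INTEGER }

// Hypothetical base class with no default return type.
abstract class UnaryFunctionSketch {
    protected final DataType fieldType;

    UnaryFunctionSketch(DataType fieldType) {
        this.fieldType = fieldType;
    }

    abstract DataType dataType(); // deliberately no default implementation
}

// Hypothetical LENGTH-like function: numeric output whatever the string input.
final class LengthSketch extends UnaryFunctionSketch {
    LengthSketch(DataType fieldType) {
        super(fieldType);
    }

    @Override
    DataType dataType() {
        return DataType.INTEGER;
    }
}

// Hypothetical REVERSE-like function: string input, always KEYWORD output.
final class ReverseSketch extends UnaryFunctionSketch {
    ReverseSketch(DataType fieldType) {
        super(fieldType);
    }

    @Override
    DataType dataType() {
        return DataType.KEYWORD;
    }
}
```

The cost is the one mentioned above: every existing subclass has to be updated to declare its return type.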

astefan

I think it's a good step forward, but imo we could do better.
This needs a general evaluation of the situation because text as an output data type of our own functions is confusing, meaningless and not so useful at this point in the life of the language. So:

* All functions need to be checked. I've caught max(text), which returns text. If we are to make this change, let's make it general.
* To catch any misses, we should have a check in a central location for the return data types of any kind of function/operator/anything-else (maybe after the logical planner is done, not sure). Maybe with an assert, so that we don't fail queries at runtime, unless it's CI/tests.
* Another way of looking at the problem: does text (as a data type) only make sense for FieldAttributes? (Meaning we can accept text fields and do stuff with them, but creating a text shouldn't be possible.)
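A minimal sketch of such a central check, assuming a hypothetical, heavily simplified expression tree (the Expr record and NoTextCheck are illustrative, not ES|QL classes). Field references are the only nodes allowed to be TEXT; any computed node claiming TEXT is reported. In production code the call could sit behind a Java `assert`, so it only fails in CI and tests, as suggested above.

```java
import java.util.ArrayList;
import java.util.List;

enum DataType { TEXT, KEYWORD, LONG }

// Hypothetical expression node: either a field reference or a computed expression.
record Expr(String name, DataType type, boolean isField, List<Expr> children) {
    static Expr field(String name, DataType type) {
        return new Expr(name, type, true, List.of());
    }

    static Expr fn(String name, DataType type, Expr... children) {
        return new Expr(name, type, false, List.of(children));
    }
}

final class NoTextCheck {
    // Returns the names of all computed (non-field) expressions that claim TEXT.
    static List<String> violations(Expr root) {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Expr e, List<String> out) {
        if (!e.isField() && e.type() == DataType.TEXT) {
            out.add(e.name());
        }
        for (Expr child : e.children()) {
            collect(child, out);
        }
    }
}
```

With this model, a max(text) that still reported TEXT would be listed as a violation, while the TEXT field underneath it would not.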

astefan · 2024-10-09T16:08:38Z

Change the parent class UnaryScalarFunction to always return KEYWORD for TEXT fields (done in this PR to cover functions not yet fixed)

I think I'd prefer if UnaryScalarFunction didn't have a default return type. That's a bigger change though because everyone's got to fill in their return type.

++
Also, UnaryScalarFunction is not the only function in the class' hierarchy.

nik9000 · 2024-10-09T16:44:18Z

central location

I was thinking that the tests for functions could assert that they never spit out text. I think you could override that when we go to implement ANALYZE(kwd, FRENCH) or whatever we call it, but only a handful of functions will ever output text fields, and I'm ok with extra work telling you "you probably don't want to return text here" when writing them.
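A minimal sketch of that test-side assertion with an explicit opt-out, assuming a hypothetical helper (FunctionTypeAssertions.assertNoTextOutput is not an existing Elasticsearch test utility). A future analyzer-producing function would pass allowText = true; everything else fails if it resolves to TEXT.

```java
enum DataType { TEXT, KEYWORD, LONG }

final class FunctionTypeAssertions {
    // Hypothetical test helper: fail if a function's resolved output type is TEXT,
    // unless the test explicitly opts in.
    static void assertNoTextOutput(String function, DataType actual, boolean allowText) {
        if (actual == DataType.TEXT && !allowText) {
            throw new AssertionError(function + " returned TEXT; you probably want KEYWORD here");
        }
    }
}
```

Making the opt-out an explicit parameter keeps the rare TEXT-producing function visible in its own test instead of weakening the check for everyone.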

astefan · 2024-10-11T11:11:45Z

...in/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/aggregate/Values.java

@@ -80,7 +82,8 @@ public Values replaceChildren(List<Expression> newChildren) {

    @Override
    public DataType dataType() {
-        return field().dataType();
+        DataType t = field().dataType();


Instead of doing this for every aggregation function, how about going to AggregateFunction and defining something similar to UnaryScalarFunction.dataType() there?
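A minimal sketch of that suggestion, assuming hypothetical simplified classes (AggregateSketch, MaxSketch, CountSketch) instead of the real AggregateFunction hierarchy: the base class derives a default output type from its field with TEXT already mapped to KEYWORD, and aggregates with a fixed output type override it.

```java
enum DataType {
    TEXT, KEYWORD, LONG, DOUBLE;

    DataType noText() {
        return this == TEXT ? KEYWORD : this;
    }
}

// Hypothetical aggregate base class with a TEXT-safe default output type.
abstract class AggregateSketch {
    private final DataType fieldType;

    AggregateSketch(DataType fieldType) {
        this.fieldType = fieldType;
    }

    DataType dataType() {
        return fieldType.noText();
    }
}

// MIN/MAX/VALUES/TOP-style aggregates simply inherit the default.
final class MaxSketch extends AggregateSketch {
    MaxSketch(DataType fieldType) {
        super(fieldType);
    }
}

// Aggregates with a fixed output type (e.g. a count) override it.
final class CountSketch extends AggregateSketch {
    CountSketch(DataType fieldType) {
        super(fieldType);
    }

    @Override
    DataType dataType() {
        return DataType.LONG;
    }
}
```

With the default in the base class, Min, Max, Top and Values would not each need their own ternary.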


ivancea · 2024-10-23T14:55:25Z

...in/esql/src/test/java/org/elasticsearch/xpack/esql/expression/function/TestCaseSupplier.java

@@ -1435,7 +1435,7 @@ public static TestCase typeError(List<TypedData> data, String expectedTypeError)
            this.source = Source.EMPTY;
            this.data = data;
            this.evaluatorToString = evaluatorToString;
-            this.expectedType = expectedType;
+            this.expectedType = expectedType == null ? null : expectedType.noText();


If we're already adding the .noText() to the test cases (like MaxTests), why are we adding it here too?

This is the comprehensive catch-all case we added late in the PR, once we had covered all the low-hanging fruit. But perhaps this means we can remove some of the earlier code. I'll investigate.

Indeed, there were 12 places where we had added noText() that no longer needed it, since they all call into this TestCase constructor. I've removed them all and will push a commit after re-running tests to be safe.
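A minimal sketch of the catch-all, assuming a hypothetical, trimmed-down TestCaseSketch in place of the real TestCaseSupplier.TestCase: normalizing the expected type once in the constructor covers every test supplier, which is why the per-test noText() calls became redundant.

```java
enum DataType {
    TEXT, KEYWORD, LONG;

    DataType noText() {
        return this == TEXT ? KEYWORD : this;
    }
}

// Hypothetical test-case holder that normalizes the expected type on construction.
final class TestCaseSketch {
    final DataType expectedType;

    TestCaseSketch(DataType expectedType) {
        // null is kept as null, matching the guard in the diff above.
        this.expectedType = expectedType == null ? null : expectedType.noText();
    }
}
```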

ivancea · 2024-10-23T15:06:41Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/type/EsqlDataTypeConverter.java

@@ -117,7 +117,6 @@ public class EsqlDataTypeConverter {
        entry(LONG, ToLong::new),
        // ToRadians, typeless
        entry(KEYWORD, ToString::new),
-        entry(TEXT, ToString::new),


This change may close/cancel #111537 (?)

Yes! I was busy creating a new issue for that, but glad to see you already did, way back! I'll add a 'fixes' line to the PR description.

astefan

LGTM. Looking much better. Thanks, Craig.

luigidellaquila

LGTM, thanks!

elasticsearchmachine · 2024-10-24T10:13:24Z

Hi @craigtaverner, I've updated the changelog YAML for you.


bpintea

Lgtm

bpintea · 2024-10-24T12:56:15Z

x-pack/plugin/esql-core/src/main/java/org/elasticsearch/xpack/esql/core/type/DataType.java

@@ -584,6 +584,10 @@ static Builder builder() {
        return new Builder();
    }

+    public DataType noText() {
+        return this == DataType.TEXT ? DataType.KEYWORD : this;


Nit:

Suggested change

-        return this == DataType.TEXT ? DataType.KEYWORD : this;
+        return this == TEXT ? KEYWORD : this;

bpintea · 2024-10-24T12:59:22Z

...lugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/aggregate/Min.java

+        DataType t = field().dataType();
+        return t == TEXT ? KEYWORD : t;


Suggested change

-        DataType t = field().dataType();
-        return t == TEXT ? KEYWORD : t;
+        return field().dataType().noText();

Done. I normally like to accept the commit suggestion, but in this case the now-unused imports would cause a build failure, so I made the change locally and will push a commit for all three places soon.

all good, it's not like i really contributed :)

bpintea · 2024-10-24T13:01:48Z

x-pack/plugin/esql-core/src/main/java/org/elasticsearch/xpack/esql/core/type/DataType.java

@@ -584,6 +584,10 @@ static Builder builder() {
        return new Builder();
    }

+    public DataType noText() {


textAsKeyword()? noText is compact, but I find it a tad misleading. Minor preference, can stay as is.

The idea is sound, but I have a minor preference for noText, because textAsKeyword seems to imply that the current data type is TEXT, which it usually is not.

bpintea · 2024-10-24T13:02:27Z

...lugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/aggregate/Top.java

+        DataType t = field().dataType();
+        return t == TEXT ? KEYWORD : t;


bpintea · 2024-10-24T13:05:17Z

...in/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/aggregate/Values.java

@@ -91,7 +93,8 @@ public Values withFilter(Expression filter) {

    @Override
    public DataType dataType() {
-        return field().dataType();
+        DataType t = field().dataType();
+        return t == TEXT ? KEYWORD : t;



elasticsearchmachine · 2024-10-25T08:11:05Z

💚 Backport successful

Status	Branch	Result
✅	8.x

Co-authored-by: Luigi Dell'Aquila

TO_UPPER/TO_LOWER resolution incorrectly returned the child's type (which could also be `null`, of type `NULL`) instead of KEYWORD/TEXT, so a test like `TO_UPPER(null) == "..."` fails on a type mismatch. This was fixed collaterally by #114334 (8.17.0). Also, correct some of the test skipping (which had no impact anyway, due to the testing range).
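A deliberately tiny sketch of the typing difference described above, assuming a hypothetical ToUpperTyping class and a simplified DataType enum. It models only the reported output type, not the real resolution code.

```java
enum DataType { NULL, KEYWORD, TEXT }

final class ToUpperTyping {
    // Hypothetical model of the old resolution: the function reported its child's
    // type, so TO_UPPER(null) was typed NULL rather than a string type.
    static DataType oldDataType(DataType childType) {
        return childType;
    }

    // Hypothetical model after #114334: the function reports KEYWORD, so
    // TO_UPPER(null) has the same type as the string literal it is compared with.
    static DataType newDataType(DataType childType) {
        return DataType.KEYWORD;
    }
}
```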


* ESQL: Rewrite TO_UPPER/TO_LOWER comparisons (#118870). This adds an optimization rule that rewrites TO_UPPER/TO_LOWER comparisons against a string into an InsensitiveEquals comparison. The rewrite can also immediately yield TRUE/FALSE when the string doesn't match the case produced by the function. This also allows the predicate to be pushed down to Lucene later as a case-insensitive term query. Fixes #118304.
* Disable `TO_UPPER(null)`-tests prior to 8.17 (#119213). (cherry picked from commit edb3818)
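A minimal sketch of the rewrite described in #118870, for the `TO_UPPER(field) == literal` case only, assuming a hypothetical CaseRewriteSketch with its own result types; it is not the actual optimizer rule. TO_LOWER is symmetric, and a `!=` comparison would fold to TRUE instead of FALSE.

```java
import java.util.Locale;

final class CaseRewriteSketch {
    // Hypothetical model of the two possible outcomes of the rewrite.
    interface Result {}

    record AlwaysFalse() implements Result {}

    record InsensitiveEquals(String field, String literal) implements Result {}

    // Rewrites TO_UPPER(field) == literal. If the literal is not all upper-case,
    // TO_UPPER(field) can never equal it, so the predicate folds to FALSE.
    // Otherwise it becomes a case-insensitive equality on the original field,
    // which a later step could push down as a case-insensitive term query.
    static Result rewriteToUpperEquals(String field, String literal) {
        if (!literal.equals(literal.toUpperCase(Locale.ROOT))) {
            return new AlwaysFalse();
        }
        return new InsensitiveEquals(field, literal);
    }
}
```

For example, `TO_UPPER(name) == "Abc"` folds to FALSE, while `TO_UPPER(name) == "ABC"` becomes a case-insensitive match of `name` against "ABC".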


Don't return TEXT type for functions that take TEXT

2e647cf

Always return KEYWORD, because any change to the value, no matter how small, is enough to render meaningless the original analyzer associated with the TEXT value.

craigtaverner added >bug Team:Analytics :Analytics/ES|QL auto-backport v8.15.3 labels Oct 8, 2024

craigtaverner requested a review from luigidellaquila October 8, 2024 15:52

elasticsearchmachine added the v9.0.0 label Oct 8, 2024

craigtaverner added 6 commits October 8, 2024 17:53

Update docs/changelog/114334.yaml

ff79ec0

Remove text from return type documented, and meta-tests

f7fd864

Merge remote-tracking branch 'origin/main' into esql_functions_never_return_text

04d363b

Support REVERSE

aa72517

Merge branch 'esql_functions_never_return_text' of github.com:craigtaverner/elasticsearch into esql_functions_never_return_text

a9517e1

Merge remote-tracking branch 'origin/main' into esql_functions_never_return_text

96e6575

craigtaverner added 2 commits October 9, 2024 10:38

Added EsqlCapabilities for reverse function emitting keyword

3793ffe

Fixed failing test

a11cd1d

astefan requested changes Oct 9, 2024


elasticsearchmachine added v8.15.4 and removed v8.15.3 labels Oct 10, 2024

luigidellaquila and others added 2 commits October 11, 2024 10:24

Fix tests

9bdb58a

Expand EsqlCapability to cover all functions now emitting KEYWORD

fcc93a1

astefan reviewed Oct 11, 2024


craigtaverner added 2 commits October 11, 2024 14:32

More tests failing due to new behaviour

7e642ad

Remove type assertion for function merged back to 8.x already

589f5e2

We pull this test from 8.x and run it on current builds in yamlRestCompatTest, so we need to disable this check here (and in 8.x).

craigtaverner removed the v8.15.4 label Oct 11, 2024

ivancea reviewed Oct 23, 2024


astefan approved these changes Oct 24, 2024


luigidellaquila approved these changes Oct 24, 2024


Update docs/changelog/114334.yaml

04134b5

craigtaverner added 3 commits October 24, 2024 12:39

Removed unnecessary calls to noText in test cases

8f3bfc3

Since we added a check for this inside the TestCase constructor, all these extra calls are no longer needed.

Merge branch 'esql_functions_never_return_text' of github.com:craigtaverner/elasticsearch into esql_functions_never_return_text

cfba4db

Merge remote-tracking branch 'origin/main' into esql_functions_never_return_text

fb2545d

craigtaverner added v8.16.1 v8.15.4 v8.16.0 and removed v8.16.1 labels Oct 24, 2024

bpintea approved these changes Oct 24, 2024


craigtaverner added 2 commits October 24, 2024 15:20

Merge remote-tracking branch 'origin/main' into esql_functions_never_return_text

853423f

Removed three remaining ternary functions, using noText instead

8d03838

craigtaverner removed v8.16.0 v8.15.4 labels Oct 24, 2024

craigtaverner merged commit 3d307e0 into elastic:main Oct 25, 2024
16 checks passed

craigtaverner mentioned this pull request Oct 25, 2024

[8.x] Don't return TEXT type for functions that take TEXT (#114334) #115625

Merged

craigtaverner mentioned this pull request Oct 25, 2024

Fixed flaky test after PR that disallows functions to return TEXT #115633

Merged

drewdaemon mentioned this pull request Oct 28, 2024

[ES|QL] remove text option for inline casting elastic/kibana#198037

Open

bpintea mentioned this pull request Dec 23, 2024

Disable TO_UPPER(null) BWC tests prior to 8.17 #119213

Merged
