Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: Fix for overzealous validation in case of invalid mapped fields #111475

Conversation

alex-spies
Copy link
Contributor

@alex-spies alex-spies commented Jul 31, 2024

Fix #111452

#107545 postpones some of the validation for fields that occur in multiple indices with different types. Such fields must not be explicitly used in the query.

The validation now happens after the main reference resolution. This is problematic for renames in particular. Consider a field a that has the same type in all indices:

FROM idx | RENAME a AS b

is perfectly valid. But after reference resolution, the RENAME becomes a PROJECT that has to list all fields that it will output. If there is a field that is keyword in index1 but ip in index2, we have the following

FROM idx | PROJECT field, a AS b

Performing the validation now gives wrong results, because it sees field and flags it as invalid; field was not used explicitly, but ended up in the plan due to replacing RENAME by a PROJECT.

This PR reverts to the original approach: during the initial resolution in ResolveRefs, we immediately validate usage of a field attribute referring to a multi-typed field. Usage is considered valid if the attribute is either used inside a conversion function, like TO_IP, or if the attribute is being dropped.

Updated with another approach: This PR now aligns how we treat multi-typed fields with how we treat unsupported field types. The latter are already being passed through and are even allowed to be mentioned explicitly in KEEP, RENAME and DROP. To achieve this, rather than mapping FieldAttributes containing an InvalidMappedField to UnresolvedAttribute - which the verifier finds and counts as invalid query - we map such field attributes to UnsupportedAttribute, which is what fields with unsupported types are also being mapped to.

To avoid weird workarounds when resolving STATS, type resolution for convert functions becomes more lenient, so that multi-typed field attributes (FieldAttribute containing an InvalidMappedField) are considered resolved as long as all of the mapped types are supported by the convert function; e.g. TO_IP(client_ip) in case client_ip is mapped as both keyword and ip.

@alex-spies alex-spies added the test-full-bwc Trigger full BWC version matrix tests label Jul 31, 2024
This is more conceptually correct: union typed attributes must not be
used in a query, unless they're immediately cast to a single type or
dropped.

Implementation is haaacky and needs improvement, but it gets the job
done.
@alex-spies alex-spies changed the title ESQL: quick fix for overzealous validation in case of invalid mapped fields ESQL: Fix for overzealous validation in case of invalid mapped fields Jul 31, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @alex-spies, I've created a changelog YAML for you.

@alex-spies alex-spies marked this pull request as ready for review August 2, 2024 10:54
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Aug 2, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @alex-spies, I've updated the changelog YAML for you.

@alex-spies alex-spies added the auto-backport Automatically create backport pull requests when merged label Aug 2, 2024
@alex-spies
Copy link
Contributor Author

Discussed this offline with @costin and @astefan : we won't allow RENAMEs of unsupported/multi-typed fields for now; we can add this later.

Updated the PR, invalidated such RENAMEs and added a verifier test. If CI likes this, I think this should be good to go.

@alex-spies alex-spies requested a review from astefan August 8, 2024 10:29
Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alex-spies alex-spies merged commit 585480f into elastic:main Aug 9, 2024
15 checks passed
@alex-spies alex-spies deleted the fix-union-types-breaking-unsupported-in-renames branch August 9, 2024 07:38
@@ -2018,6 +2018,19 @@ M
null
;

shadowingInternalWithGroup2#[skip:-8.14.1,reason:implemented in 8.14]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is not related, but good to add a shadowing test within the groups in particular.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could've tripped due to the logic in the resolution of STATS, which didn't look like it correctly tracks the order of the groups. Turns out it did, because AttributeMap uses a LinkedHashMap internally; but better to have something to guard against regressions.

@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.15 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 111475

@alex-spies
Copy link
Contributor Author

💚 All backports created successfully

Status Branch Result
8.15

Questions ?

Please refer to the Backport tool documentation

alex-spies added a commit to alex-spies/elasticsearch that referenced this pull request Aug 9, 2024
…elastic#111475)

Fix validation of fields mapped to different types in different indices and align with validation of fields of unsupported type.

* Allow using multi-typed fields in KEEP and DROP, just like unsupported fields.
* Explicitly invalidate using both these field kinds in RENAME.
* Map both kinds of fields to UnsupportedAttribute to enforce consistency.
* Consider convert functions containing valid multi-typed fields as resolved to avoid weird workarounds when resolving STATS.
* Add a bunch of tests.

(cherry picked from commit 585480f)

# Conflicts:
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/union_types.csv-spec
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/Stats.java
elasticsearchmachine pushed a commit that referenced this pull request Aug 9, 2024
…#111475) (#111735)

Fix validation of fields mapped to different types in different indices and align with validation of fields of unsupported type.

* Allow using multi-typed fields in KEEP and DROP, just like unsupported fields.
* Explicitly invalidate using both these field kinds in RENAME.
* Map both kinds of fields to UnsupportedAttribute to enforce consistency.
* Consider convert functions containing valid multi-typed fields as resolved to avoid weird workarounds when resolving STATS.
* Add a bunch of tests.

(cherry picked from commit 585480f)

# Conflicts:
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/union_types.csv-spec
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/Stats.java
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Aug 9, 2024
* upstream/main: (22 commits)
  Prune changelogs after 8.15.0 release
  Bump versions after 8.15.0 release
  EIS integration (elastic#111154)
  Skip LOOKUP/INLINESTATS cases unless on snapshot (elastic#111755)
  Always enforce strict role validation (elastic#111056)
  Mute org.elasticsearch.xpack.esql.analysis.VerifierTests testUnsupportedAndMultiTypedFields elastic#111753
  [ML] Force time shift integration test (elastic#111620)
  ESQL: Add tests for sort, where with unsupported type (elastic#111737)
  [ML] Force time shift documentation (elastic#111668)
  Fix remote cluster credential secure settings reload   (elastic#111535)
  ESQL: Fix for overzealous validation in case of invalid mapped fields (elastic#111475)
  Pass allow security manager flag in gradle test policy setup plugin (elastic#111725)
  Rename streamContent/Separator to bulkContent/Separator (elastic#111716)
  Mute org.elasticsearch.tdigest.ComparisonTests testSparseGaussianDistribution elastic#111721
  Remove 8.14 from branches.json
  Only emit product origin in deprecation log if present (elastic#111683)
  Forward port release notes for v8.15.0 (elastic#111714)
  [ES|QL] Combine Disjunctive CIDRMatch (elastic#111501)
  ESQL: Remove qualifier from attrs (elastic#110581)
  Force using the last centroid during merging (elastic#111644)
  ...

# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferenceNamedWriteablesProvider.java
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this pull request Sep 4, 2024
…elastic#111475)

Fix validation of fields mapped to different types in different indices and align with validation of fields of unsupported type.

* Allow using multi-typed fields in KEEP and DROP, just like unsupported fields.
* Explicitly invalidate using both these field kinds in RENAME.
* Map both kinds of fields to UnsupportedAttribute to enforce consistency.
* Consider convert functions containing valid multi-typed fields as resolved to avoid weird workarounds when resolving STATS.
* Add a bunch of tests.
davidkyle pushed a commit to davidkyle/elasticsearch that referenced this pull request Sep 5, 2024
…elastic#111475)

Fix validation of fields mapped to different types in different indices and align with validation of fields of unsupported type.

* Allow using multi-typed fields in KEEP and DROP, just like unsupported fields.
* Explicitly invalidate using both these field kinds in RENAME.
* Map both kinds of fields to UnsupportedAttribute to enforce consistency.
* Consider convert functions containing valid multi-typed fields as resolved to avoid weird workarounds when resolving STATS.
* Add a bunch of tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged backport pending >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) test-full-bwc Trigger full BWC version matrix tests v8.15.0 v8.16.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ESQL: unexpected verification_exception for same field name in two indices (union types)
5 participants