-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#639: allow metadata fields (_id) to be used as QualifiedName #142
Conversation
…lifiedName Signed-off-by: Andrew Carbonetto <[email protected]>
Codecov Report
@@ Coverage Diff @@
## integ-poc-metadata-fields #142 +/- ##
================================================================
- Coverage 97.95% 62.76% -35.20%
================================================================
Files 303 10 -293
Lines 7783 658 -7125
Branches 501 119 -382
================================================================
- Hits 7624 413 -7211
- Misses 158 192 +34
- Partials 1 53 +52
Flags with carried forward coverage won't be shown. Click here to find out more. 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was straight-forward! It does bring up questions...
How will this interact with Prometheus and other catalogs? They have different notion of what metadata fields are.
I remember we had METADATA(...)...
syntax at some point -- that was a good idea.
It can work with StorageEngine
instance and be catalog agnostic.
Say, StorageEngine
has getMetadata
method that returns a metadata descriptor -- a map from a name of metadata field to a getter function. The function returns that field for a given result row.
When a query like SELECT METADATA(_id) FROM opensearch.indexA
executes, OpenSearchIndexScan
will return field data as well as metadata descriptor. ProjectOperator
will execute METADATA
implementation and pass it _id, metadata descriptor, and current result row.
METADATAwill check
_id` is valid metadata field and execute the getter function on the current result row.
This will avoid keeping catalog-specific data in the core
module and the parser and support different metadata for different catalogs.
I expect we can throw a SemanticCheck
exception for a query like SELECT METADATA(_id) FROM prometheus.tableA
.
Last point. OpenSearch seems to support custom metadata fields for a plugin. I'm curious what it would take to support them.
if (!Float.isNaN(hit.getScore())) { | ||
builder.put("_score", new ExprFloatValue(hit.getScore())); | ||
} | ||
if (!Float.isNaN(maxScore)) { | ||
builder.put("_maxscore", new ExprLongValue(maxScore)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When _score
/_maxscore
could be missing?
What will be returned to user in that case? NULL
? MISSING
(aka empty cell)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Null is what gets returned. That's what's defined in the _score field.
Which is consistent with what OpenSearch is returning... I'm assuming that's the correct way forward.
} | ||
|
||
/** | ||
* QualifiedName Constructor. | ||
*/ | ||
public QualifiedName(Iterable<String> parts) { | ||
public QualifiedName(Iterable<String> parts, Boolean isMetadataField) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to determine isMetadataField
dynamically rather than through constructor parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's easiest to define it at constructor time... since we need to look up the expression tree.
} | ||
|
||
public QualifiedName(Iterable<String> parts) { | ||
this(parts, Boolean.FALSE); | ||
} | ||
|
||
/** | ||
* QualifiedName Constructor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this comment need to be moved up? Are the above not considered as constructors as well?
@@ -110,4 +123,6 @@ public List<UnresolvedExpression> getChild() { | |||
public <R, C> R accept(AbstractNodeVisitor<R, C> nodeVisitor, C context) { | |||
return nodeVisitor.visitQualifiedName(this, context); | |||
} | |||
|
|||
public Boolean isMetadataField() { return Boolean.TRUE; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused by this line. Doesn't this always return true? was it meant to return isMetadataField
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha. That was wrong. I've fix this now.
Signed-off-by: Andrew Carbonetto <[email protected]>
|
||
ImmutableMap.Builder<String, ExprValue> builder = new ImmutableMap.Builder<>(); | ||
builder.putAll(docData.tupleValue()); | ||
builder.put("_index", new ExprStringValue(hit.getIndex())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we ever have hits from different indices?
If yes, maxScore
should be gotten per each hit. If not, you can move getIndex()
outside of the loop to optimize performance.
@@ -430,12 +430,13 @@ private UnresolvedExpression visitConstantFunction(String functionName, | |||
} | |||
|
|||
private QualifiedName visitIdentifiers(List<IdentContext> identifiers) { | |||
Boolean isMetadataField = identifiers.stream().filter(id -> id.metadataField() != null).findFirst().isPresent(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be there a mix of different types of identifiers - some of them are metadata and some of them are not?
Co-authored-by: Yury-Fridlyand <[email protected]>
Signed-off-by: Navneet Verma <[email protected]>
Description
OpenSearch reserved fields (
_id
,_score
,_index
, etc) are not allowed to be used in SQL clauses (SELECT, WHERE, ORDER BY) because the field format starting with underscore_
is not allowed.This ticket adds specific identifiers to the language, and opens up support for OpenSearch reserved identifiers.
As an aside, identifiers with double underscore at the start (such as
__myCoolField
) is acceptable as an identifier.Example:
SELECT calcs.key, str0, _id, _score, _maxscore FROM calcs WHERE _id="5"
Result:
{ "key04", "OFFICE SUPPLIES", "5", null, null }
Example - Metadata fields not requested are not displayed:
SELECT *, _id FROM bigint WHERE _id="2"
Result (assuming bigint only has one field):
[9223372026854775807, "2"]
Issues Resolved
#639
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Signed-off-by: Andrew Carbonetto [email protected]