-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PartiQL Eval - Planner Mode #1385
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## v1 #1385 +/- ##
=====================================
Coverage ? 50.32%
Complexity ? 1045
=====================================
Files ? 165
Lines ? 13129
Branches ? 2452
=====================================
Hits ? 6607
Misses ? 5862
Partials ? 660
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Conformance comparison report-Cross Engine
Number failing in both: 262 Number passing in legacy engine but fail in eval engine: 736 Number failing in legacy engine but pass in eval engine: 173 Conformance comparison report-Cross Commit-LEGACY
Number failing in both: 435 Number passing in Base (810035c) but now fail: 0 Number failing in Base (810035c) but now pass: 0 Conformance comparison report-Cross Commit-EVAL
Number failing in both: 946 Number passing in Base (810035c) but now fail: 52 Number failing in Base (810035c) but now pass: 76 |
// Information that worth to have to help end customer debugging the plan. | ||
debug::[ | ||
call::[ | ||
unresolved::{ | ||
identifier: identifier, | ||
args: list::[rex], | ||
}, | ||
], | ||
problem::[ | ||
missing::{ | ||
trace: '.rex.op', | ||
problem: string | ||
}, | ||
error::{ | ||
trace: '.rex.op', | ||
problem: string | ||
}, | ||
] | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to avoid backward incompatible changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. I'm curious to learn more about how/why this is used. I believe attaching more information to the Rel.Op.Err is a better path forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Essentially:
Consider the query 1 + t.f2
where t is t : {f1: INT2}
The design was to have a trace of missing:
Query[tag=Plan-780ab923]
└── Missing[problem=-1:-1:<unknown>: int4, missing is/are incompatible data types for the 'PLUS' operator., tag=Plan-39c9df92, static_type=missing]
└── Dynamic[tag=Plan-7452fb19]
├── Lit[value=Int32ValueImpl(value=1, annotations=[]), tag=Plan-6063c078, static_type=int4]
├── Missing[problem=-1:-1:<unknown>: Expression always returns null or missing: caused by Path Navigation always returns MISSING, tag=Plan-3defad5b, static_type=missing]
│ └── Symbol[key=f2, tag=Plan-ccbc991]
│ └── Global[tag=Plan-e021a9f7, static_type=struct(f1: int2, [Open(value=false), UniqueAttrs(value=true), Ordered])]
│ └── Ref[catalog=0, symbol=8, tag=Plan-cfc1a326]
├── Candidate[tag=Plan-7e3c9eab]
│ └── Ref[catalog=0, symbol=0, tag=Plan-6263a65a]
| .......
Breaking it down a little bit:
Query[tag=Plan-780ab923]
└── Missing[problem=-1:-1:<unknown>: int4, missing is/are incompatible data types for the 'PLUS' operator., tag=Plan-39c9df92, static_type=missing]
Top level missing -> because 1 + MISSING is an unresolved function.
Note that the idea is to have 1 + MISSING report type mismatch error, NOT plus function propagate missing.
This is because the isMissingCall
flag is set on each FnSignature.
During function resolution, we first get all the variants based on function name. We never prohibit the possibility to have say two variants of the same function name, but has different isMissingCall
flag.
Next:
└── Missing[problem=-1:-1:<unknown>: int4, missing is/are incompatible data types for the 'PLUS' operator., tag=Plan-39c9df92, static_type=missing]
└── Dynamic[tag=Plan-7452fb19]
The RexOpMissing should have put all the candidate function to help debugging. This is essentially to show the user what is the actually input type and all the possible legal static type.
Next:
└── Dynamic[tag=Plan-7452fb19]
├── Lit[value=Int32ValueImpl(value=1, annotations=[]), tag=Plan-6063c078, static_type=int4]
├── Missing[problem=-1:-1:<unknown>: Expression always returns null or missing: caused by Path Navigation always returns MISSING, tag=Plan-3defad5b, static_type=missing]
│ └── Symbol[key=f2, tag=Plan-ccbc991]
│ └── Global[tag=Plan-e021a9f7, static_type=struct(f1: int2, [Open(value=false),
The interesting part is the second arg, which contains another rexOpMissing. The idea is to have the trace (what caused this missing), again for debugging purpose.
====
Now those trace are nice for debugging, but the evaluator does not really care about those.
So instead of having the evaluator to deal with rexOpMissing, it could be nice, before we send the plan to evaluator, does the conversion ourselves.
// Plan for evaluator to consume
└── Query[tag=Plan-951622ec]
└── Lit[value=MissingValueImpl(annotations=[]), tag=Plan-33b21fcd, static_type=missing]
Hence we have essentially two planner build for that purpose.
We can put all those information to Rex.Op.MISSING in the public plan, but I having those as debugging node avoids backward incompatible changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see how attaching this information is useful, but why is this debug in the public API plan? This makes sense for the internal plan, and we can dump these detailed messages during the PlanTransform.
I don't believe we should be increasing the public plan surface with debug information — error node is already a stretch..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment is really asking, could we please consolidate any error/debug messaging to the single err node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The blocker here really comes from unresolved function:
if you have not_a_func(arg1, arg2)
The internal plan is:
To faithfully preserve the problem trace in the public plan as a PlanNode, we the have to wrap this unresolvedFunction in to a node.
// What to turn that into the public node?
internal data class Unresolved(
@JvmField internal val identifier: Identifier,
@JvmField internal val args: List<Rex>,
)
The best option I see is to have a dedicated debug node: i.e., you have a debugging node for unresolved function and wrapper that node as the input for Rex.Op.Err. This way we fully preserved the node information and it is very easy to present to the customer about what exactly the error is.
I am sure we can hack around the issues, for example, we can construct some rex.op for all the args and left the function name in the error message, but this is a little hacky in my opinion.
If we need the unresolved function node as a debug node, I don't see a reason why you should not have the missingOp and errOp as debug node. Separating those two in debug node makes debugging a little easier in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is assumed that you want to use the plan not just the problem to debug:
Take not_a_function(atomic.no_exist_field)
for example
Plan:
--------Plan---------
⚬ PartiQLPlan[tag=Plan-9d5bede7]
├── Catalog[name=mode_test, tag=Plan-5b7090a4]
│ └── Value[type=int2, tag=Plan-7097c826]
└── Query[tag=Plan-4028b20a]
└── Error[problem=-1:-1:<unknown>: Unknown function `not_a_function(<missing>), tag=Plan-70e40d7a, static_type=any]
└── Unresolved[tag=Plan-2c91e120]
├── Symbol[caseSensitivity=INSENSITIVE, symbol=not_a_function, tag=Plan-2e518174]
└── Missing[problem=-1:-1:<unknown>: Expression always returns null or missing: caused by Path Navigation failed - root is not a struct, tag=Plan-aa832a34, static_type=missing]
└── Symbol[key=no_exist_field, tag=Plan-2ba2fcac]
└── Global[tag=Plan-ecd64321, static_type=int2]
└── Ref[catalog=0, symbol=0, tag=Plan-d4929e03]
Problem:
-1:-1:<unknown>: Expression always returns null or missing: caused by Path Navigation failed - root is not a struct
-1:-1:<unknown>: Unknown function `not_a_function(<missing>)
In my opinion the plan is much easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each plan node has a unique tag
. I suggest we return error nodes only, and any details are in a Problem
which which is associated with the error node via the tag.
This would mean you don't need to stuff a bunch of debug stuff in the plan. And then we can expand upon the errors without breaking the plan API. Does this make sense?
@@ -61,15 +61,15 @@ internal abstract class PathResolver<T>( | |||
when (n) { | |||
0 -> return null | |||
1 -> return get(absPath) | |||
2 -> return get(absPath) ?: get(path) | |||
2 -> return get(absPath) ?: get(path) ?: search(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the comment above,
* 1. If N = 1
* (a) Lookup at <catalog>.<schema>.<path> (relative in catalog)
* 2. If N = 2
* (a) Lookup at <catalog>.<schema>.<path> (relative in catalog)
* (b) Lookup at <catalog>.<path> (absolute in catalog)
* 3. If N > 2
* (a) Lookup at <catalog>.<schema>.<path> (relative in catalog)
* (b) Lookup at <catalog>.<path> (absolute in catalog)
* (c) Lookup as an absolute where the first step is the catalog. (absolute in system)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is correct. If N =2 we only check absPath then path. I think absPath and path should be given better names IMO.
Also, now N=2 and N=3 are the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where I could use a little help.
Could you please elaborate the modeling here?
Case 1:
inserting an obj (t
) into catalog (test
).
session does not specify current directory.
search for t
In this case, the resolution succeeds.
This is expected as it should search for <catalog>.<path>
Case 2:
inserting an obj (t
) into catalog (test
).
session does not specify current directory.
search for test.t
In this case, the resolution failed.
This is unexpected as it should search for <catalog>.<path>
Case 3:
inserting an obj (t
) into catalog (test
).
session specify current directory to be dir
search for t
In this case, the resolution failed.
Not sure if this is expected. Based on the comment, it says it should search for <catalog>.<schema>.<path>
. But why this should not search for <catalog>.<path>
?
Case 4:
inserting an obj (t
) into catalog (test
).
session specify current directory to be dir
search for test.t
In this case, the resolution failed.
This is unexpected as it should search for <catalog>.<path>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original look-up logic extends SQL's three level structure, but preserve the fact that we can not create a table descriptor under catalog. This is a valid assumption.
I rolled back the lookup logic and modified the test.
partiql-planner/src/main/kotlin/org/partiql/planner/internal/PlanningProblemDetails.kt
Show resolved
Hide resolved
partiql-planner/src/main/kotlin/org/partiql/planner/internal/ProblemGenerator.kt
Show resolved
Hide resolved
partiql-eval/src/main/kotlin/org/partiql/eval/internal/operator/rel/RelOffset.kt
Show resolved
Hide resolved
// Information that worth to have to help end customer debugging the plan. | ||
debug::[ | ||
call::[ | ||
unresolved::{ | ||
identifier: identifier, | ||
args: list::[rex], | ||
}, | ||
], | ||
problem::[ | ||
missing::{ | ||
trace: '.rex.op', | ||
problem: string | ||
}, | ||
error::{ | ||
trace: '.rex.op', | ||
problem: string | ||
}, | ||
] | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. I'm curious to learn more about how/why this is used. I believe attaching more information to the Rel.Op.Err is a better path forward.
/** | ||
* Determine the planner behavior upon encounter an operation that always returns MISSING. | ||
* In both mode, The problometic operation will be tracked in problem callback. | ||
* Subsequence opearation will take in MISSING as input. | ||
*/ | ||
public enum class MissingOpBehavior { | ||
/** | ||
* The problometic operation will be tracked in problem callback as a error. | ||
* The result plan will turn the problemetic operation into an error node. | ||
*/ | ||
QUIET, | ||
/** | ||
* The problometic operation will be tracked in problem callback as a error. | ||
* The result plan will turn the problemetic operation into an missing node. | ||
*/ | ||
SIGNAL | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switch "problometic" to "problematic".
I suggest we have an enum called Flag
and the planner has a set of "flags: Set" because we can extend that in the future. Internally we can represent these flags however we want, and the builder can assert that conflicting flags don't appear in the set
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see.
For starter, I think this mode is really a session property then a planner property.
Also, I am not sure if having such flag would be better in terms of ergonomic for our end users.
val session = PartiQLPlannerSession.builder()
.flag(setOf("QUIET", ....))
.otherBuilderFunc()
.build()
vs
val session = PartiQLPlannerSession.builder()
.missingOpBehavior(QUIET)
.otherBuilderFunc()
.build()
Extending on this should not be consider as breaking as long as we have a default value set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the offline discussion, switching to flag.
val typed = typer.resolve(root) | ||
val internal = org.partiql.planner.internal.ir.PartiQLPlan(typed) | ||
|
||
// 4. Assert plan has been resolved — translating to public API | ||
var plan = PlanTransform.transform(internal, onProblem) | ||
var plan = PlanTransform.transform(internal, session.missingOpBehavior, onProblem, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here, session.flags
@@ -61,15 +61,15 @@ internal abstract class PathResolver<T>( | |||
when (n) { | |||
0 -> return null | |||
1 -> return get(absPath) | |||
2 -> return get(absPath) ?: get(path) | |||
2 -> return get(absPath) ?: get(path) ?: search(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is correct. If N =2 we only check absPath then path. I think absPath and path should be given better names IMO.
Also, now N=2 and N=3 are the same
test/partiql-tests-runner/src/test/kotlin/org/partiql/runner/executor/EvalExecutor.kt
Outdated
Show resolved
Hide resolved
fdfd9e4
to
c18b044
Compare
// Information that worth to have to help end customer debugging the plan. | ||
debug::[ | ||
call::[ | ||
unresolved::{ | ||
identifier: identifier, | ||
args: list::[rex], | ||
}, | ||
], | ||
problem::[ | ||
missing::{ | ||
trace: '.rex.op', | ||
problem: string | ||
}, | ||
error::{ | ||
trace: '.rex.op', | ||
problem: string | ||
}, | ||
] | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see how attaching this information is useful, but why is this debug in the public API plan? This makes sense for the internal plan, and we can dump these detailed messages during the PlanTransform.
I don't believe we should be increasing the public plan surface with debug information — error node is already a stretch..
partiql-planner/src/main/kotlin/org/partiql/planner/internal/Env.kt
Outdated
Show resolved
Hide resolved
@@ -23,9 +28,9 @@ import org.partiql.value.PartiQLValueExperimental | |||
*/ | |||
internal object PlanTransform { | |||
|
|||
fun transform(node: PartiQLPlan, onProblem: ProblemCallback): org.partiql.plan.PartiQLPlan { | |||
fun transform(node: PartiQLPlan, missingOpBehavior: PartiQLPlanner.Session.MissingOpBehavior, onProblem: ProblemCallback, debugMode: Boolean): org.partiql.plan.PartiQLPlan { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, let's get MissingOpBehavior and DebugMode into a single compiler flags construct. These shouldn't be passed independently like this. All compilation configuration should be contained in a single type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think MissingOpBehavior is a session mode, whereas the Debugging mode vs production mode should be a planner mode.
let's get MissingOpBehavior and DebugMode into a single compiler flags construct. These shouldn't be passed independently like this. All compilation configuration should be contained in a single type
Could you please elaborate on why you think this is a better approach?
is Rex.Op.Var.Global -> error("EXCLUDE only disallows values coming from the input record.") | ||
paths = node.paths.mapNotNull { | ||
val root = when (val root = it.root) { | ||
is Rex.Op.Var.Unresolved -> org.partiql.plan.Rex.Op.Var(-1, -1) // unresolved in `PlanTyper` results in error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should constructor an error and call visitRexOpErr. We don't want this to go unreported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is a dead branch. Given the the resolution happened in typing time. Going to throw an error.
// Information that worth to have to help end customer debugging the plan. | ||
debug::[ | ||
call::[ | ||
unresolved::{ | ||
identifier: identifier, | ||
args: list::[rex], | ||
}, | ||
], | ||
problem::[ | ||
missing::{ | ||
trace: '.rex.op', | ||
problem: string | ||
}, | ||
error::{ | ||
trace: '.rex.op', | ||
problem: string | ||
}, | ||
] | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment is really asking, could we please consolidate any error/debug messaging to the single err node.
@@ -335,7 +342,7 @@ rel::{ | |||
paths: list::[path], | |||
_: [ | |||
path::{ | |||
root: '.rex.op.var', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check with @alancai98 on this .. I know that using a rex.op makes it possible to insert an err node. In this case, I think you should be inserting an error node on the unresolved variable rather than a silent var(-1,-1)
The previous approach have the error/missing node to be a wrapper around the original problematic node. Based on the discussion, we don't want to increase the public api surface. So in this PR, we change the error/missing node to contains a message and a trace, which is a list of rex.op. We replace the original problematic node with the error/missing node, and add their children to the trace.
The original look-up logic extends SQL's three level structure, but preserve the fact that we can not create a table descriptor under catalog. This is a valid assumption. Rollback the lookup logic and modify the test catalog structure.
BTW, this should be updated to point to the v1 branch. |
# Conflicts: # partiql-planner/src/main/kotlin/org/partiql/planner/internal/typer/PlanTyper.kt # partiql-spi/src/main/kotlin/org/partiql/spi/connector/sql/builtins/FnEq.kt
traces: list::['.rex.op'] | ||
}, | ||
|
||
// Information that worth to have to help end customer debugging the plan. | ||
debug::[ | ||
call::[ | ||
unresolved::{ | ||
identifier: identifier, | ||
args: list::[rex], | ||
}, | ||
], | ||
problem::[ | ||
missing::{ | ||
trace: '.rex.op', | ||
problem: string | ||
}, | ||
error::{ | ||
trace: '.rex.op', | ||
problem: string | ||
}, | ||
] | ||
] | ||
missing::{ | ||
message: string, | ||
traces: list::['.rex.op'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make traces causes
/** | ||
* Java style method for setting the planner to signal mode | ||
*/ | ||
public fun signalMode(): PartiQLPlannerBuilder = this.apply { | ||
this.flags.add(PlannerFlag.SIGNAL_MODE) | ||
} | ||
|
||
/** | ||
* Java style method for setting the planner to quite mode | ||
*/ | ||
public fun quiteMode(): PartiQLPlannerBuilder = this.apply { | ||
this.flags.remove(PlannerFlag.SIGNAL_MODE) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's only have one method here, as remember we are setting a boolean. The comment should be very clear which is the default.
Something as straightforward as this,
public fun warningsAsErrors(): PartiQLPlannerBuilder = this.apply {
this.flags.add(PlannerFlag.SIGNAL_MODE)
}
We could get more sophisticated if we want to generalize, but we have no need for that now. I don't care about the internal names, but signal/quiet is a bit too vague and associated with values/NaN's whereas gcc uses warn/error lingo in regards to the compiler behavior.
I'm referencing -Werror
https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html
-Werror
Make all warnings into errors.
* | ||
* The result plan will turn the problematic operation into a missing node. | ||
*/ | ||
SIGNAL_MODE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is ok and we don't have to change anything because it is internal.
We are introducing two concepts, compiler warnings and a warning for "always missing". What we are calling "signal" is really emit warnings for always missing + treat warnings as errors -> resulting in a throw
Relevant Issues
====== Update ========
The previous approach have the error/missing node to be a wrapper around the original problematic node.
Based on the discussion, we don't want to increase the public api surface.
So in this PR, we change the error/missing node to contains a message and a trace, which is a list of rex.op. We replace the original problematic node with the error/missing node, and add their children to the trace.
=========================
Description
REL
operation were not complete, specifically, the aggregation and exclude operation.The gist of the PR was quite straightforward.
In the internal Plan:
During typing, if we find a type error / an operation that always returns missing -> we put the input rexOp underneath
rexOpMissing
.If on the other hand, the operation is an error, i.e.,
not_a_function(1)
, we put the input rexOp underneath therexOpErr
.During the conversion between the internal plan to the public plan, we added a flag to control the planner behavior, and add all problem in the problem callback.
At the moment: the choice we have are
QUIET
vsSIGNAL
.In
QUIET
mode: We turns the top most RexOpMissing in to a literal missing.In
SIGNAL
mode: We turns the top most RexOpMissing into a ErrorNode.During the process, we still visit the children of the missing node to gather the problems. That said, in some complex queries, it would be nice to trace the missing ops in the plan for the purpose of debugging.
To do so, this PR also included a debugging planner.
If using a debugging compiler:
In
QUIET
mode: upon encoutner aRexOpMissing
node, instead of turning it into a missing literal, we turn it into aRexOpDebugProblemMissing
. TheRexOpDebugProblemMissing
node is a new node we added in that mirrors the internalRexOpMissing
node. Similar node was added forSIGNAL
mode.partiql-planner/src/test/kotlin/org/partiql/planner/PlannerErrorReportingTests.kt can be a good entry point to this review.
Other Information
Updated Unreleased Section in CHANGELOG: [YES/NO]
Any backward-incompatible changes? [YES/NO]
errors for users that are using our public APIs or the entities that have
public
visibility in our code-base. >Any new external dependencies? [YES/NO]
Do your changes comply with the Contributing Guidelines
and Code Style Guidelines? [YES/NO]
License Information
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.