Add additional APIs for async physical plan evaluation #1382

Merged
merged 12 commits into from
Mar 12, 2024


alancai98
Member

@alancai98 alancai98 commented Mar 5, 2024

Description

Creates an asynchronous version of the existing physical plan evaluator APIs. This PR differs from the other attempt to make the physical plan evaluator async (main...plan-eval-async-make-statement-eval-async), which changed the existing synchronous APIs to be async. Performance-wise, the async APIs are roughly 10-20% slower than the synchronous APIs when evaluating a single query.

This PR keeps both APIs so the change stays semver-compatible. Because of the performance drop, we chose not to simply implement the synchronous APIs by wrapping the async evaluator in a runBlocking call. Keeping both versions also makes it easier to compare the performance of the synchronous and asynchronous APIs. The previous synchronous versions have been marked as deprecated and are expected to be removed in the next major version.
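A minimal sketch of the compatibility approach described above. Statement, StatementAsync, and the toy "double the input" evaluation are hypothetical, simplified stand-ins, not the actual PartiQLStatement/PartiQLStatementAsync API. The existing synchronous interface stays but is deprecated, a separate async interface with a suspend fun is added next to it, and neither is implemented in terms of the other via runBlocking. It assumes kotlinx-coroutines-core on the classpath, which this PR adds.

```kotlin
import kotlinx.coroutines.runBlocking

// Hypothetical stand-in for the existing synchronous API: kept for semver, but deprecated.
@Deprecated("Use StatementAsync; the synchronous API will be removed in the next major version.")
interface Statement {
    fun eval(input: Int): Int
}

// Hypothetical stand-in for the new async API: same shape, but eval is a suspend fun.
interface StatementAsync {
    suspend fun eval(input: Int): Int
}

// Each API has its own implementation. The synchronous one does not delegate to the
// async one through runBlocking, so synchronous callers avoid coroutine overhead.
@Suppress("DEPRECATION")
class DoubleStatement : Statement {
    override fun eval(input: Int): Int = input * 2
}

class DoubleStatementAsync : StatementAsync {
    override suspend fun eval(input: Int): Int = input * 2
}

// Non-suspending callers (e.g. a main function or a test) need a coroutine builder to reach the async API.
fun evalAsyncBlocking(statement: StatementAsync, input: Int): Int =
    runBlocking { statement.eval(input) }
```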

For reviewers, I recommend starting with:

  1. AsyncOperatorTests.kt -- shows the use case for the async evaluator
  2. PartiQLCompilerAsync.kt -- shows the public API for calling the async evaluator

The remaining changes are fairly straightforward (apart from what's noted in the self-review comments). Essentially, they consist of:

  • Creating an async version of each class/interface (usually named by appending Async to the original name)
  • Making the functions within those classes/interfaces suspend funs

Other Information

  • Updated Unreleased Section in CHANGELOG: [YES]

  • Any backward-incompatible changes? [NO]

    • No. The previous synchronous physical plan evaluator APIs are unchanged but have been marked as deprecated. All of these deprecated APIs will be removed in a future major version.
  • Any new external dependencies? [YES]

    • Kotlin coroutine libraries to enable async evaluation
      • partiql-lang
        • implementation dep on org.jetbrains.kotlinx:kotlinx-coroutines-core:1.6.0
        • test dep on org.jetbrains.kotlinx:kotlinx-coroutines-test:1.6.0
      • partiql-examples
        • implementation dep on org.jetbrains.kotlinx:kotlinx-coroutines-core:1.6.0
        • implementation dep on org.jetbrains.kotlinx:kotlinx-coroutines-jdk8:1.6.0 (for converting a coroutine into a Java CompletableFuture)
        • test dep on org.jetbrains.kotlinx:kotlinx-coroutines-test:1.6.0
      • partiql-cli
        • implementation dep on org.jetbrains.kotlinx:kotlinx-coroutines-core:1.6.0
  • Do your changes comply with the Contributing Guidelines
    and Code Style Guidelines? [YES]

License Information

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@alancai98 alancai98 self-assigned this Mar 5, 2024

github-actions bot commented Mar 5, 2024

Conformance comparison report

               Base (5121093)   e6f5d70   +/-
% Passing      92.54%           92.54%    0.00%
✅ Passing     5384             5384      0
❌ Failing     434              434       0
🔶 Ignored     0                0         0
Total Tests    5818             5818      0

Number passing in both: 5384

Number failing in both: 434

Number passing in Base (5121093) but now fail: 0

Number failing in Base (5121093) but now pass: 0

@codecov-commenter
Copy link

codecov-commenter commented Mar 5, 2024

Codecov Report

Attention: Patch coverage is 82.65861%, with 287 lines in your changes missing coverage. Please review.

Project coverage is 73.17%. Comparing base (5121093) to head (eea05a0).

Files                                                    Patch %   Lines
...ang/eval/physical/PhysicalPlanCompilerAsyncImpl.kt    82.91%    86 Missing and 29 partials ⚠️
...rc/main/kotlin/org/partiql/lang/eval/ThunkAsync.kt    63.51%    48 Missing and 6 partials ⚠️
...al/operators/JoinRelationalOperatorFactoryAsync.kt    64.00%    24 Missing and 3 partials ⚠️
...rtiql/lang/compiler/PartiQLCompilerAsyncDefault.kt    82.43%    5 Missing and 8 partials ⚠️
...l/operators/LimitRelationalOperatorFactoryAsync.kt    69.76%    11 Missing and 2 partials ⚠️
...g/partiql/lang/eval/physical/RelationThunkAsync.kt    31.25%    10 Missing and 1 partial ⚠️
...val/physical/PhysicalBexprToThunkConverterAsync.kt    94.07%    2 Missing and 7 partials ⚠️
.../operators/OffsetRelationalOperatorFactoryAsync.kt    82.22%    7 Missing and 1 partial ⚠️
.../org/partiql/lang/eval/physical/window/LagAsync.kt    66.66%    3 Missing and 4 partials ⚠️
...al/operators/UnpivotOperatorFactoryDefaultAsync.kt    76.00%    3 Missing and 3 partials ⚠️
... and 9 more
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1382      +/-   ##
============================================
+ Coverage     72.15%   73.17%   +1.01%     
- Complexity     2095     2393     +298     
============================================
  Files           221      247      +26     
  Lines         15984    17616    +1632     
  Branches       2896     3175     +279     
============================================
+ Hits          11534    12890    +1356     
- Misses         3641     3849     +208     
- Partials        809      877      +68     
Flag       Coverage Δ
CLI        11.86% <0.00%> (-0.01%) ⬇️
EXAMPLES   80.07% <80.00%> (-0.21%) ⬇️
LANG       81.04% <82.89%> (+0.26%) ⬆️

Flags with carried forward coverage won't be shown.


Comment on lines +52 to +55
// runBlocking {
println("Calling")
someAsyncOp()
// }
Member Author

Previously, in the synchronous evaluator, this someAsyncOp() call would require a runBlocking call. With the async evaluator, this runBlocking call is no longer necessary.
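The difference can be sketched with two hypothetical callback shapes. SyncFunction, AsyncFunction, and this someAsyncOp are illustrative stand-ins for a user-defined function and the async work it performs, not the actual partiql-lang interfaces. When the evaluator invokes a plain function, any suspend call inside it must be bridged with runBlocking, which blocks the calling thread. When the evaluator invokes a suspend fun, the call can be made directly. This assumes kotlinx-coroutines-core.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking

// Stand-in for asynchronous work that a user-defined function needs to perform.
suspend fun someAsyncOp(): String {
    delay(10) // suspends without blocking the thread
    return "done"
}

// Hypothetical callback invoked by a synchronous evaluator: a plain function.
interface SyncFunction { fun call(): String }

// Hypothetical callback invoked by an async evaluator: a suspend fun.
interface AsyncFunction { suspend fun call(): String }

val syncFn = object : SyncFunction {
    // A plain function cannot call a suspend fun directly, so it has to block on it.
    override fun call(): String = runBlocking { someAsyncOp() }
}

val asyncFn = object : AsyncFunction {
    // Already running inside a coroutine, so no runBlocking is needed.
    override suspend fun call(): String = someAsyncOp()
}
```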

rows.add(state.registers.clone())
}

val rowWithValues = rows.map { row ->
Member Author

In the async evaluator for Sort, we evaluate the sortKeys before creating the comparator (L46). This is needed because sort-key evaluation is now an async call, which can only be made from within a suspend fun.

In the synchronous evaluator, the evaluation took place within the fold inside the comparator (https://github.com/partiql/partiql-lang-kotlin/blob/plan-eval-async-keep-sync/partiql-lang/src/main/kotlin/org/partiql/lang/eval/physical/operators/SortOperatorFactoryDefault.kt#L65-L75). The comparator is not a suspend fun, hence the non-trivial refactor.
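The refactor described above can be sketched under simplifying assumptions. Row and evalSortKey are hypothetical stand-ins for a row's registers and an async sort-key thunk, and a single Int key stands in for the real list of sort specs. Because Comparator.compare cannot suspend, each key is evaluated up front inside the suspend fun, and sorting then runs over the precomputed values. This assumes kotlinx-coroutines-core.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking

// Hypothetical row type standing in for the registers of one input row.
data class Row(val name: String, val age: Int)

// Hypothetical sort-key evaluator; in the async evaluator this would be a suspend thunk.
suspend fun evalSortKey(row: Row): Int {
    delay(1) // pretend that evaluating the key suspends
    return row.age
}

// Comparator.compare is a plain (non-suspend) function, so evalSortKey cannot be called inside it.
// Instead, evaluate every key first, inside the suspend fun, then sort on the precomputed keys.
suspend fun sortRows(rows: List<Row>): List<Row> {
    // map is inline, so the suspend call inside its lambda is allowed here.
    val keyed: List<Pair<Int, Row>> = rows.map { row -> evalSortKey(row) to row }
    return keyed.sortedWith(compareBy<Pair<Int, Row>> { it.first }).map { it.second }
}

// Convenience wrapper for non-suspending callers.
fun sortRowsBlocking(rows: List<Row>): List<Row> = runBlocking { sortRows(rows) }
```

A side effect of this shape is that each key is evaluated exactly once per row, rather than once per comparison.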

@@ -0,0 +1,112 @@
/*
Member Author

All of the async operator factories are nearly identical to their synchronous versions, except for the SortOperator, which needed a slight refactor. The primary changes to the operator factories are:

  • Deprecating the synchronous operator factories and related classes
  • Creating an async operator factory with the Async suffix (e.g. <some operator>OperatorFactory -> <some operator>OperatorFactoryAsync) that implements RelationExpressionAsync
  • Changing any references to ValueExpression or RelationExpression to ValueExpressionAsync and RelationExpressionAsync
  • Making the evaluate function a suspend fun
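A minimal sketch of that pattern for a filter-like operator. All types here are hypothetical and heavily simplified: the real ValueExpression/RelationExpression interfaces work on evaluator state and ExprValues, not Ints. The synchronous factory is kept and deprecated, while the new factory carries the Async suffix, takes and returns the *Async types, and has a suspend evaluate. This assumes kotlinx-coroutines-core.

```kotlin
import kotlinx.coroutines.runBlocking

// Hypothetical, simplified stand-ins for the synchronous interfaces.
interface ValueExpression { fun matches(row: Int): Boolean }
interface RelationExpression { fun evaluate(input: List<Int>): List<Int> }

// Async counterparts: same shape, but every function is a suspend fun.
interface ValueExpressionAsync { suspend fun matches(row: Int): Boolean }
interface RelationExpressionAsync { suspend fun evaluate(input: List<Int>): List<Int> }

// The synchronous factory stays in place but is deprecated.
@Deprecated("Use FilterOperatorFactoryAsync")
class FilterOperatorFactory {
    fun create(predicate: ValueExpression): RelationExpression = object : RelationExpression {
        override fun evaluate(input: List<Int>): List<Int> = input.filter { predicate.matches(it) }
    }
}

// The async factory: Async suffix, *Async types in and out, suspend evaluate.
class FilterOperatorFactoryAsync {
    fun create(predicate: ValueExpressionAsync): RelationExpressionAsync = object : RelationExpressionAsync {
        // filter is inline, so the suspend call to matches() is allowed inside its lambda.
        override suspend fun evaluate(input: List<Int>): List<Int> = input.filter { predicate.matches(it) }
    }
}

// Helper for callers outside a coroutine (e.g. tests).
fun evaluateBlocking(expr: RelationExpressionAsync, input: List<Int>): List<Int> =
    runBlocking { expr.evaluate(input) }
```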

@@ -71,7 +72,10 @@ class PartiQLCompilerPipelineExample(out: PrintStream) : Example(out) {

print("PartiQL query:", query)
@OptIn(ExperimentalPartiQLCompilerPipeline::class)
val exprValue = when (val result = partiQLCompilerPipeline.compile(query).eval(session)) {
Member Author

Examples that used the synchronous physical plan evaluator, as well as the CLI, were changed to use the async physical plan evaluator.

parser, planner, compiler
);

String query = "SELECT t.name FROM myTable AS t WHERE t.age > 20";

print("PartiQL query:", query);
PartiQLResult result = pipeline.compile(query).eval(session);

// Calling Kotlin coroutines from Java requires some additional libraries from `kotlinx.coroutines.future`
Member Author

Calling Kotlin suspend functions from Java is not straightforward. This example demonstrates one way to do so: converting the Kotlin async call into a Java CompletableFuture. If there's a need to call the async evaluator from Java in the future, we can add more helper functions to the Kotlin APIs.
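One possible shape for such a helper, sketched on the Kotlin side. QueryAsync, evalAsFuture, and the choice of Dispatchers.Default are hypothetical and not part of this PR. The helper uses the future { } coroutine builder from kotlinx-coroutines-jdk8 (the dependency added to partiql-examples), so Java callers receive an ordinary CompletableFuture and never deal with a Continuation.

```kotlin
import java.util.concurrent.CompletableFuture
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.delay
import kotlinx.coroutines.future.future

// Hypothetical stand-in for PartiQLStatementAsync: evaluation is a suspend fun,
// which Java cannot call directly.
interface QueryAsync {
    suspend fun eval(): String
}

// Scope that runs the coroutines backing the futures (assumption: Dispatchers.Default is acceptable).
private val bridgeScope = CoroutineScope(Dispatchers.Default)

// Java-friendly helper: from Java this is a static method on the file's *Kt class
// that returns a CompletableFuture, so callers can use get(), join(), or thenApply().
fun evalAsFuture(query: QueryAsync): CompletableFuture<String> =
    bridgeScope.future { query.eval() }

// Example statement whose evaluation suspends.
val slowQuery = object : QueryAsync {
    override suspend fun eval(): String {
        delay(10)
        return "result"
    }
}
```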

/**
* [PartiQLCompilerAsync] is responsible for transforming a [PartiqlPhysical.Plan] into an executable [PartiQLStatementAsync].
*/
@ExperimentalPartiQLCompilerPipeline
Member Author

This reuses the existing experimental annotation, which requires an @OptIn.
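For readers unfamiliar with opt-in annotations, here is a sketch of how such a marker works in Kotlin. ExperimentalDemoApi and experimentalEval are hypothetical names, and the ERROR level is chosen for illustration only; this does not claim to reproduce how ExperimentalPartiQLCompilerPipeline itself is declared. On Kotlin versions before 1.7, using @RequiresOptIn may also need the -opt-in=kotlin.RequiresOptIn compiler flag.

```kotlin
// Hypothetical marker annotation; callers of marked APIs must opt in.
@RequiresOptIn(
    message = "This API is experimental and may change without notice.",
    level = RequiresOptIn.Level.ERROR,
)
@Retention(AnnotationRetention.BINARY)
@Target(AnnotationTarget.CLASS, AnnotationTarget.FUNCTION)
annotation class ExperimentalDemoApi

// A marked declaration: calling it without opting in is a compile error at level ERROR.
@ExperimentalDemoApi
fun experimentalEval(x: Int): Int = x + 1

// Opting in at the call site, in the same way the examples use
// @OptIn(ExperimentalPartiQLCompilerPipeline::class).
@OptIn(ExperimentalDemoApi::class)
fun caller(): Int = experimentalEval(41)
```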

val currentRegister = env.registers.clone()
val elements: Flow<ExprValue> = flow {
env.load(currentRegister)
val relItr = bexprThunk(env)
Member Author

This thunk evaluation is an async call. In the synchronous code, this call was made within a sequence (https://github.com/partiql/partiql-lang-kotlin/blob/plan-eval-async-keep-sync/partiql-lang/src/main/kotlin/org/partiql/lang/eval/physical/PhysicalPlanCompilerImpl.kt#L335-L341). Because a sequence builder's block cannot call arbitrary suspend functions, we use a coroutine Flow instead; a Flow is similar to a sequence but allows async calls within its block.

More on Flows can be found here: https://kotlinlang.org/api/kotlinx.coroutines/kotlinx-coroutines-core/kotlinx.coroutines.flow/-flow/.
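The contrast can be sketched with a hypothetical evalThunk standing in for the async thunk call. Both builders below are lazy, but only the flow { } block may call an arbitrary suspend fun. This assumes kotlinx-coroutines-core.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.flow.toList
import kotlinx.coroutines.runBlocking

// Hypothetical stand-in for an async thunk evaluation.
suspend fun evalThunk(i: Int): Int {
    delay(1)
    return i * 10
}

// Synchronous shape: sequence { } only permits its own yield/yieldAll suspensions,
// so calling evalThunk(i) inside this block would not compile.
fun syncElements(): Sequence<Int> = sequence {
    for (i in 1..3) yield(i * 10)
}

// Async shape: flow { } is also lazy (cold), but its block may call suspend funs between emissions.
fun asyncElements(): Flow<Int> = flow {
    for (i in 1..3) emit(evalThunk(i))
}

// Collecting a Flow is itself a suspend operation.
fun collectAsync(): List<Int> = runBlocking { asyncElements().toList() }
```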

@@ -30,15 +32,24 @@ class TestContext {
assertEquals(expectedIon, result.toIonValue(ION))
}

// Executes query on async evaluator
Member Author

Tests in partiql-lang that exercised the synchronous physical plan evaluator now also exercise the async physical plan evaluator. Where possible, I tried to limit the amount of copied code while keeping it easy to remove the synchronous tests once the synchronous evaluator is removed in an upcoming major version.
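One way to get that property is a single helper that runs each case through both evaluators. This is a sketch only: evalSync, evalAsync, and assertBothEvaluators are hypothetical, and the PR's own TestContext may be organized differently. runBlocking is used for brevity; kotlinx-coroutines-test's runTest is the usual choice in coroutine test suites.

```kotlin
import kotlinx.coroutines.runBlocking

// Hypothetical evaluator pair standing in for the sync and async physical plan evaluators.
fun evalSync(query: String): String = query.uppercase()
suspend fun evalAsync(query: String): String = query.uppercase()

// Each test case is written once and checked against both evaluators. Removing the
// synchronous evaluator later means deleting a single line here, not every test.
fun assertBothEvaluators(query: String, expected: String) {
    check(evalSync(query) == expected) { "sync evaluator: expected $expected" }
    check(runBlocking { evalAsync(query) } == expected) { "async evaluator: expected $expected" }
}
```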

Comment on lines +35 to +37
/** Converts instances of [PartiqlPhysical.Bexpr] to any [T]. A `suspend` version of the physical plan converter
* interface is added since PIG currently does not output async functions.
*/
Member Author

The physical-plan-to-async-thunk converter implements this Converter, which is the same as the PIG-generated Converter except that all of the functions within the interface are async (i.e. suspend funs). We could consider making PIG configurable so that it can generate async versions.
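The shape of such an interface can be sketched on a toy tree. Bexpr here is a hypothetical two-variant stand-in for PartiqlPhysical.Bexpr, and ConverterAsync and EvalConverter are illustrative. The result is a converter with one method per variant, as a generated converter has, except that every method is a suspend fun. This assumes kotlinx-coroutines-core.

```kotlin
import kotlinx.coroutines.runBlocking

// Hypothetical, heavily simplified relational-expression tree.
sealed class Bexpr {
    data class Scan(val values: List<Int>) : Bexpr()
    data class Filter(val min: Int, val source: Bexpr) : Bexpr()
}

// One method per variant, like a generated converter, but every method is a suspend fun.
interface ConverterAsync<T> {
    suspend fun convertScan(node: Bexpr.Scan): T
    suspend fun convertFilter(node: Bexpr.Filter): T

    // Dispatch on the variant; exhaustive because Bexpr is sealed.
    suspend fun convert(node: Bexpr): T = when (node) {
        is Bexpr.Scan -> convertScan(node)
        is Bexpr.Filter -> convertFilter(node)
    }
}

// Example implementation that evaluates directly. Because the methods suspend,
// each conversion may await other async work (here, converting the child node).
object EvalConverter : ConverterAsync<List<Int>> {
    override suspend fun convertScan(node: Bexpr.Scan): List<Int> = node.values
    override suspend fun convertFilter(node: Bexpr.Filter): List<Int> =
        convert(node.source).filter { it >= node.min }
}

fun evalBlocking(node: Bexpr): List<Int> = runBlocking { EvalConverter.convert(node) }
```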

* @param TEnv The type of the environment. Generic so that the legacy AST compiler and the new compiler may use
* different types here.
*/
internal typealias ThunkAsync<TEnv> = suspend (TEnv) -> ExprValue
Member Author

This file is essentially the same as Thunk.kt. Notable changes include:

  • Making functions, lambdas, and types async (i.e. adding suspend)
  • Converting sequence builders that call async functions to use coroutine Flows
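The effect of adding suspend to the thunk type can be sketched with a typealias of the same shape as ThunkAsync<TEnv>. ThunkAsyncDemo, Env, plus, readX, and slowOne are hypothetical, and Int stands in for ExprValue. A combinator that builds a thunk from child thunks returns a suspend lambda, so it can await its children. This assumes kotlinx-coroutines-core.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking

// Same shape as ThunkAsync<TEnv> = suspend (TEnv) -> ExprValue, with Int in place of ExprValue.
typealias ThunkAsyncDemo<TEnv> = suspend (TEnv) -> Int

// Hypothetical environment type.
data class Env(val x: Int)

// Thunk combinator: the returned lambda is a suspend lambda, so it may await its children.
fun <TEnv> plus(left: ThunkAsyncDemo<TEnv>, right: ThunkAsyncDemo<TEnv>): ThunkAsyncDemo<TEnv> =
    { env -> left(env) + right(env) }

// Leaf thunks: one reads from the environment, the other suspends before producing a value.
val readX: ThunkAsyncDemo<Env> = { env -> env.x }
val slowOne: ThunkAsyncDemo<Env> = { _ -> delay(1); 1 }

fun runThunk(thunk: ThunkAsyncDemo<Env>, env: Env): Int = runBlocking { thunk(env) }
```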

@alancai98 alancai98 marked this pull request as ready for review March 7, 2024 23:25
@alancai98 alancai98 requested a review from dlurton March 7, 2024 23:30
Member

@dlurton dlurton left a comment

My only concern is the .toList calls in the thunk returned from compileBindingsToValues, which make it eager. We need to decide whether to avoid the eagerness now or whether I should file a feature request for later.
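The eagerness concern can be illustrated in isolation. This is not a reproduction of compileBindingsToValues: rows and Counter are hypothetical, and the example only shows the general behavior of Flow.toList() compared with a lazy terminal operator. Calling toList() drains the whole flow before anything downstream runs, whereas first() stops the producer after the first emission. This assumes kotlinx-coroutines-core.

```kotlin
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.first
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.flow.toList
import kotlinx.coroutines.runBlocking

// Counts how many rows the hypothetical source has produced.
class Counter { var produced = 0 }

fun rows(counter: Counter): Flow<Int> = flow {
    for (i in 1..1_000) {
        counter.produced++
        emit(i)
    }
}

// Eager: toList() collects every row, even though only the first one is used.
fun firstRowEager(counter: Counter): Int = runBlocking { rows(counter).toList().first() }

// Lazy: Flow.first() cancels the flow right after the first emission.
fun firstRowLazy(counter: Counter): Int = runBlocking { rows(counter).first() }
```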

@@ -1,9 +1,10 @@
package org.partiql.examples
Member

Did you want to rename this class and file to include the Async suffix? The Java version of this has the suffix.

Also, what's the plan when the sync pipeline is removed? Will the new classes keep their Async suffix or will the suffix be removed?

Member Author

Thanks for the catch. The class and file should include the Async suffix; I'll change it.


Also, what's the plan when the sync pipeline is removed? Will the new classes keep their Async suffix or will the suffix be removed?

The new classes will keep the Async suffix.

Comment on lines -1889 to -1895
internal val MetaContainer.sourceLocationMeta get() = this[SourceLocationMeta.TAG] as? SourceLocationMeta
internal val MetaContainer.sourceLocationMetaOrUnknown get() = this.sourceLocationMeta ?: UNKNOWN_SOURCE_LOCATION

internal fun StaticType.getTypes() = when (val flattened = this.flatten()) {
is AnyOfType -> flattened.types
else -> listOf(this)
}
Member Author

Removing these top-level internal functions is a breaking change for Java callers. From the Kotlin docs on calling Kotlin from Java:

internal declarations become public in Java. Members of internal classes go through name mangling, to make it harder to accidentally use them from Java and to allow overloading for members with the same signature that don't see each other according to Kotlin rules

So from Java, members of internal classes are name-mangled, but top-level internal functions are not. For instance, StaticType.getTypes() is directly callable from Java:

PhysicalPlanCompilerAsyncImplKt.getTypes(StaticType.ANY);

IntelliJ reports an error like "Usage of Kotlin internal declaration from different module", but this is only an IDE inspection that can be disabled. Gradle and the Java compiler can still compile and run code that calls the internal Kotlin function.

I'll cut an issue to track whether we should allow these top-level functions (both internal and public).

Member Author

GitHub issue: #1387

@alancai98 alancai98 merged commit b063e52 into main Mar 12, 2024
10 checks passed
@alancai98 alancai98 deleted the plan-eval-async-keep-sync branch March 12, 2024 23:59
@alancai98
Member Author

I ran the Java API compliance checker between this PR and v0.14.3. The only breaking changes were the removed internal functions from PhysicalPlanCompilerImpl.kt, as pointed out in #1382 (comment). I'll cut an issue and discuss with the team what we should do with these internal top-level functions.
