Support logical plan compilation #2648
Conversation
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
Hi @waynexia, and thanks for the contribution. I understand using the JIT to compile expression trees, but I don't understand why we would want to compile a logical plan rather than leverage the JIT in the physical plan. Could you explain that some more? The logical plan has the concept of a join, but we don't know whether that is a HashJoin, CrossJoin, or SortMergeJoin until we get to the physical plan, so I don't see how we could compile anything from the logical plan for joins, as one example.
Hi @andygrove, thanks for your review. My proposal is to leverage JIT in the phase where we convert the logical plan into the physical execution plan (in the physical planner).
I suppose in this phase we have enough information on how to "physically" execute a SQL query; the JIT module would only replace some of the existing execution logic. And from another perspective, I think it's not easy to compile a physical plan. One main reason is that we can translate a logical expr into a JIT expr, but not a physical expr into a JIT expr. A logical expr is a sort of AST, which is easy to compile, while a physical expr is an "opaque" operation. However, this is not infeasible at all, as shown in this paper. One of the ideas is to let the operators that operate on data generate IR instead, and then use that IR to operate on the data. We could achieve something like:
// add a new method to this existing trait
trait ExecutionPlan {
    fn jit_compile(&self, jit_ctx: JITContext) {
        // add the logic to the JIT context. E.g. for a filter plan:
        jit_ctx.expr(
            // let column_c = column_a + column_b
            // if column_c > 0, materialize column_d to the output
        )
    }
}

fn jit_exec(plan: ExecutionPlan, ctx: JITContext) {
    // let every physical plan register its logic to the context,
    // then finalize that logic into an executable program.
    ctx.compile(plan);
    ctx.execute()
}

I haven't considered and compared these two approaches deeply, but at the current stage I think they only differ in how we structure the implementation. They may, however, influence future topics such as minimizing the memory footprint or optimizing (on our side) the generated code.
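To make the "a logical expr is an AST which is easy to compile" point concrete, here is a minimal, hypothetical sketch, not DataFusion's actual API: it compiles a simplified expression tree into a plain Rust closure, where a real JIT would emit Cranelift IR instead. The SimpleExpr type and the column layout are invented for the example.

// Hypothetical, simplified expression AST (not DataFusion's `Expr`).
enum SimpleExpr {
    Column(usize),
    Literal(i64),
    Add(Box<SimpleExpr>, Box<SimpleExpr>),
    Gt(Box<SimpleExpr>, Box<SimpleExpr>),
}

// "Compile" the tree once into a closure so evaluation no longer walks
// the AST per row; a real JIT would emit IR / machine code here instead.
fn compile(expr: &SimpleExpr) -> Box<dyn Fn(&[i64]) -> i64> {
    match expr {
        SimpleExpr::Column(i) => {
            let i = *i;
            Box::new(move |row: &[i64]| row[i])
        }
        SimpleExpr::Literal(v) => {
            let v = *v;
            Box::new(move |_row: &[i64]| v)
        }
        SimpleExpr::Add(l, r) => {
            let (l, r) = (compile(l), compile(r));
            Box::new(move |row: &[i64]| l(row) + r(row))
        }
        SimpleExpr::Gt(l, r) => {
            let (l, r) = (compile(l), compile(r));
            Box::new(move |row: &[i64]| (l(row) > r(row)) as i64)
        }
    }
}

fn main() {
    // column_a + column_b > 0, matching the filter example above
    let expr = SimpleExpr::Gt(
        Box::new(SimpleExpr::Add(
            Box::new(SimpleExpr::Column(0)),
            Box::new(SimpleExpr::Column(1)),
        )),
        Box::new(SimpleExpr::Literal(0)),
    );
    let predicate = compile(&expr);
    let row: &[i64] = &[3, -1];
    assert_eq!(predicate(row), 1); // 3 + (-1) > 0 -> true
}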
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
OK, I think it is ready for review. The generated code and test preparation are verbose 😢 Maybe there is some way to do it better?
Signed-off-by: Ruihang Xia <[email protected]>
Hi @waynexia, I think it would help move this effort forward if we could have some sort of high-level issue with a brief overview of the issues this is seeking to tackle, along with a link to some sort of design document. Recent examples of this might be #2502 or #2199. This will help provide context for reviewers and allow discussion of different approaches with all the necessary information available. On a related note, I wonder if perhaps this could live as a repo in https://github.com/datafusion-contrib/ ?
Appreciate your advice and suggestions, @tustvold! I'm drafting the proposal and hope I can put it up for discussion this week.
I'm not very familiar with this mechanism. Which aspects would differ between living in the main repo and living as a standalone repo?
This is a good choice; I'm OK with both. What are others' opinions? And by the way, I have drafted a proposal for JIT. Please let me know what you think of this too 😃 cc @yjshen @Dandandan @alamb @houqp @viirya (sorry for the disturbance)
Thank you for writing this up, I've left some comments on the document. Some high-level thoughts:
I agree that plan-level JIT is a great idea; thank you @waynexia for writing up the document as well as this PR. I am sorry it took so long to review it. TLDR:
I am not sure if you have seen the following paper, but it gives a good treatment of the various tradeoffs between vectorized and JIT-compiled query plans, and I think it is quite relevant to this discussion: https://db.in.tum.de/~kersten/vectorization_vs_compilation.pdf?lang=de

Here is the canonical plan I think of that benefits from JIT'ing:
In this case, the code to filter and to update the hash table is compiled together, so that input rows from the data source update the hash table directly without ever leaving registers. I believe this kind of plan can be made super fast, with state-of-the-art query performance as the result. The idea of JIT'ing multiple plan nodes together is certainly a necessary part of this, but it is one of the last steps; the first step is fully JIT'ing the expression evaluation. I really like the idea of using DataFusion's extensibility model to develop / prototype this approach in datafusion-contrib or another repo until it is mature enough to bring into the core codebase. Though, to be honest, perhaps the same approach could/should be used for the new scheduler @tustvold is working on too 🤔

In terms of the cache invalidation issues, I think #2199 will help the current vectorized approach minimize the number of times a batch has to leave the cache. I will also try to update #2122 with some more specifics.
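To make the fused-operator idea concrete, here is a minimal, hypothetical sketch in plain Rust of what a filter fused with a hash aggregate conceptually looks like once compiled into a single loop; a real JIT would generate this body rather than hand-write it, and the columns and predicate are invented for the example.

use std::collections::HashMap;

// Hypothetical fused loop: filter and aggregation compiled into one pass,
// so each row's values stay in registers between the two operators.
// In the vectorized model this would instead be two kernels with an
// intermediate (filtered) batch materialized in between.
fn fused_filter_hash_agg(keys: &[i64], values: &[i64]) -> HashMap<i64, i64> {
    let mut sums: HashMap<i64, i64> = HashMap::new();
    for (k, v) in keys.iter().zip(values) {
        // filter: value > 0 (stands in for an arbitrary compiled predicate)
        if *v > 0 {
            // aggregate: SUM(value) GROUP BY key, updated immediately
            *sums.entry(*k).or_insert(0) += *v;
        }
    }
    sums
}

fn main() {
    let keys = [1, 1, 2, 2];
    let values = [10, -5, 7, 3];
    let result = fused_filter_hash_agg(&keys, &values);
    assert_eq!(result[&1], 10); // -5 was filtered out
    assert_eq!(result[&2], 10); // 7 + 3
}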
I would prefer if this were in datafusion-contrib, but I think having it in the jit module is OK too.
let jit_exec_plan = jit_ctx.compile_logical_plan(projection_plan).unwrap();

// execute
let output_batch = jit_exec_plan.execute(input_batch).unwrap();
this is pretty neat
Signed-off-by: Ruihang Xia <[email protected]>
Thank you all for the reviews ❤️
How should I get started in datafusion-contrib?
Looks cool. I'll try to get involved. And I really like @alamb's reason 1!
This makes sense. I shall revisit the task list. That paper also gives some interesting data, such as that compilation sometimes misses more branches. And besides the conclusion, it shows (me) a concrete way to measure cost and improvement.
I have spent some time thinking about this, and I am not so sure that JIT'ing expressions other than multi-column row comparisons will really improve performance over Arrow's optimized kernels -- I think, as @waynexia has suggested, the only way we will ever really know is to try it.
Marking as draft so we don't accidentally merge this.
Closing this. I'll continue under a separate repo. Appreciate it again ❤️
Which issue does this PR close?
Closes #.
Rationale for this change
This PR implements logical plan compilation and executes the compiled plan on record batches.
This implementation only covers a few types: only batches of i64 and f64 arrays, and the projection plan, are supported. It wouldn't be difficult to extend this to all primitive types. However, more complex array types, as well as plans that require variable output length (filter) or complex algorithms (join, aggregation, etc.), are hard to implement in pure Cranelift IR; calling into Rust code would be unavoidable. This is acceptable in my view, as the overhead can be alleviated or eliminated (via LLVM, I guess).
What changes are included in this PR?
This PR adds several JIT-related structs. The most important are JITContext and JITExecutionPlan. JITContext supports compiling a logical plan into an executable JITExecutionPlan, and JITExecutionPlan is a minimal ExecutionPlan which supports execute() on a RecordBatch.
Now we can do something like:
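A minimal sketch of the intended usage, mirroring the snippet quoted in the review comment above; it assumes a jit_ctx (JITContext), a projection_plan logical plan, and an input_batch RecordBatch have been prepared beforehand (their construction is omitted here).

// compile the logical plan into an executable JITExecutionPlan
let jit_exec_plan = jit_ctx.compile_logical_plan(projection_plan).unwrap();

// execute the compiled plan on a record batch
let output_batch = jit_exec_plan.execute(input_batch).unwrap();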
Are there any user-facing changes?
No
Does this PR break compatibility with Ballista?
No