JIT-Compile DataFusion Expressions to create RecordBatches #2122

Open · Tracked by #2703
Dandandan opened this issue Mar 29, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

Dandandan (Contributor) commented Mar 29, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
We should be able to JIT-compile DataFusion expressions into native code that produces RecordBatches.

The benefit of this is that we can speed up complex / nested expressions by avoiding unnecessary allocations.

Describe the solution you'd like

We should be able to take a RecordBatch (or a collection of named Arrays) and compile an expression like (a + b) / 2 into a loop that produces a new Array.

// Compile `expr` into a native function that evaluates it against
// batches matching `schema` and returns the resulting Array.
fn compile(schema: SchemaRef, expr: Expr) -> CompiledFunction {
    todo!()
}

The loop itself must also be part of the compiled code, to remove per-call overhead and allow for possible use of SIMD instructions, either explicitly by instrumenting cranelift enough or through auto-vectorization.
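
For illustration, the generated code would be roughly equivalent to a hand-written fused loop like the following sketch (plain Rust, ignoring nulls and overflow; the function name is made up for this example):

// A hand-written equivalent of the fused loop a JIT backend could emit
// for (a + b) / 2: a single pass over the columns with one output
// allocation and no intermediate array for (a + b).
fn eval_a_plus_b_div_2(a: &[i64], b: &[i64]) -> Vec<i64> {
    assert_eq!(a.len(), b.len());
    let mut out = Vec::with_capacity(a.len());
    for i in 0..a.len() {
        // The whole expression is evaluated per element, so this loop is
        // a straightforward candidate for auto-vectorization.
        out.push((a[i] + b[i]) / 2);
    }
    out
}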

Describe alternatives you've considered
n/a

Additional context

Dandandan added the enhancement (New feature or request) label on Mar 29, 2022
alamb changed the title from "JIT-Compile DataFusion Expressions" to "[EPIC] JIT-Compile DataFusion Expressions" on Jun 6, 2022
alamb changed the title from "[EPIC] JIT-Compile DataFusion Expressions" to "JIT-Compile DataFusion Expressions to create RecordBatches" on Jun 6, 2022
alamb (Contributor) commented Jun 6, 2022

DataFusion's PhysicalExpr and the arrow-rs library currently evaluate expressions by "materializing intermediate results" -- for example, (a + b) + c first evaluates (a + b) into a temporary array and then adds c to it to form the final result.
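
A rough sketch of that interpreted path, assuming the arithmetic kernels as exposed by arrow-rs around the time of this issue (the exact module path has since moved in later releases):

use arrow::array::Int32Array;
use arrow::compute::kernels::arithmetic::add;
use arrow::error::Result;

// Interpreted evaluation of (a + b) + c: each kernel call materializes a
// complete intermediate array before the next call runs.
fn eval_materialized(a: &Int32Array, b: &Int32Array, c: &Int32Array) -> Result<Int32Array> {
    let tmp = add(a, b)?; // temporary array holding (a + b)
    add(&tmp, c)          // final result
}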

Note, however, that there is a tradeoff here between the speed gained from the LLVM-optimized, vectorized kernels in arrow-rs and cranelift-generated JIT expressions: the JIT code may not actually be faster. I think this is what @Dandandan is referring to when he says "allow for possible use of SIMD instructions, either explicitly by instrumenting cranelift enough or through auto-vectorization."

alamb (Contributor) commented Jun 6, 2022

Another example can be found in these slides from this presentation
