Simplified version of Apache DataFusion, for learning.

kanatti/toy-df

Toy DF

A toy version of Apache DataFusion, written from scratch. The purpose is to better understand DataFusion's internals.

Architecture

                       ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓         
╔═════════════╗        ┃ ┌──────┐ ┌────────────────┐  ┃         
║             ║        ┃ │      │ │                │  ┃         
║  Frontend   ║────────┃ │ SQL  │ │    DataFrame   │  ┃         
║             ║        ┃ │      │ │                │  ┃         
╚═════════════╝        ┃ └──────┘ └────────────────┘  ┃         
       │               ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛         
       │                                                        
       ▼                                                        
╔═════════════╗        ┌────────────────────────────────┐       
║             ║        │  1. Tree based                 │       
║ LogicalPlan ║────────│                                │       
║             ║        │  2. Plan Nodes and Expressions │       
╚═════════════╝        └────────────────────────────────┘       
       │                                                        
       │                                                        
       ▼                                                        
╔═════════════╗                                                 
║  Analyze    ║        ┌────────────────────────────────────────┐
║     +       ║────────│ Rules that traverse & rewrite the tree │
║  Optimize   ║        └────────────────────────────────────────┘
╚═════════════╝                                                 
       │                                                        
       │                                                        
       ▼                                                        
╔═════════════╗                                                 
║             ║                                                 
║PhysicalPlan ║                                                 
║             ║                                                 
╚═════════════╝                                                 
       │                                                        
       │                                                        
       ▼                                                        
╔═════════════╗                                                 
║             ║       ┌───────────────────────────────┐         
║  Runtime    ║───────│ Arrow Operator based runtime  │         
║             ║       └───────────────────────────────┘         
╚═════════════╝                                                 
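
To make the LogicalPlan and Analyze + Optimize stages of the diagram concrete, here is a minimal sketch of a tree of plan nodes and expressions, plus one rule that traverses the tree and rewrites it. Every name in it (`Expr`, `LogicalPlan`, `merge_filters`, and the `col`, `lit`, `gt` helpers) is hypothetical and chosen for illustration; it is not toy-df's or DataFusion's actual API.

```rust
// Illustrative sketch only: all types and functions here are hypothetical.

/// Expressions that appear inside plan nodes.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// Tree-based logical plan: each node owns its input node.
#[derive(Debug, Clone, PartialEq)]
enum LogicalPlan {
    Scan { table: String },
    Filter { predicate: Expr, input: Box<LogicalPlan> },
    Projection { exprs: Vec<Expr>, input: Box<LogicalPlan> },
}

// Small expression builders, similar in spirit to a DataFrame-style frontend.
fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

fn lit(value: i64) -> Expr {
    Expr::Literal(value)
}

fn gt(left: Expr, right: Expr) -> Expr {
    Expr::Gt(Box::new(left), Box::new(right))
}

/// Example rule: walk the tree bottom-up and merge a Filter that sits
/// directly on top of another Filter into a single Filter with AND.
fn merge_filters(plan: LogicalPlan) -> LogicalPlan {
    match plan {
        scan @ LogicalPlan::Scan { .. } => scan,
        LogicalPlan::Projection { exprs, input } => LogicalPlan::Projection {
            exprs,
            input: Box::new(merge_filters(*input)),
        },
        LogicalPlan::Filter { predicate, input } => match merge_filters(*input) {
            // The rewritten child is itself a Filter: combine both predicates.
            LogicalPlan::Filter { predicate: inner, input: child } => LogicalPlan::Filter {
                predicate: Expr::And(Box::new(predicate), Box::new(inner)),
                input: child,
            },
            // Any other child: keep this Filter as it is.
            other => LogicalPlan::Filter { predicate, input: Box::new(other) },
        },
    }
}
```

With this sketch, a plan shaped like `Filter(a > 1, Filter(b > 2, Scan t))` is rewritten to `Filter(a > 1 AND b > 2, Scan t)`. Taking the plan by value and returning a new tree keeps the rule free of mutable state; a real optimizer would likely run a list of such rules in sequence.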

Plan

As a first milestone, we will try to get a basic select + filter query working end to end.

  • Expressions
  • DataFrame API
  • LogicalPlan
  • Execution Plan scaffolding
  • TableProvider and InMemory Table
  • CsvTable and ParquetTable
  • Physical Expression
  • Scan execution
  • Projection execution
  • Filter execution
  • Aggregate execution
  • Join execution
  • Async and Streams
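
As a rough picture of what the first milestone (select + filter, end to end) could look like, the sketch below wires a scan over an in-memory table, a filter driven by a physical expression, and a projection. It rests on several assumptions: plain `Vec<i64>` columns stand in for Arrow arrays, execution is synchronous and returns a single batch rather than a stream, and all names (`Batch`, `PhysicalExpr`, `ExecutionPlan`, `select_where_gt`) are hypothetical rather than taken from toy-df or DataFusion.

```rust
// Illustrative sketch only: all names here are hypothetical, and plain
// Vec<i64> columns stand in for Arrow arrays.

/// Stand-in for an Arrow RecordBatch: named i64 columns of equal length.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    names: Vec<String>,
    columns: Vec<Vec<i64>>,
}

impl Batch {
    fn num_rows(&self) -> usize {
        self.columns.first().map_or(0, |c| c.len())
    }

    fn column(&self, name: &str) -> Result<&Vec<i64>, String> {
        let idx = self.names.iter().position(|n| n == name);
        idx.and_then(|i| self.columns.get(i))
            .ok_or_else(|| format!("unknown column: {}", name))
    }
}

/// Physical expression, evaluated against a whole batch at once.
enum PhysicalExpr {
    Column(String),
    Literal(i64),
    Gt(Box<PhysicalExpr>, Box<PhysicalExpr>),
}

impl PhysicalExpr {
    /// Returns one value per row; comparisons yield 1 (true) or 0 (false).
    fn evaluate(&self, batch: &Batch) -> Result<Vec<i64>, String> {
        match self {
            PhysicalExpr::Column(name) => Ok(batch.column(name)?.clone()),
            PhysicalExpr::Literal(v) => Ok(vec![*v; batch.num_rows()]),
            PhysicalExpr::Gt(l, r) => {
                let (l, r) = (l.evaluate(batch)?, r.evaluate(batch)?);
                Ok(l.iter().zip(&r).map(|(a, b)| (a > b) as i64).collect())
            }
        }
    }
}

/// Execution plan nodes for scan, filter, and projection.
enum ExecutionPlan {
    Scan(Batch), // in-memory table
    Filter { predicate: PhysicalExpr, input: Box<ExecutionPlan> },
    Projection { columns: Vec<String>, input: Box<ExecutionPlan> },
}

impl ExecutionPlan {
    fn execute(&self) -> Result<Batch, String> {
        match self {
            ExecutionPlan::Scan(batch) => Ok(batch.clone()),
            ExecutionPlan::Filter { predicate, input } => {
                let batch = input.execute()?;
                let mask = predicate.evaluate(&batch)?;
                // Keep only the rows whose mask value is non-zero.
                let columns: Vec<Vec<i64>> = batch
                    .columns
                    .iter()
                    .map(|col| {
                        col.iter()
                            .zip(&mask)
                            .filter(|(_, keep)| **keep != 0)
                            .map(|(v, _)| *v)
                            .collect::<Vec<i64>>()
                    })
                    .collect();
                Ok(Batch { names: batch.names, columns })
            }
            ExecutionPlan::Projection { columns, input } => {
                let batch = input.execute()?;
                let mut out = Vec::with_capacity(columns.len());
                for name in columns {
                    out.push(batch.column(name)?.clone());
                }
                Ok(Batch { names: columns.clone(), columns: out })
            }
        }
    }
}

/// Builds the plan for `SELECT <select> FROM table WHERE <column> > <min>`.
fn select_where_gt(table: Batch, select: &[&str], column: &str, min: i64) -> ExecutionPlan {
    let predicate = PhysicalExpr::Gt(
        Box::new(PhysicalExpr::Column(column.to_string())),
        Box::new(PhysicalExpr::Literal(min)),
    );
    ExecutionPlan::Projection {
        columns: select.iter().map(|s| s.to_string()).collect(),
        input: Box::new(ExecutionPlan::Filter {
            predicate,
            input: Box::new(ExecutionPlan::Scan(table)),
        }),
    }
}
```

Calling `select_where_gt(table, &["id"], "age", 18).execute()` on a table with `id` and `age` columns plays the role of `SELECT id FROM t WHERE age > 18`. The filter sits below the projection so that the predicate can still see columns the projection drops.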
