[Lake][ETL] Improve ETL step such that live + build tables have a unified view to query from #810
Closed
6 tasks done
Labels
Type: Enhancement
Background / motivation
As laid out in PR #794, we can further improve the compute query by combining live_table_name w/ build_table_name, such that queries are able to process across datasets without interim code.
Problem
The issue right now is that all data in ETL is assumed to be in the build table, whereas in reality some data might already be in live_tables and needs to be processed against build_tables.
Approach
What we want to do is create a view that unions both live_ and build_ tables, such that you can do a single query across them. Here is how to approach this.
pseudocode 1 - Create a view that unions both tables so all data sits together. A view should not take more storage/mem; it is "an alias for two different tables".
pseudocode 2 - query across both tables through a single view
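A minimal sketch of pseudocode 2: one aggregate query that reads live and build rows together through the view. The table, view, and column names are hypothetical, and SQLite (Python's sqlite3) stands in for whatever engine the ETL step actually uses.

```python
import sqlite3

# Hypothetical schema: both tables share the same columns.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE live_records (id INTEGER, source TEXT, amount INTEGER);
    CREATE TABLE build_records (id INTEGER, source TEXT, amount INTEGER);
    INSERT INTO live_records VALUES (1, 'a', 10), (2, 'b', 20);
    INSERT INTO build_records VALUES (3, 'a', 5);

    CREATE VIEW etl_records AS
        SELECT * FROM live_records
        UNION ALL
        SELECT * FROM build_records;
    """
)

# A single query against the view aggregates across both underlying tables,
# so no interim code is needed to stitch the two result sets together.
totals = conn.execute(
    "SELECT source, SUM(amount) FROM etl_records GROUP BY source ORDER BY source"
).fetchall()
print(totals)
```

Here source 'a' has one row in each table, and the query sums them as if they lived in one table.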
DoD
Create a view with the _etl prefix to provide a unifying view for both live_ and _build tables
Update queries to use _etl such that we can process all data together under a single query
Move _build into live_ and drop the _etl view
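A sketch of the final step, reading it as "move the build rows into the live table, then drop the view". The names are hypothetical, SQLite (Python's sqlite3) is a stand-in engine, and emptying the build table afterwards is an assumption that the issue does not state.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE live_records (id INTEGER, amount INTEGER);
    CREATE TABLE build_records (id INTEGER, amount INTEGER);
    INSERT INTO live_records VALUES (1, 10), (2, 20);
    INSERT INTO build_records VALUES (3, 5);

    CREATE VIEW etl_records AS
        SELECT * FROM live_records
        UNION ALL
        SELECT * FROM build_records;
    """
)

# Run the move and the cleanup in one transaction so a failure leaves
# the tables and the view unchanged.
with conn:
    conn.execute("INSERT INTO live_records SELECT * FROM build_records")
    # Assumption: the build table is emptied once its rows are in live.
    conn.execute("DELETE FROM build_records")
    conn.execute("DROP VIEW etl_records")

live_count = conn.execute("SELECT COUNT(*) FROM live_records").fetchone()[0]
build_count = conn.execute("SELECT COUNT(*) FROM build_records").fetchone()[0]
views = conn.execute("SELECT name FROM sqlite_master WHERE type = 'view'").fetchall()
print(live_count, build_count, views)
```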