
append transaction data path #390

Closed
Tracked by #377
zachschuermann opened this issue Oct 11, 2024 · 0 comments · Fixed by #393
zachschuermann commented Oct 11, 2024

  • write_context API
  • expression fixup for physical to logical transform
  • write_metadata API for new add files
@zachschuermann zachschuermann changed the title data path append transaction data path Oct 11, 2024
@zachschuermann zachschuermann self-assigned this Oct 11, 2024
zachschuermann added a commit that referenced this issue Nov 8, 2024
This PR is the second (of two) major pieces for supporting simple blind
appends. It implements:
1. **new `Transaction` APIs** for appending data to delta tables:
   a. `get_write_context()` to get a `WriteContext` to pass to the data
      path, which includes all information needed to write: `target
      directory`, `snapshot schema`, `transformation expression`, and
      (future: columns to collect stats on)
   b. `add_write_metadata(impl EngineData)` to add metadata about a write
      to the transaction, along with a new static method
      `transaction::get_write_metadata_schema` to provide the expected
      schema of this engine data
   c. new machinery in the `commit` method to commit a new `Add` action
      for each row of write_metadata from the API above
2. **new default engine capabilities** for using the default engine to
write parquet data (to append to tables):
  a. parquet handler can now `write_parquet_file(EngineData)`
  b. usage example in `write.rs` tests for now
3. **new append tests** in the `write.rs` integration test suite
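The new APIs above can be sketched end to end. This is an illustrative, self-contained mock of the flow (get a write context, write data, record write metadata, commit one `Add` per row); the kernel types `Transaction`, `WriteContext`, and `EngineData` are stubbed with simplified fields here, so none of these struct definitions are the real delta-kernel-rs signatures.

```rust
// Self-contained sketch of the append flow. All types are simplified
// stand-ins for the kernel's `Transaction` / `WriteContext` / `EngineData`.

#[derive(Debug, Clone)]
struct WriteContext {
    target_dir: String,
    snapshot_schema: Vec<String>, // simplified: just column names
}

// Stand-in for the engine data describing one written file.
#[derive(Debug)]
struct WriteMetadata {
    path: String,
    size: u64,
}

struct Transaction {
    table_root: String,
    schema: Vec<String>,
    write_metadata: Vec<WriteMetadata>,
}

impl Transaction {
    fn new(table_root: &str, schema: Vec<String>) -> Self {
        Transaction {
            table_root: table_root.into(),
            schema,
            write_metadata: Vec::new(),
        }
    }

    // Mirrors `get_write_context()`: everything the data path needs.
    fn get_write_context(&self) -> WriteContext {
        WriteContext {
            target_dir: self.table_root.clone(),
            snapshot_schema: self.schema.clone(),
        }
    }

    // Mirrors `add_write_metadata(impl EngineData)`.
    fn add_write_metadata(&mut self, meta: WriteMetadata) {
        self.write_metadata.push(meta);
    }

    // Mirrors `commit`: emit one `Add` action per row of write metadata.
    fn commit(self) -> Vec<String> {
        self.write_metadata
            .iter()
            .map(|m| format!("Add {{ path: {}, size: {} }}", m.path, m.size))
            .collect()
    }
}

fn main() {
    let mut txn = Transaction::new("s3://bucket/table", vec!["id".into(), "value".into()]);
    let ctx = txn.get_write_context();
    // The engine writes a parquet file into ctx.target_dir, then records it:
    txn.add_write_metadata(WriteMetadata {
        path: format!("{}/part-0.parquet", ctx.target_dir),
        size: 1024,
    });
    let actions = txn.commit();
    assert_eq!(actions.len(), 1);
    println!("{}", actions[0]);
}
```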

Details and some follow-ups:
- the parquet writing (similar to JSON) currently just buffers everything
into memory before issuing one big PUT. We should make this smarter: a
single PUT for small data and a MultipartUpload for larger data.
Tracked in #418
- schema enforcement is done at the data layer. This means it is up to
the engine to call the expression evaluation, and we expect this to fail
if the output schema is incorrect (see `test_append_invalid_schema` in
the `write.rs` integration test). We may want to change this in the
future to eagerly error based on the engine providing a schema up front
at metadata time (transaction creation time)
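The first follow-up above (single PUT vs. MultipartUpload, #418) could be sketched as a simple size-based dispatch. Everything here is hypothetical, not kernel API; the 5 MiB threshold is chosen to match the S3 minimum multipart part size.

```rust
// Hypothetical sketch of the smarter upload strategy suggested for #418:
// issue a single PUT for small buffers, split into a multipart upload
// otherwise. The enum and threshold are illustrative assumptions.

const MULTIPART_THRESHOLD: usize = 5 * 1024 * 1024; // 5 MiB, S3 minimum part size

#[derive(Debug, PartialEq)]
enum UploadStrategy {
    SinglePut,
    Multipart { parts: usize },
}

fn choose_upload(buf_len: usize) -> UploadStrategy {
    if buf_len <= MULTIPART_THRESHOLD {
        UploadStrategy::SinglePut
    } else {
        // ceil-divide the buffer into threshold-sized parts
        UploadStrategy::Multipart { parts: buf_len.div_ceil(MULTIPART_THRESHOLD) }
    }
}

fn main() {
    assert_eq!(choose_upload(1024), UploadStrategy::SinglePut);
    assert_eq!(
        choose_upload(12 * 1024 * 1024),
        UploadStrategy::Multipart { parts: 3 }
    );
}
```

The buffering itself would stay as-is for the small case; only the large case changes behavior, so existing writes are unaffected.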

based on #370
resolves #390