This PR is the second (of two) major pieces for supporting simple blind
appends. It implements:
1. **new `Transaction` APIs** for appending data to delta tables:
   - a. `get_write_context()` returns a `WriteContext` to pass to the data
     path. It includes all the information needed to write: `target directory`,
     `snapshot schema`, `transformation expression`, and (in the future) the
     columns to collect stats on
   - b. `add_write_metadata(impl EngineData)` adds metadata about a write
     to the transaction, along with a new static method
     `transaction::get_write_metadata_schema` that provides the expected schema
     of this engine data
   - c. new machinery in the `commit` method to commit a new `Add` action for
     each row of write metadata from the API above
2. **new default engine capabilities** for using the default engine to
   write parquet data (to append to tables):
   - a. the parquet handler can now `write_parquet_file(EngineData)`
   - b. a usage example lives in the `write.rs` tests for now
3. **new append tests** in the `write.rs` integration test suite
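
To make the append flow above easier to follow, here is a minimal sketch of how the pieces fit together. It does not use the real kernel types: `WriteContext`, `WriteMetadata`, `AddAction`, `Transaction`, and `fake_write_parquet_file` below are simplified, hypothetical stand-ins, and the metadata fields (`path`, `size`, `data_change`) are assumptions chosen for illustration. The real API works on `EngineData` and returns `Result`s.

```rust
// Hypothetical, simplified stand-ins for illustration only.
// These are NOT the kernel's real types or signatures.

/// Everything the data path needs in order to write a file (assumed shape).
#[derive(Debug, Clone, PartialEq)]
struct WriteContext {
    target_dir: String,
    schema: Vec<String>, // column names only, to keep the sketch small
}

/// One row of write metadata reported back by the engine (assumed fields).
#[derive(Debug, Clone, PartialEq)]
struct WriteMetadata {
    path: String,
    size: u64,
    data_change: bool,
}

/// The action that ends up in the commit, one per write metadata row.
#[derive(Debug, Clone, PartialEq)]
struct AddAction {
    path: String,
    size: u64,
    data_change: bool,
}

struct Transaction {
    table_root: String,
    schema: Vec<String>,
    write_metadata: Vec<WriteMetadata>,
}

impl Transaction {
    fn new(table_root: &str, schema: &[&str]) -> Self {
        Transaction {
            table_root: table_root.to_string(),
            schema: schema.iter().map(|c| c.to_string()).collect(),
            write_metadata: Vec::new(),
        }
    }

    /// Step 1: hand the engine what it needs to write data.
    fn get_write_context(&self) -> WriteContext {
        WriteContext {
            target_dir: self.table_root.clone(),
            schema: self.schema.clone(),
        }
    }

    /// Step 3: record metadata about files the engine has written.
    fn add_write_metadata(&mut self, batch: Vec<WriteMetadata>) {
        self.write_metadata.extend(batch);
    }

    /// Step 4: turn each write metadata row into an `Add` action.
    fn commit(self) -> Vec<AddAction> {
        self.write_metadata
            .into_iter()
            .map(|m| AddAction { path: m.path, size: m.size, data_change: m.data_change })
            .collect()
    }
}

/// Step 2 (engine side): pretend to write a parquet file. Nothing is written;
/// the size is simply 8 bytes per i64 value so the sketch stays std-only.
fn fake_write_parquet_file(ctx: &WriteContext, file_name: &str, rows: &[Vec<i64>]) -> WriteMetadata {
    let values: usize = rows.iter().map(|r| r.len()).sum();
    WriteMetadata {
        path: format!("{}/{}", ctx.target_dir, file_name),
        size: (values * 8) as u64,
        data_change: true,
    }
}
```

The order of calls is the point here: get the context, write the data, report the metadata, then commit. For a working example against the real API, see the `write.rs` tests.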
Details and some follow-ups:
- the parquet writing (similar to JSON) currently just buffers
everything into memory before issuing one big PUT. we should make this
smarter: single PUT for small data and MultipartUpload for larger data.
tracking in #418
- schema enforcement is done at the data layer. this means it is up to
the engine to call the expression evaluation and we expect this to fail
if the output schema is incorrect (see `test_append_invalid_schema` in
`write.rs` integration test). we may want to change this in the future
to eagerly error based on the engine providing a schema up front at
metadata time (transaction creation time)
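
As a rough illustration of the possible future change in the last bullet, an eager check could compare an engine-provided schema against the table schema when the transaction is created, instead of waiting for expression evaluation to fail. This is only a sketch of the idea and is not implemented in this PR; `DataType` and `check_schema` are hypothetical names, and real schemas are richer than name/type pairs.

```rust
// Hypothetical sketch of an eager schema check; not part of this PR.

#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Integer,
    String,
}

/// Compare the engine-provided schema with the table schema, field by field.
/// Returns an error describing the first mismatch found.
fn check_schema(
    table: &[(&str, DataType)],
    incoming: &[(&str, DataType)],
) -> Result<(), String> {
    if table.len() != incoming.len() {
        return Err(format!(
            "expected {} columns, got {}",
            table.len(),
            incoming.len()
        ));
    }
    for ((t_name, t_type), (i_name, i_type)) in table.iter().zip(incoming) {
        if t_name != i_name || t_type != i_type {
            return Err(format!(
                "column mismatch: expected {} {:?}, got {} {:?}",
                t_name, t_type, i_name, i_type
            ));
        }
    }
    Ok(())
}
```

With a check like this, an append with the wrong schema would be rejected before any data is written, at the cost of requiring the engine to supply a schema up front.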

based on #370

resolves #390