Are there plans to support merge on read mode #276

Open
wypb opened this issue Dec 16, 2019 · 6 comments
Labels
question Questions on how to use Delta Lake

Comments

@wypb

wypb commented Dec 16, 2019

As we know, Delta Lake currently uses a copy-on-write mode. This mode works well for write-light, read-heavy scenarios, but it is not very friendly to write-heavy, read-light scenarios. So I would like to know whether there are plans to support a merge-on-read mode.

wypb changed the title from "Are there plans to support read on merge mode" to "Are there plans to support merge on read mode" Dec 19, 2019
@joewiden
Contributor

Hello @397090770

Today Delta can solve merge-on-read use cases. To accomplish this, we create two tables and one view:
  • A changes table, which is continuously streamed into in append mode.
  • An "optimized" table, into which the changes from the changes table are periodically merged. A streaming job with trigger.once makes this straightforward.
  • A view that joins the optimized table with the changes table to get the latest record for each key.
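A minimal in-memory sketch of the two-tables-plus-view pattern described above. It assumes each change carries a key, a monotonically increasing sequence number and a full row image. Plain Python lists and dicts stand in for the Delta tables. The names `changes`, `optimized`, `compact` and `read_view` are hypothetical and are not Delta Lake or Spark APIs. In a real deployment the compaction step would be a Delta MERGE run from a trigger.once streaming job, and the view would be a SQL query.

```python
from typing import Dict, List, Tuple

# A change is (key, seq, row): `seq` increases monotonically across appends,
# and `row` is a full row image (hypothetical schema: a dict of column -> value).
Change = Tuple[str, int, dict]


def latest_per_key(changes: List[Change]) -> Dict[str, dict]:
    """Keep only the highest-seq row image for each key."""
    latest: Dict[str, Tuple[int, dict]] = {}
    for key, seq, row in changes:
        if key not in latest or seq > latest[key][0]:
            latest[key] = (seq, row)
    return {key: row for key, (_, row) in latest.items()}


def compact(optimized: Dict[str, dict], changes: List[Change]) -> None:
    """Periodic job: upsert pending changes into the optimized table, then clear them.

    This mirrors a MERGE with WHEN MATCHED THEN UPDATE and WHEN NOT MATCHED THEN
    INSERT clauses. The upsert and the clear are two separate writes; on two real
    Delta tables they would not be atomic (no multi-table transactions).
    """
    optimized.update(latest_per_key(changes))
    changes.clear()


def read_view(optimized: Dict[str, dict], changes: List[Change]) -> Dict[str, dict]:
    """The 'view': optimized rows, overridden by the newest pending change per key."""
    merged = dict(optimized)
    merged.update(latest_per_key(changes))
    return merged


# Writers only append to the changes table (cheap writes).
optimized: Dict[str, dict] = {"a": {"name": "alice", "city": "Paris"}}
changes: List[Change] = [
    ("a", 1, {"name": "alice", "city": "Rome"}),
    ("b", 2, {"name": "bob", "city": "Oslo"}),
    ("a", 3, {"name": "alice", "city": "Lima"}),
]

before = read_view(optimized, changes)  # read cost: join/override at query time
compact(optimized, changes)             # periodic background merge
after = read_view(optimized, changes)
print(before == after, after["a"]["city"])  # prints: True Lima
```

Readers see the same result before and after compaction. Compaction only moves the merge cost from query time to a background job.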

@SreeramGarlapati

SreeramGarlapati commented Mar 15, 2021

@joewiden, are you proposing a solution layered on top of Delta Lake, or are these Delta Lake constructs?
It looks like these are your own constructs, right?

I think the solution you proposed CANNOT be a general-purpose solution for MERGE ON READ, for the following reasons:

  • If multiple changes to the same row are streamed into the Changes Table, and each change modifies a different set of columns in that row, what View query could produce the merged record?
  • Delta Lake doesn't support multi-table transactions. How would you remove the records from the Changes Table and add the new rows to the Optimized Table in a single transaction?
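Here is the first objection made concrete. Suppose each change carries only the columns it modified. Then picking the latest record per key drops earlier partial updates, so the read path has to fold changes column by column in sequence order. This is a minimal sketch that assumes a hypothetical partial-row encoding in which an absent column means "unchanged". The function names are illustrative, not Delta Lake APIs.

```python
from typing import List, Tuple

# (key, seq, partial_row): only the columns this change modified are present;
# an absent column means "unchanged" (hypothetical encoding).
Change = Tuple[str, int, dict]


def latest_record_view(base: dict, changes: List[Change], key: str) -> dict:
    """Naive view: overlay only the newest change for the key onto the base row."""
    row = dict(base)
    mine = [c for c in changes if c[0] == key]
    if mine:
        row.update(max(mine, key=lambda c: c[1])[2])
    return row


def column_wise_view(base: dict, changes: List[Change], key: str) -> dict:
    """Merge-on-read: apply every change for the key in seq order, column by column."""
    row = dict(base)
    for _, _, partial in sorted((c for c in changes if c[0] == key), key=lambda c: c[1]):
        row.update(partial)
    return row


base = {"id": "u1", "email": "old@example.com", "plan": "free"}
changes: List[Change] = [
    ("u1", 1, {"email": "new@example.com"}),  # first change touches email
    ("u1", 2, {"plan": "pro"}),               # second change touches plan
]

naive = latest_record_view(base, changes, "u1")
merged = column_wise_view(base, changes, "u1")
print(naive["email"])   # old@example.com (the seq 1 update is lost)
print(merged["email"])  # new@example.com
```

Treating an absent column as "unchanged" is what allows a change to set a column explicitly to NULL (`None`) without ambiguity. A real engine would need an equivalent convention in its change schema. The sketch says nothing about the second objection (atomicity across two tables).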


I think your solution caters to the very specific scenario you wanted to solve. Please let me know if that is not the case.

I believe these intricate data problems are generally solved at the database layer (here, the big data lake), not at the application layer.
This leaves an important scenario, MERGE ON READ, unsolved.
It would be great to understand whether the Delta Lake project is invested in evolving Delta Lake to support merge-on-read semantics.

allisonport-db added the question (Questions on how to use Delta Lake) label Oct 8, 2021
@dohongdayi

Has anything changed in Delta Lake 2.x?

@nkarpov
Collaborator

nkarpov commented Aug 18, 2022

Hi @dohongdayi, no, Delta 2.1 does not yet natively support MOR. The good news is that one of MOR's dependencies, deletion vectors, is on the H2 roadmap here: #1307. If you'd like to push for this feature, please add a comment and upvote it :)
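This is for readers unfamiliar with deletion vectors. When some rows in a data file are deleted, copy-on-write rewrites the whole file. With a deletion vector, the writer instead records the positions of the deleted rows, and readers filter them out at scan time. The toy sketch below assumes a data file is a list of rows, and a Python set of row positions stands in for the compressed bitmap Delta actually stores. `DataFile`, `delete_where` and `scan` are hypothetical names, not the Delta protocol or API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class DataFile:
    """A data file's rows plus a deletion vector of deleted row positions."""
    rows: List[dict]
    deleted: Set[int] = field(default_factory=set)  # stand-in for a compressed bitmap


def delete_where(f: DataFile, predicate: Callable[[dict], bool]) -> int:
    """Merge-on-read delete: mark matching row positions instead of rewriting rows."""
    marked = 0
    for pos, row in enumerate(f.rows):
        if pos not in f.deleted and predicate(row):
            f.deleted.add(pos)
            marked += 1
    return marked


def scan(f: DataFile) -> List[dict]:
    """Readers pay the cost: skip positions recorded in the deletion vector."""
    return [row for pos, row in enumerate(f.rows) if pos not in f.deleted]


f = DataFile(rows=[{"id": 1}, {"id": 2}, {"id": 3}])
n = delete_where(f, lambda r: r["id"] == 2)
print(n, scan(f), len(f.rows))  # prints: 1 [{'id': 1}, {'id': 3}] 3
```

Mutating the set in place is a simplification. In Delta the data file stays immutable, and the new deletion vector is committed through the transaction log.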

@yangyongyongyong

It's been four years and it's still not supported?

@saryeHaddadi

saryeHaddadi commented May 11, 2023

For information, deletion vectors have been supported for read operations since Delta Lake 2.3.0 (released April 2023).
DV support for DELETE operations might roll out over the next couple of releases: #1367 (comment)
After that, the design for DV support in UPDATE/MERGE operations will be laid out.
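Once deletion vectors cover UPDATE/MERGE, an update no longer has to rewrite the file holding the old row. Conceptually, the old row's position is added to that file's deletion vector, and the new row version goes into a new, small file. This is a minimal sketch of that idea only. Files are Python lists, a set of positions stands in for each file's deletion vector, and `update_where` and `scan` are hypothetical names. It is not the actual Delta Lake design, which the comment above says was still to be laid out.

```python
from typing import Callable, Dict, List, Set, Tuple

# A table is a list of (rows, deleted_positions) pairs; existing rows are never rewritten.
Table = List[Tuple[List[dict], Set[int]]]


def update_where(table: Table, predicate: Callable[[dict], bool],
                 changes: Dict[str, object]) -> int:
    """Soft-delete matching rows via their file's DV and append the new versions."""
    new_rows: List[dict] = []
    for rows, deleted in table:
        for pos, row in enumerate(rows):
            if pos not in deleted and predicate(row):
                deleted.add(pos)                     # mark the old version deleted
                new_rows.append({**row, **changes})  # build the new version
    if new_rows:
        table.append((new_rows, set()))              # one small new file
    return len(new_rows)


def scan(table: Table) -> List[dict]:
    """Read path: every file, minus the positions in its deletion vector."""
    return [row for rows, deleted in table
            for pos, row in enumerate(rows) if pos not in deleted]


table: Table = [([{"id": 1, "v": "a"}, {"id": 2, "v": "b"}], set())]
n = update_where(table, lambda r: r["id"] == 2, {"v": "B"})
print(n, scan(table), len(table))  # prints: 1 [{'id': 1, 'v': 'a'}, {'id': 2, 'v': 'B'}] 2
```

The write touches only a deletion vector and one new file. Readers absorb the cost of skipping deleted positions, which is the merge-on-read trade-off the issue asks about.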


8 participants