Transaction rearchitecture #872
Comments
Idea: creating a database-level readers wait group. This would allow all reads to first bump the pending-readers wait group, and then get a read transaction. So what would happen is compaction would get the high watermark, then wait for the database-level readers wait group, thus ensuring that active reads finish, and that any reads about to start get a watermark >= the compaction tx. Once we have that, I think we can already get rid of the other WAL entries. Replay would simply always load the last snapshot, and then replay from the WAL up to that snapshot's tx. @asubiotto wdyt?
So we would retain the
s/up to/starting from? This idea seems good to me, but I'm not sure I would modify the pseudo-locking (snapshots grabbing a write txn and waiting) we just merged with this approach. Ideally we would have some way for compactions and snapshots to not have to block on each other for correctness. By the way, I think we technically do offer snapshot isolation, just not according to our txn ids. A read might see a write that "happened after" in real time, but given that our transactions are single-operation, I think that technically is snapshot isolation (maybe even serializable?) since we can just reorder the read in logical time. Maybe the whole rearchitecture starts with removing some of the meaning we've given to txn ids and thinking about how we'd ideally like to design multi-op transactions. BTW, there's a recent blog post about this by Phil Eaton which is a nice read: https://notes.eatonphil.com/2024-05-16-mvcc.html
Yes, but I also don't know if that's logically possible with compactions destroying transaction information.
This issue is meant to describe how transactions work in FrostDB today, and how we would like them to work in the future.
It is also meant to act as a place to discuss implementation details for how we might re-architect the code, moving from what we have today towards what we ideally want for transactions.
Today
Transactions are just a logical incrementing number that indicates a rough ordering of when writes arrived in the system. Today we do not offer multi-write transactions; as such, there is no possibility of write tearing. This means that, to the end user, transactions are effectively meaningless. Our transactions also don't guarantee snapshot isolation for reads: if you begin a read, it is possible for writes that are technically newer than the logical start of your read to be returned in that read (though the writes returned are always complete and consistent, never partial).
This non-isolation can happen in the following way:
Today we also use our tx system to order administrative events in the system via the write-ahead log (WAL). These events include block rotations to persistent storage and snapshots. Compactions are currently not recorded in the WAL.
Much of the above was not the result of active design decisions, but rather emerged from the tumultuous nature of FrostDB's active development. Many of these decisions made sense at the time, but as the code has evolved they may no longer serve their original purpose.
Hopes and Dreams
What we'd like to see out of our transaction system is: