making sure the last quiesced txg is synced #8239
Conversation
Fixed a potential bug as described in openzfs#8233:

Consider this scenario (see [txg.c](https://github.com/zfsonlinux/zfs/blob/06f3fc2a4b097545259935d54634c5c6f49ed20f/module/zfs/txg.c)): There is heavy write load when the pool exports. After `txg_sync_stop`'s call of `txg_wait_synced` returns, many more txgs get processed, but right before `txg_sync_stop` gets `tx_sync_lock`, the following happens:

- `txg_sync_thread` begins waiting on `tx_sync_more_cv`.
- `txg_quiesce_thread` gets done with `txg_quiesce(dp, txg)`.
- `txg_sync_stop` gets `tx_sync_lock` first, calls `cv_broadcast`s with `tx_exiting` == 1, and waits for exits.
- `txg_sync_thread` wakes up first and exits.
- Finally, `txg_quiesce_thread` gets `tx_sync_lock`, and calls `cv_broadcast(&tx->tx_sync_more_cv)`, but `txg_sync_thread` is already gone, and the txg in `txg_quiesce(dp, txg)` above never gets synced.

Signed-off-by: Leap Second <[email protected]>
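For readers who want to see the shape of the race outside ZFS, here is a minimal, self-contained C sketch of the same lost-wakeup pattern. This is not ZFS code: `consumer`, `producer`, `work_pending`, and the timings are illustrative stand-ins for `txg_sync_thread`, `txg_quiesce_thread`, and a freshly quiesced txg.

```c
/*
 * Minimal illustration (not ZFS code) of the lost-wakeup pattern described
 * above: the stop routine sets the exit flag and broadcasts, the consumer
 * wakes up and exits first, and the producer's later broadcast finds nobody
 * left to process the pending work item.  All names are illustrative.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;	/* ~ tx_sync_lock */
static pthread_cond_t work_cv = PTHREAD_COND_INITIALIZER;	/* ~ tx_sync_more_cv */
static int exiting = 0;						/* ~ tx_exiting */
static int work_pending = 0;			/* ~ a quiesced txg awaiting sync */

static void *
consumer(void *arg)			/* plays the role of txg_sync_thread */
{
	pthread_mutex_lock(&lock);
	while (!exiting && !work_pending)
		pthread_cond_wait(&work_cv, &lock);
	if (exiting) {			/* woken by the stop routine: just leave */
		pthread_mutex_unlock(&lock);
		return (NULL);
	}
	printf("consumer: processed the pending work\n");
	work_pending = 0;
	pthread_mutex_unlock(&lock);
	return (NULL);
}

static void *
producer(void *arg)			/* plays the role of txg_quiesce_thread */
{
	sleep(1);			/* lose the race for the lock */
	pthread_mutex_lock(&lock);
	work_pending = 1;		/* "the quiesce work is done" */
	pthread_cond_broadcast(&work_cv);	/* but the consumer is already gone */
	pthread_mutex_unlock(&lock);
	return (NULL);
}

int
main(void)
{
	pthread_t c, p;

	pthread_create(&c, NULL, consumer, NULL);
	pthread_create(&p, NULL, producer, NULL);

	/* plays the role of the stop routine: tell everyone to exit */
	pthread_mutex_lock(&lock);
	exiting = 1;
	pthread_cond_broadcast(&work_cv);
	pthread_mutex_unlock(&lock);

	pthread_join(c, NULL);
	pthread_join(p, NULL);
	if (work_pending)
		printf("the pending work was never processed (lost wakeup)\n");
	return (0);
}
```

Compile with `cc -pthread`; with the timings above it reliably prints the lost-wakeup message.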
Codecov Report
@@ Coverage Diff @@
## master #8239 +/- ##
==========================================
- Coverage 78.57% 78.45% -0.12%
==========================================
Files 379 379
Lines 114924 114927 +3
==========================================
- Hits 90299 90166 -133
- Misses 24625 24761 +136
Continue to review full report at Codecov.
Fixed checkstyle complaints:
./module/zfs/txg.c: 558: line > 80 characters
./module/zfs/txg.c: 562: line > 80 characters
Signed-off-by: Leap Second <[email protected]>
Addressed checkstyle complaints:
./module/zfs/txg.c: 559: continuation should be indented 4 spaces
./module/zfs/txg.c: 564: continuation should be indented 4 spaces
Signed-off-by: Leap Second <[email protected]>
Addressed checkstyle complaints:
./module/zfs/txg.c: 559: spaces instead of tabs
./module/zfs/txg.c: 559: continuation should be indented 4 spaces
./module/zfs/txg.c: 564: spaces instead of tabs
./module/zfs/txg.c: 564: continuation should be indented 4 spaces
Signed-off-by: Leap Second <[email protected]>
Kernel.org Built-in x86_64 (BUILD) keeps failing with the following error in the make log:
Any thoughts why? EDIT:
Since this happened to my fork instead of the master, I am not sure about starting a new issue. I was making trivial edits to please checkstyle (hope I didn't rival others on an early Saturday morning ;) ). After three successful runs on
Here are some snippets from relevant logs.

Last few lines from 7.2.tests of the failed run:
The last few lines from 7.3.log:
And the relevant part in 7.4.console:
Looks like

@behlendorf can you take a look when you get time?
@seekfirstleapsecond thanks for opening the PR and looking into the failures. Don't worry about the kernel.org failures, they appear to be due to recent changes to an unreleased kernel and will need to be investigated independently. As for that last
Signed-off-by: Leap Second <[email protected]>
Signed-off-by: Leap Second <[email protected]>
ping
I fail to see why this is a problem. By the time
Concurrency is certainly not everybody's cup of tea. Conversations between those who get it and those who don't tend to get difficult. Whoever feels qualified to talk about concurrency might wish to look at this and this and self-assess their understanding of the arguments and concurrency itself. (Remember Redis?)

Regarding this PR, I have nothing to say about its relevance today since it's been a long time since I last seriously looked at ZFS source code. But hopefully those who seriously considered the issue and understood my argument can eventually find it useful in some way.
@xwcal Shall I consider this an uncalled-for personal attack? And why are those links about locking here?
After
It's certainly not up to me to restrict your freedom to interpret it according to your instincts, inasmuch as I have the freedom to interpret this sloppy ending of a five-year-long collective inquiry according to my instincts. Was there an urge to help improve ZFS, or a rush to inflate/deflate certain metrics?
The links are technically very interesting, and also serve as a good example of a conversation on concurrency getting difficult.
What ZFS mechanism ensures that all writing processes get stopped BEFORE

Regarding your previous comment:
Of course writes don't always succeed. But a successful write, whether sync or not, should always be persisted in the absence of a crash. You seem to disagree?
The PR did not gain sufficient traction to be reviewed and merged in almost 6 years. Attacking me after I spent time looking at it is counterproductive. I am not paid to listen to this. But I am open to hearing technical arguments.
I looked it through. I agree that locks with timeouts are asking for trouble. But I still see no relation to this topic. The only place where ZFS might use something similar is multi-mount protection, but we are not talking about that here.
More charades?
By unmounting file systems if it is a live export, and really killing processes if the system reboots. And within ZFS itself:
I said nothing about successful or not. You cannot export the pool without unmounting, and unmounting means no new writes.
Frankly, whether this PR gets merged or not is no longer my concern. What is concerning is the attitude towards a potential problem of data loss. Try as you may to unsay what you said:
which has the obvious interpretation that it's no big deal to lose some data. Sorry, but this is unacceptable to me. If a write succeeds, I want it to be persisted in the absence of a crash. Period.

Your empirical argument might be correct. But the

And there is something called zvol. I'd like to hear your thoughts on that in the context of the current debate.

Let me also point out that contrary to what you said, this (complex and arguably low-priority) PR has in fact been reviewed quite a few times and got an assignee (not you). Some had no difficulty understanding it. None of the others gratuitously closed it because they "failed to see" the problem.
Easy to say. However, you seem to have defined "succeeded" as "quiesced", which doesn't make sense to me. As far as OpenZFS is concerned, the transition from open->quiesce->sync is an implementation detail. As a user program, you can only rely on a write being fully committed to disk if you have explicitly taken steps to require it, for example, calling

If those "sync write" facilities are what you mean by "succeeded", then you are claiming that OpenZFS is not fulfilling this contract, which is a strong claim and will require evidence that I don't see here. If you meant something softer, like any successful call to

(Since you mentioned zvol, block devices have syncing writes available as well. There is no difference there.)

If we take a requirement for crash-persistence off the table, then we can consider whether this patch is worth anything. I am ambivalent. If we can do "just one more sync" at export or other shutdown, without causing problems elsewhere, then maybe it's worth it for four lines of patch. But also, I don't care very much, because any software relying on writes to be on disk without the filesystem explicitly promising that has a bug.
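To make the contract concrete, here is a standard POSIX sketch (not tied to this PR; the path and payload are arbitrary examples) of what "explicitly taking steps" looks like from a user program: the data is only expected to survive a crash once `fsync()` (or a write through an `O_SYNC` descriptor) has returned successfully.

```c
/*
 * Standard POSIX example of an explicitly durable write.  The write() call
 * alone only hands the data to the filesystem; fsync() is the request that
 * it reach stable storage.  Path and payload are arbitrary.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	const char buf[] = "important record\n";
	int fd = open("/tank/data/records.log",
	    O_WRONLY | O_CREAT | O_APPEND, 0644);

	if (fd == -1) {
		perror("open");
		return (EXIT_FAILURE);
	}
	if (write(fd, buf, strlen(buf)) != (ssize_t)strlen(buf)) {
		perror("write");	/* accepted at most; not yet durable */
		close(fd);
		return (EXIT_FAILURE);
	}
	if (fsync(fd) == -1) {		/* the actual durability request */
		perror("fsync");
		close(fd);
		return (EXIT_FAILURE);
	}
	close(fd);		/* data now expected to survive a crash */
	return (EXIT_SUCCESS);
}
```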
@robn I am not going to respond to the other points since:
But I do want to respond to one point you made:
Let's look at the source code.
In other words, "succeeded", assuming

Hope it makes some sense now. Of course, if you see something in the source code that tells a different story, please let me know.

Also, I never claimed ZFS is losing sync writes (which in fact does happen when the ZIL needs to be discarded for some reason). To avoid the straw man, let me repeat my statement above:
@xwcal You are probably not smart enough to understand that not attacking everybody around you would make people nicer and more helpful in response. Rob and I live on opposite sides of the globe, work for different companies, and have no relationship outside this project. I haven't responded because I thought that Rob said enough, but apparently not.
OK. Let me tell you a story. You come into a grocery store after closing time and start bringing more and more goods to the cashier again and again. For some time the cashier will be polite and run more and more transactions. But at some point he will excuse himself, close the cash register, and go home. After that the store guard will let you safely park your shopping cart, and may even allow you to visit a restroom. But he won't sell you anything. He will either escort you out before closing the door, or call the police if you won't cooperate.

So in case it is not clear yet: the cashier here is the sync thread, while the quiesce thread is merely a guard who makes sure that customers get to the cashier in order and then leave. The quiesce thread just makes sure that all threads with transactions assigned to a certain transaction group have committed them and won't cause any churn during the sync process. It does not give any guarantees. As I said above,

And if you are still curious about ZVOLs: just as file systems must be unmounted, ZVOL devices must be closed, or ZFS won't let you export the pool.
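For what it's worth, here is a minimal, non-ZFS C sketch of what "quiescing" a transaction group means in the description above: writers register with the currently open group, and the quiesce step only hands a group on to the sync step once every writer that joined it has finished. The ring size, the function names, and the counter scheme are illustrative assumptions, not the real txg machinery.

```c
/*
 * Simplified, non-ZFS sketch of quiescing a transaction group: writers join
 * the open group, and quiesce_open_txg() closes that group and waits until
 * all of its writers have released it before it can be handed to the sync
 * step.  Names and the counter scheme are illustrative only.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define	TXG_RING	4	/* small ring of transaction groups */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t exit_cv = PTHREAD_COND_INITIALIZER;
static int open_txg = 1;		/* group currently accepting writers */
static int writers[TXG_RING];		/* writers still inside each group */

static int
join_open_txg(void)			/* writer joins the open group */
{
	pthread_mutex_lock(&lock);
	int txg = open_txg;
	writers[txg % TXG_RING]++;
	pthread_mutex_unlock(&lock);
	return (txg);
}

static void
leave_txg(int txg)			/* writer finished its in-memory changes */
{
	pthread_mutex_lock(&lock);
	if (--writers[txg % TXG_RING] == 0)
		pthread_cond_broadcast(&exit_cv);
	pthread_mutex_unlock(&lock);
}

static int
quiesce_open_txg(void)			/* close the group, wait out its writers */
{
	pthread_mutex_lock(&lock);
	int txg = open_txg++;		/* new writers join the next group */
	while (writers[txg % TXG_RING] > 0)
		pthread_cond_wait(&exit_cv, &lock);
	pthread_mutex_unlock(&lock);
	return (txg);			/* quiet: safe to hand to the sync step */
}

static void *
writer(void *arg)
{
	int txg = join_open_txg();
	printf("writer: modifying in-memory state in txg %d\n", txg);
	usleep(100 * 1000);
	leave_txg(txg);
	return (NULL);
}

int
main(void)
{
	pthread_t w;

	pthread_create(&w, NULL, writer, NULL);
	usleep(10 * 1000);		/* let the writer join the open group */
	printf("quiesced txg %d; it can now be synced\n", quiesce_open_txg());
	pthread_join(w, NULL);
	return (0);
}
```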
We need to do whatever is necessary to satisfy the API guarantees we make to user applications. If we have not promised that the data is definitely on disk, then the user application has no right to expect that. In those terms, what is your claim?

Look at it another way. You've come in aggressively asserting that there is a data loss scenario here. I take that claim seriously. When asked, you do not present anything to support that claim. Your original patch does nothing to rectify any meaningful data loss scenario, because none of these mechanisms have anything to do with userspace durability expectations.

Now, if you're not trying to claim a data loss scenario (which you called a strawman above), then you will need to clarify this:
A successful write, by definition, is one where the application has received an explicit guarantee that the data is on disk and will be available after a crash. There is no other way.

I'm totally prepared to believe that there's some curious concurrency issue in the tx locking, and we can certainly talk about that. First I need you to either show that this issue does lead to unexpected data loss on the part of a user application, or take "data loss" off the table. It's a big claim, one that we take seriously, and so far you haven't shown that there's an issue, and when challenged, you get combative. I am not interested in that kind of discussion. I will not be commenting here again until and unless more information is provided.
@amotin I am certainly not smart enough for many things. BUT, I am smart enough to tell when something is coincidental enough to warrant suspicion that it is not a mere coincidence. I am smart enough to question why someone clearly uninterested in a technical issue would waste their time hovering over it if they weren't motivated by something non-technical. And I am smart enough not to jump onto a PR assigned to someone else and close it because I can't tell what it's for and I am too lazy to check.

Yeah, it's been years but I still remember how txg works. The poor grocery store analogy wouldn't help anyone even if it were less emotionally charged. (When the emotion is over, you might gain a better understanding if you read

Speaking of "smartness", I feel I am getting dumber as this conversation goes on. I guess I am either getting too much help or too much gaslighting. Either way, I am going to stop after this final comment.

As before, you clearly don't intend to provide any proof to back up your empirical argument:
No need to look at 100 functions scattered in 10 files and called from different threads in and outside the kernel space. If you believe something is true, it must be true, right? (I'd like to believe ZFS has zero bugs too.)

BUT if anyone cares, so that an innocent bystander trying to decide who to believe on the advanced things doesn't get more confused about the basic things (who might already be thinking they need to figure out a way to unmount a zvol before exporting their pool), explain the discrepancy between the snippet from man write:
and the statement by @robn:
My analogy might be poor, but you are still ignoring what you've been told again and again.
What proof do you want? For what empirical argument? Just open any file on a pool and you won't be able to export it on Linux, or you would need to force it on FreeBSD, which allows forceful unmount:
Just open any zvol device and the pool export will hang until the device is closed (I actually consider that a bug, especially on FreeBSD, which is able to properly destroy an open device, but that is a different topic):
Brian Behlendorf, whom you are quoting, actually told you: "The way txg_sync_stop() is called today when exporting, the quiesced txg which might be lost will always be empty." Rob Norris just told you: "Your original patch does nothing to rectify any meaningful data loss scenario". I am repeating to you again and again that by the time this code is called there should be no (valid) pool activity. So please, show any real evidence of some problem or forever hold your peace. I am out of here.
Motivation and Context
See #8233
Consider this scenario (see txg.c): There is heavy write load when the pool exports. After `txg_sync_stop`'s call of `txg_wait_synced` returns, many more txgs get processed, but right before `txg_sync_stop` gets `tx_sync_lock`, the following happens:

- `txg_sync_thread` begins waiting on `tx_sync_more_cv`.
- `txg_quiesce_thread` gets done with `txg_quiesce(dp, txg)`.
- `txg_sync_stop` gets `tx_sync_lock` first, calls `cv_broadcast`s with `tx_exiting` == 1, and waits for exits.
- `txg_sync_thread` wakes up first and exits.
- `txg_quiesce_thread` gets `tx_sync_lock`, and calls `cv_broadcast(&tx->tx_sync_more_cv)`, but `txg_sync_thread` is already gone, and the txg in `txg_quiesce(dp, txg)` above never gets synced.

Description

`txg_sync_thread` now waits for `txg_quiesce_thread` to exit and maybe run one more sync before exiting.

How Has This Been Tested?
Did not test.
Types of changes
Checklist:
`Signed-off-by`.