-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updates to SPA TXG history kstat #2075
Conversation
When transitioning current open TXG into QUIESCE state and opening a new one txg_quiesce() calls gethrtime(): - to mark the birth time of the new TXG - to record the SPA txg history kstat - implicitely inside spa_txg_history_add() These timestamps are practically the same, so that the first one can be used instead of the other two. The only visible difference is that inside spa_txg_history_add() the time spent in kmem_zalloc() will be counted towards the opened TXG. Since at this point the new TXG already exists (tx->tx_open_txg has been already incremented) it is actually a correct accounting. In any case this extra work is only happening when spa_txg_history kstat is activated (i.e. zfs_txg_history > 0) and doesn't affect the normal processing in any way. Signed-off-by: Cyril Plisko <[email protected]>
In several cases when digging into kstats we can found two txgs in SYNC state, e.g. txg birth state nreserved nread nwritten reads writes otime qtime stime 985452 258127184872561 C 0 373948416 2376272384 2853 51060 1831307554 80493 1465283176 985453 258129016180616 C 0 378173440 28793344 2889 2314 90579 15942 1847951403 985454 258129016271523 S 0 0 0 0 0 1847973949 3202410 0 985455 258130864245986 S 0 0 0 0 0 3212588 132062 0 985456 258130867458851 O 0 0 0 0 0 0 0 0 Hovewer only first txg (985454) is really syncing at this moment. The another one (985455) marked as SYNCED when actually it's in post-QUIESCED state and waiting to start sync So, the new TXG_STATE_WAIT_FOR_SYNC state between TXG_STATE_QUIESCED and TXG_STATE_SYNCED should reveal this situation txg birth state nreserved nread nwritten reads writes otime qtime wtime stime 1086896 235261068743969 C 0 163577856 8437248 1248 931 114706 7734 1526974239 27497355 1086897 235262870830801 C 0 280625152 822594048 2141 24549 1301387568 125520 15068 76376497 1086898 235264172219064 S 0 0 0 0 0 763914712 141834 16437 0 1086899 235264936134407 W 0 0 0 0 0 161097 2684 0 0 1086900 235264936296156 O 0 0 0 0 0 0 0 0 0 Signed-off-by: Igor Lvovsky <[email protected]>
@ilvovsky @imp The first one which is clearly just cleanup is a no-brainer, we can get that merged. The second one covers a case we hadn't fully considered when adding this kstat debugging and I think your proposed changes make good sense. A side from one minor style fix that patches look good to me as well. If your happy with them, and they hold up in testing we'll get it in the next tag. I also think your add-on tool looks very useful! Is it complete enough to consider opening a pull request to package it with the rest of the monitoring utilities (arcstat.py, dbufstat.py)? Or does it still need some polish? |
@behlendorf Good I am glad you find our changes sound and useful. As for the izarc - it is a work in progress. It is very usable in its current form. On the other hand we have quite a bit of functionality planned to be added later on. |
@imp Well I can guarantee you that if you open a pull request and we merge it for the 0.6.3 tag it will quickly get a lot of use and I'm sure you'll get some feedback. One question through, what does |
In several cases when digging into kstats we can found two txgs in SYNC state, e.g. txg birth state nreserved nread nwritten ... 985452 258127184872561 C 0 373948416 2376272384 ... 985453 258129016180616 C 0 378173440 28793344 ... 985454 258129016271523 S 0 0 0 ... 985455 258130864245986 S 0 0 0 ... 985456 258130867458851 O 0 0 0 ... However only first txg (985454) is really syncing at this moment. The other one (985455) marked as SYNCED is actually in a post-QUIESCED state and waiting to start sync. So, the new TXG_STATE_WAIT_FOR_SYNC state between TXG_STATE_QUIESCED and TXG_STATE_SYNCED was added to reveal this situation. txg birth state nreserved nread nwritten ... 1086896 235261068743969 C 0 163577856 8437248 ... 1086897 235262870830801 C 0 280625152 822594048 ... 1086898 235264172219064 S 0 0 0 ... 1086899 235264936134407 W 0 0 0 ... 1086900 235264936296156 O 0 0 0 ... Signed-off-by: Igor Lvovsky <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2075
When transitioning current open TXG into QUIESCE state and opening a new one txg_quiesce() calls gethrtime(): - to mark the birth time of the new TXG - to record the SPA txg history kstat - implicitely inside spa_txg_history_add() These timestamps are practically the same, so that the first one can be used instead of the other two. The only visible difference is that inside spa_txg_history_add() the time spent in kmem_zalloc() will be counted towards the opened TXG. Since at this point the new TXG already exists (tx->tx_open_txg has been already incremented) it is actually a correct accounting. In any case this extra work is only happening when spa_txg_history kstat is activated (i.e. zfs_txg_history > 0) and doesn't affect the normal processing in any way. Signed-off-by: Cyril Plisko <[email protected]> Issue #2075
I would like to solicit feedback on the next two commits.
One is minor change to the timestamps usage, - clean up mostly.
Second is more serious one - we have added another state for the TXG lifecycle - WAIT_FOR_SYNC.
While working on adding support for the SPA TXG history for our add-on tool (available at https://github.com/imp/izarc) we noticed that quite frequently there are two TXG simultaneously in S (syncing) state. This is obviously not possible. What happens in reality is that a TXG left QUIESCING state and now waits for the previous TXG that is SYNCING now to finish and free the SYNC slot. However, in this situation the time spent waiting erroneously accounted as SYNCING as well. Since the syncing is usually non-trivial time wise, the time measurements could be quite incorrect.