ztest: vdev.c:1746: Assertion `scn->scn_phys.scn_min_txg <= vdev_dtl_min(vd) #2302
It looks like we have a bug filed for this internally at Delphix. I copied George's analysis to an illumos bug (https://www.illumos.org/issues/4890). From what I can tell, we do not have a fix for this yet.
An offline or unreadable vdev may trip the following assertion during a resilver scan in ztest:

    ztest: ../../module/zfs/vdev.c:1746: Assertion `scn->scn_phys.scn_min_txg <= vdev_dtl_min(vd) (0x4e3 <= 0x3)' failed.
    child died with signal 6

The following analysis is by George Wilson:

The current scan is only resilvering a few txgs, [70f, 711], yet this vdev has a min txg of 3. The problem is that this vdev is currently not readable. As a result, the scan that was doing the resilver actually finished but didn't copy any of the data to this device. Now a second scan comes through while the device is still offline (i.e. not readable), so once again no data is copied to it. This time, when we check whether we should excise the DTLs from this device, we determine that we should, since the scan is for a txg much higher than the max value in this device's DTL range, but we end up tripping over this assertion:

    /*
     * When a resilver is initiated the scan will assign the scn_max_txg
     * value to the highest txg value that exists in all DTLs. If this
     * device's max DTL is not part of this scan (i.e. it is not in
     * the range (scn_min_txg, scn_max_txg] then it is not eligible
     * for excision.
     */
    if (vdev_dtl_max(vd) <= scn->scn_phys.scn_max_txg) {
            ASSERT3U(scn->scn_phys.scn_min_txg, <=, vdev_dtl_min(vd));

If the device is not readable, then we never want to excise any of its DTLs, so we should return B_FALSE and not bother with anything further.

References: https://www.illumos.org/issues/4890
Issue openzfs#2302
Signed-off-by: Ned Bass <[email protected]>
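The proposed fix amounts to an early return in the excision check, taken before the assertion can be reached. The sketch below is a minimal, hypothetical model of that idea, not the ZFS source: the types `sketch_vdev_t` and `sketch_scan_t` and the function `sketch_dtl_should_excise` are invented for illustration, and plain fields stand in for `vdev_readable()`, `vdev_dtl_min()`, `vdev_dtl_max()` and the `scn_phys` txg bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-ins for the vdev and scan state.
 * These are NOT the real vdev_t / dsl_scan_t definitions.
 */
typedef struct {
	bool readable;     /* stands in for vdev_readable(vd) */
	uint64_t dtl_min;  /* stands in for vdev_dtl_min(vd) */
	uint64_t dtl_max;  /* stands in for vdev_dtl_max(vd) */
} sketch_vdev_t;

typedef struct {
	uint64_t min_txg;  /* stands in for scn->scn_phys.scn_min_txg */
	uint64_t max_txg;  /* stands in for scn->scn_phys.scn_max_txg */
} sketch_scan_t;

/*
 * Decide whether a vdev's DTLs may be excised after a scan completes.
 * The early return models the proposed fix: the scan could not have
 * copied anything to an unreadable vdev, so its DTLs must be kept.
 */
static bool
sketch_dtl_should_excise(const sketch_vdev_t *vd, const sketch_scan_t *scn)
{
	if (!vd->readable)
		return false;

	/* Only a DTL whose max txg falls within the scanned range qualifies. */
	if (vd->dtl_max <= scn->max_txg) {
		/* Mirrors the ASSERT3U that fired in the report. */
		assert(scn->min_txg <= vd->dtl_min);
		return true;
	}
	return false;
}
```

With the readability check, an unreadable vdev whose DTL starts at txg 0x3 is left alone even when the scan's minimum txg is 0x4e3. Without the check, the same inputs would reach the `assert` and abort, much as the ztest child did with signal 6.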
I reproduced this locally and took the opportunity to examine the core dump. Contrary to George's analysis, the vdev involved is not offline according to its vdev_t. That said,
My preliminary guess is that ztest attached a vdev while a scrub was running. That said, the pool where this was triggered has an unusual geometry:
This might be fixed by #3172. However, we will need to stress-test ztest and go an extended period without seeing this before we can be at all certain of that.
Likely fixed by #4790.
Fixed by #4790.
A long-standing issue that ztest hits occasionally.