panic on FreeBSD 14-CURRENT w/slog_replay_fs_001 #12163
Comments
Assuming the test failure and this are correlated, it looks like build 651 was the first one on the bot(s) to have those three fail - and it has the aforementioned trace in the syslog. Unfortunately, I haven't seen an obvious way to get the FBSD revision that was running at the time, but since snapshots seem to be weekly, that rather limits the options.

Since the "stock" revision of OZFS it came with (2.1-rc1) is from 3/29 and does this, I suspect that it's a change on the FBSD side that resulted in this.

edit: I didn't see anything obviously relevant in the commits to FBSD in the last month, so I tried rolling back OpenZFS to 93f81eb (because going any further back would require me to cherry-pick 93f81eb in order to build), to be reasonably sure it's not a change in OpenZFS - same failure modes. So I guess it's FreeBSD bisect time, unless someone has better insight than me.
Huh. So I hopped a bit into the past for FBSD (n245865-13b3862ee874, so around 4/6) and tried this with the packed zfs.ko (891568c), which would have been built with DEBUG: I got the aforementioned message and backtrace, but the test passed. Tried with git master (d484a72) and no debug, and a panic popped out. Tried the same git rev with debug, and it likewise passed slog_replay_fs_001 and printed the aforementioned backtrace.

I wonder if it'd be better to build OpenZFS on the FBSD testbots without --enable-debug to catch things like this...
Based on the panic from the latest OpenZFS master I'd suspect that could have been introduced by #11997. It was only applied to master on 5/13, so it wouldn't have been in the earlier versions you were testing. This was the The other It'd be great to open PRs for at least those first two easy test case fixes. cc: @freqlabs @amotin
Easy enough to test. I'll go open the other two fixes, though I'm not sure if they're just masking some deeper problem... I'll write that down in the PR.

edit: #12165
Oh right, I tested with 93f81eb which, according to my earlier post, still panics, and that seems to be before 210231e landed. I'll go test with 210231e~1 once my latest build{kernel,world} finishes, to be certain I didn't just take the aforementioned debug backtrace as a proxy and write it down incorrectly, but I don't think I did.
I just tested with d86debf (which is, unless I'm really bad at git, the commit immediately prior to 210231e), and... still panics! (I explicitly checked the version of the kernel module and userland before running the test, so I'm reasonably confident this is not a false report.) So I'm going to believe my earlier comment that back to 93f81eb still panics, and go back to trying older FBSD kernels.

edit to add: n245179-95331c228a39 (3/1 or so) + 5ad86e9 still burns down. (Had to do 5ad86e9 or earlier because apparently git master dies with implicit declarations of various vfsops functions after 93f81eb on older FBSD git.)
...huh. 0e9bcd5 + freebsd/freebsd-src@95331c228a39 has this panic, and the commit in 0e9bcd5 is needed to build against the recent system. So I can't really wind further back on that, but it seems like it's been with us for some time, and winding OpenZFS back won't help find it. (If you're wondering why I tried despite my earlier statement - building kernels takes much longer than just winding OpenZFS back, so I figured I'd at least try going back as far as possible.)

As for why nobody's run across it - it seems to require that you be running a kernel with as much debugging as "GENERIC" (it doesn't reproduce with GENERIC-NODEBUG, which may be expected for FreeBSD regulars, but certainly surprised me), and an OpenZFS module built without --enable-debug (which is, I believe, why none of the "stock" zfs modules bundled with FBSD base, or built by the buildbots, ever did this).

Is the right thing to do to detect when you're attempting to build against a FBSD kernel with debug bits and force them on/error without --enable-debug passed? I'm not sure; I have no insight into whether this message is spurious, yet, but it feels somewhat hackish, unless FreeBSD outright says you're required to not do that. (If this takes a nontrivial interval, I'll probably open a PR for it anyway, at least for now, so people can't accidentally get burned.)

Meanwhile, I'm going to go back to trying older and older kernels.
Okay, I went all the way back to freebsd/freebsd-src@0f34c80 and it still panicked; I tried freebsd/freebsd-src@b58a463 and it failed to boot from my ZFS root with either the builtin zfs.ko or 5ad86e9, so I decided to stop trying that rather than reinstalling my testbed with UFS root to continue.

I examined the code long enough to confirm one theory I had (that it was calling So for now, I'm going to go try bisecting the incorrect behavior of the test case I started this looking into (compiled with --enable-debug --enable-debuginfo, obviously), and maybe I'll come back to this later - it can't be especially important, if nobody else has reported it prior, and it's been present since at least January.
I believe the namei errors are caused by mismatched INVARIANTS build options.
How odd - if so, I think something is wrong, because there is configure code to notice this and set things appropriately, and my config.log without --enable-debug --enable-debuginfo reports WITH_INVARIANTS='true'. So I'll experiment with this and see what I find. Thanks!

edit: So I just tried --enable-invariants, in case it behaved differently than "detect" or I was wrong, and it still panics the same way. So I guess --enable-debug is required, but I'm still curious why it's only breaking on 14-CURRENT. Maybe I'll try my hand at a patch to notice DEBUG...
Just a hunch, we might need to make sure WITH_INVARIANTS implies WITH_DEBUG through either the configure logic or the Makefile.
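To make the hunch above concrete, here is a minimal shell sketch of the "INVARIANTS implies DEBUG" decision. It is an illustration only, not the change that was eventually merged: the function names `kernel_has_option` and `module_debug_setting` are hypothetical, and the sketch assumes the kernel configuration is available as text with one `options<TAB>NAME` entry per line.

```shell
#!/bin/sh
# Sketch only: derive a module debug setting from a kernel config listing.
# All function names here are hypothetical, chosen for illustration.

# kernel_has_option NAME
# Reads a kernel config listing on stdin and succeeds if it contains a
# line that is exactly "options<TAB>NAME".
kernel_has_option() {
    tab=$(printf '\t')
    # -x: match whole lines only, -F: fixed string, -q: no output
    grep -q -x -F "options${tab}$1"
}

# module_debug_setting
# Reads a kernel config listing on stdin and prints "yes" if the module
# should be built with debug enabled, "no" otherwise.
# Assumption being illustrated: INVARIANTS in the kernel implies debug.
module_debug_setting() {
    if kernel_has_option INVARIANTS; then
        echo "yes"
    else
        echo "no"
    fi
}
```

On a FreeBSD kernel that embeds its configuration, something like `sysctl -n kern.conftxt | module_debug_setting` could feed the function; whether that listing is available depends on how the kernel was built, so treat that as an assumption too. Matching whole lines keeps similarly named options such as INVARIANT_SUPPORT from being mistaken for INVARIANTS.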
There's already logic to force INVARIANTS on for building if it's present in the running kernel; however, not having DEBUG enabled when DEBUG and INVARIANTS are can cause strange panics. Closes: openzfs#12163 Signed-off-by: Rich Ercolani <[email protected]>
There's already logic to force INVARIANTS on for building if it's present in the running kernel; however, not having DEBUG enabled when DEBUG and INVARIANTS are can cause strange panics. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Rich Ercolani <[email protected]> Closes openzfs#12185 Closes openzfs#12163
System information
Describe the problem you're observing
I was trying to reproduce the 3 consistent FBSD 14-CURRENT test failures:
The first two were trivial to work around and I'll have a PR shortly. The third one looked more complicated when I tried it with the "stock" version it shipped with (3522f57), and I noticed it logged the following to syslog but didn't hang the test or the greater system:
So I installed latest OpenZFS git right now, modified loader.conf to have zfs_load="NO" openzfs_load="YES", rebooted, ran the test again...very shortly after launching the test, I noticed my sessions had hung, and the local console had this:
(For later searching purposes:
panic: namei: repeated call to namei without NDREINIT
is the top of it.)

I managed to save a core dump, and can provide that on request if anyone's interested.
Describe how to reproduce the problem
Run slog_replay_fs_001 on this kernel version and OpenZFS version, apparently.
Include any warning/errors/backtraces from the system logs
See above.