Adding slog to another pool's ZVOL will deadlock #1131
This is probably related to #612 (sorry, I should have looked harder).
Actually, one slightly amusing side note. I performed the following on Solaris 11:
(That error when adding the slog always happens, but the device appears to be attached anyway.) At this point, I dd the slog device to the file slog for binary comparison, power off Solaris, and boot Linux. On Linux x64:
and... it is running! \o/ Let's tar some more data onto it (to hopefully make the slog write more records)
and dd /dev/zd0 to slog2. This time there are some diffs. I am showing the last few lines (before all zeros) in both cases, so it is encouraging that Linux has moved further along. slog: Solaris
slog2: Linux
Which would suggest the deadlocking problem is actually in the attaching of the device, and possibly not a problem with run-time usage. It would also be nice to really confirm it does use the ZVOL for the slog.
I agree. Based on the stack you posted, the deadlock appears to be down the
It might be worth noting that at least on OpenIndiana (Illumos), ZFS code is (or at least was) prone to deadlocks if Pool A resources are used in Pool B. Not sure if that applies here. (http://mail.opensolaris.org/pipermail/zfs-discuss/2010-July/042806.html)
Excellent, a discussion about this! Personally I do think a pool inside a pool is a little crazy, but having the slog in a zvol (in a different pool, presumably the root pool) seems very similar to 'swap in a zvol'. (I actually thought Sun stopped you from putting a pool in a pool; I will test this in the VM.) Most of the discussion on Sun and illumos seems to be people willing to say it could lead to deadlocks, since nobody has really tried it. And yet there do not appear to be any examples where deadlocks have happened. The closest is the chap who lost his rpool (which means you can't boot anyway) and had no way to import his data pool due to the missing slog device. But that issue has been addressed with the "import -m" option, and there is no reason you could not rebuild your rpool and keep the data on your data pool.

So nobody wants to go out on a limb and say this setup does work; that is OK. Half the reason having a slice for the slog is 'not a big deal' stems from the fact that Solaris has to boot from a slice on a VTOC label. But this is no longer true for Linux: you can boot from EFI labels, and FreeBSD already lets you boot from raidz. We should celebrate the death of the legacy partitions. Nobody wants to have to carve out a slice for swap; nobody wants to do the same for the slog. Anyway, it is a project to keep me busy :)
I actually tried using this model on an OI test box. When I did tests like hot-removing the ZIL slog and yanking drives out in general, I managed to get my pool into a state where I could not remove the ZIL slog device any more using the zfs tools. (The device was permanently stuck in the pool and could only be offlined, not replaced or removed.) So if you have a test box running Solaris you might want to try some "yanking experiments" :) In this configuration I had an rpool of 2 SSD drives in a mirror, which also housed a zvol for the "data" pool's ZIL. The actual data pool was simply 2 SATA drives in a mirror.
I do indeed have test machines. Just to confirm: that you made it hang, or needed a reboot, doesn't concern me. But did you get to a state where you could not import your data pool using the regular zfs tools and a reboot/import? I just racked a 30-SSD storage server that is not in use until Q1, so I will attack that. Although I would use Sol 11 or Sol 10, since OI is lagging. Hmm, I guess I could try them all; it's just cold in the data centre :)
I think it actually may have imported fine, but managing devices in the pool was a pain. I tried it out like 7 months ago, so it's not exactly fresh in my mind. If I recall correctly, during my testing I did not have problems importing the pool, though sometimes I had to import it with the slog device destroyed from rpool (and therefore discard the transactions in the ZIL before reboot). But I clearly remember having problems with pool management after some yanking experiments to test the redundancies :)
But that sounds like ZFS doing exactly its job: never losing data, or stopping you from getting at it (importing the pool), no matter how nastily you yank drives. That you might have to "let go of the slog" device sometimes is also true with a slog on a raw SSD (i.e. when you simply lose the SSD; interestingly, this is what happens most often at work on storage devices). Having to do some "juggling" before you can get at your data (i.e. rpool/slog has to be available before the data import, which is fairly typical), plus a possible performance hit from using a slog on a ZVOL, is totally worth it compared to that moment when you realise you should have made that partition a different size. At work we dedicate 2 whole SSDs to just the slog, but at home that is not always possible. Anyway, it does not sound like anyone is getting out the pitchforks and tar just yet, so I will dig deeper and see if I can get lucky with the deadlock problem.
I'm very interested in any results you might get; please report back on your findings :)
OK, I have 3 spare X4540s that are to be returned next month, so I will do some yank-tests. Meanwhile, I am looking at the problem of Linux using a log in a zvol. Chasing the code down, it ends up in vdev_open_children(), which creates a thread so that all the locks are held by the same thread in case the children are ZVOLs. [1] This code then enters
What is interesting here is that we deadlock in taskq_destroy(), which should never be called. Naturally, I go check out vdev_uses_zvols()
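A hedged, stand-alone sketch of the two paths under discussion (not the actual ZoL source): the illumos-style vdev_open_children() opens zvol-backed children in-line, in the one thread that already holds the locks, and only falls back to a taskq (torn down by taskq_destroy()) for ordinary devices, while vdev_uses_zvols() is assumed to be the always-false stub that the final commit message confirms was left in the port. Types and the plain loops are stand-ins so the sketch compiles on its own.

```c
#include <stdio.h>

typedef int boolean_t;
#define B_FALSE 0
#define B_TRUE  1

typedef struct vdev {
	int		 vdev_children;
	struct vdev	**vdev_child;
	const char	 *vdev_path;
} vdev_t;

static void
vdev_open(vdev_t *vd)
{
	printf("opening %s\n", vd->vdev_path);
}

/* The disabled check from the original port: always false, so zvol
 * children are never detected (assumption based on the commit message). */
static boolean_t
vdev_uses_zvols(vdev_t *vd)
{
	(void) vd;
	return (B_FALSE);
}

static void
vdev_open_children(vdev_t *vd)
{
	if (vdev_uses_zvols(vd)) {
		/* zvol children: open in-line so the same thread holds the locks. */
		for (int c = 0; c < vd->vdev_children; c++)
			vdev_open(vd->vdev_child[c]);
		return;
	}

	/*
	 * Ordinary children: the real code dispatches these opens to a
	 * taskq and then waits for them in taskq_destroy(); a plain loop
	 * stands in for that here.
	 */
	for (int c = 0; c < vd->vdev_children; c++)
		vdev_open(vd->vdev_child[c]);
}

int
main(void)
{
	vdev_t zvol_child = { 0, NULL, "/dev/zd0" };
	vdev_t *kids[] = { &zvol_child };
	vdev_t root = { 1, kids, "root vdev" };

	/* With the stub above, even a zvol child goes down the taskq path. */
	vdev_open_children(&root);
	return (0);
}
```

With the stub returning B_FALSE, a zvol-backed slog is pushed down the taskq path, and its open can block on a lock already held by the dispatching thread, which matches the hang being reported in taskq_destroy().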
Ah, that could have something to do with it :) I throw in the line
and we test:
Just that easy. Naturally, that is a bit of a hack; as the comment suggests, I should look at stat() and check the major number. If I can figure it out, I will throw a patch your way. Now, moving on to pools in a zvol, which even I think is a little weird, and which I thought Solaris stopped you from doing, I tested this on Solaris:
Apparently that is also legal! [1] Interestingly, I cannot import these pools on Linux; it deadlocks. Also, creating a 'pool inside a zvol' with my above patch deadlocks too. So that part needs extra work. [1] Implying that Sun does in fact support logs/pools in a ZVOL, regardless of how icky that seems. :)
Please find patch https://github.com/lundman/zfs-master/commit/76e3e875a3af494378465f4134501db492de1c23 for your perusal. I am not entirely sure how to stat() something from inside the kernel (not easy to google for), but lookup_bdev() does appear to work. As for the 'pool in zvol' problem, it will sometimes work, which is amusing.
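For reference, a minimal sketch of the lookup_bdev() direction described here and summarized in the final commit message: resolve the device path to a block device and compare its major number with the zvol major registered for the system. The function name, the older one-argument lookup_bdev() signature, and the error handling below are assumptions, not necessarily what was merged.

```c
/*
 * Sketch only: check whether a device path refers to a zvol by comparing
 * its block-device major number with the zvol driver's registered major.
 * Assumes the older lookup_bdev(const char *) kernel interface.
 */
#include <linux/err.h>		/* IS_ERR() */
#include <linux/fs.h>		/* lookup_bdev(), bdput() */
#include <linux/kdev_t.h>	/* MAJOR() */

extern unsigned int zvol_major;	/* major number registered by the zvol driver */

static int
zvol_is_zvol_sketch(const char *device)
{
	struct block_device *bdev;
	unsigned int major;

	bdev = lookup_bdev(device);
	if (IS_ERR(bdev))
		return (0);	/* not a block device we can resolve */

	major = MAJOR(bdev->bd_dev);
	bdput(bdev);

	return (major == zvol_major);
}
```

vdev_uses_zvols() could then call a check like this on each child's vdev_path to decide whether the opens must stay in-line.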
I believe this closes #1131.
Oh, on the 'pool in zvol' thing: if I just try to create it, it will deadlock. But interestingly, if the zvol was previously used as a slog device, it works. Could it be something about the existing label?
@lundman Nice find! I now vaguely remember disabling this way back when. I've marked up your patch; if you could rework and repush, that would be great. As for the 'pool in zvol' case, let's open a new issue and track that use case there.
@behlendorf That is issue #612.
@behlendorf Made a comment; waiting to hear how you want to solve the vdev_bdev_mode() declaration issue. Amusingly, using the vdev_bdev_open() route means you can no longer make a pool in a zvol; it fails cleanly.
Which is better than deadlocking. I might still try to fix this issue, as I have little else to do.
Here is a revised patch for the initial problem: https://github.com/lundman/zfs-master/commit/a627730cedfcd57c738fe120ca6c55d51a7d156d Note that I am not entirely happy with it myself and will need to dig deeper. In particular, this situation happens:
So,
I would also like to test using the /dev/mypool/ext notation, but I don't get those on my system. I am guessing that is part of the udev rules?
My first guess is that ZFS has no notion of pools depending on other pools. So when importing everything, you're going to need to be careful to import the pool with the zvol first, so it's available when you import the data pool. Right now the pools just get imported in the order in which they appear in the cache file.
Right, you need the
@behlendorf I added another comment to the patch.
Here is a smaller, cleaner patch: https://github.com/lundman/zfs-master/commit/364915536ece65d9fa22abd4a08a2153557d4504 Naturally, it would not need the preprocessor conditional; I left it there until I know which way you prefer. :)
@lundman Much better, let's go with lookup_bdev().
Cleaned-up patch: https://github.com/lundman/zfs-master/commit/6f033f721685fc44a011b873116888cea286acc6 I would still like to return to the 'pool in zvol' problem at some point, but I will use issue #612 for that.
Minor tweaking, but merged.
During the original ZoL port the vdev_uses_zvols() function was disabled until it could be properly implemented. This prevented a zpool from using a zvol for its slog device. This patch implements that missing functionality by adding a zvol_is_zvol() function to zvol.c. Given the full path to a device it will look up the device and verify its major number against the registered zvol major number for the system. If they match we know the device is a zvol. Signed-off-by: Brian Behlendorf <[email protected]> Closes openzfs#1131
I am creating this ticket because this is something you can do in Solaris 11 but cannot do in ZoL. Then we can argue whether it is worth fixing, useful, and all that.
In my case, I have a root pool 'rpool', which is on SSD, and a much larger data pool on HDDs, called 'zpool'.
Instead of using legacy partitions to carve out space for the SLOG on the SSD, I created a ZVOL for it:
This is on Solaris 11.
Doing a test setup on ZoL rc12, this happens:
Which results in
And the same commands on Solaris 11: