current raid/mdadm support does not allow enough time for slow devices to come fully online #97
Comments
Thank you for the feedback.
It is an interesting behavior. I would expect that Could you please enable booster debugging with BTW here are the udev rules from Arch for the same event:
It uses But other than that I do not see any delays or extra sync functionality. I wonder if booster is just "too fast" and starts mounting the block device right away, while udev has a lot of delays internally and hence we do not see this problem there.
As for this warning, could you please file a ticket so that we can discuss it separately?
Also, could you please share the configuration of your mdraid array so I can try to reproduce the issue on my side?
Returning to this log. It looks like booster gets a udev event for Also, locally I added a RAID1 integration test with 5 partitions and it works fine. The array is fully assembled first, and then the kernel sends a udev event that booster handles correctly.
@richard-cms have you had a chance to get the debug logs from booster? They will help to understand and fix the problem you see.
Sorry for the delay @anatol, I've had some medical issues pop up. I will try to get you the logs with Maybe a dumb question - what is the easiest way for me to redirect the
I hope everything is OK with you. Booster debug is copied to Here is an example from my dmesg:
Okay I have some good news and bad news: I'm no longer able to reproduce the issue, but I've changed my boot process dramatically so I might just be hiding the issue now. My disk setup is now: I am able to boot fully, including automated decryption of the LUKS2 root, using the following
I also have enabled secureboot, and the embedded kernel commandline in the signed efistub is now: My
I will attach the full boot log, with
For my own education, I can't actually tell how booster is decrypting my LUKS2 partition according to those logs - I see that it attempts to ping my TANG router but gets a no network message, I never see any network come-up messages, I also don't see any mention of the TPM keys, and I definitely didn't put in my password, so I'm wondering what happened :). On the direct issue of the RAID1 previously not coming up in time, I wonder if setting the kernel commandline arguments to wait for the cryptdevice might mask that problem by making it explicitly wait for the appearance of that UUID to attempt mounting.
It sounds like a timing issue. But to debug the issue I need the logs from the failed use-case. Here are your log entries related to mdraid assembly:
It looks fine. Were you able to boot successfully without debug enabled as well?
BTW you can use TPM with clevis/booster as well.
Booster has retry logic for tang service availability. The connect() call might fail because the network is not available or DHCP has not completed yet. In that case booster prints the error message, sleeps for a while, and then tries to connect again. Eventually it reaches the tang service and unlocks the block device.
No! I confirmed a few times that I could not boot unless the There is no visible error that shows up now, just the timeout waiting for root. The dmesg log shows that
It is interesting... It also gives an opportunity to understand what is going on there. I just pushed a commit to
I rebuilt booster
Some mdraid device configurations might exist in a non-safe (non-working) mode. An example is an array that is constructed incrementally from multiple devices. The first device creates an array that is reported via udev, but this array is not usable; it needs more block devices added before it can start serving I/O. Instead of starting to use an md array at the "add" udev event, we have to check whether 'mdadm --incremental' actually starts the array. Use 'mdadm --export' to get the status of the array and mount it only when the MD_STARTED property is set (i.e. the array is live). Fixes #97
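A minimal sketch of the check described in that commit message, assuming the `--export` output is a list of KEY=VALUE lines and that a live array is reported as `MD_STARTED=yes`. The exact value is an assumption here (the message only says the property must be set), the sample outputs are hypothetical, and `parseExport` and `arrayStarted` are illustrative names rather than booster's real functions.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseExport turns KEY=VALUE lines into a map and skips anything else.
func parseExport(out string) map[string]string {
	props := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		key, value, ok := strings.Cut(line, "=")
		if !ok || key == "" {
			continue
		}
		props[key] = value
	}
	return props
}

// arrayStarted reports whether the array is live.
// Assumption: "yes" is the value that marks a started array.
func arrayStarted(props map[string]string) bool {
	return props["MD_STARTED"] == "yes"
}

func main() {
	// Hypothetical sample outputs, not captured from a real system.
	partial := "MD_DEVICE=md0\nMD_DEVNAME=0\nMD_STARTED=no\n"
	complete := "MD_DEVICE=md0\nMD_DEVNAME=0\nMD_STARTED=yes\n"

	for _, out := range []string{partial, complete} {
		props := parseExport(out)
		if arrayStarted(props) {
			fmt.Printf("%s is live, safe to mount\n", props["MD_DEVICE"])
		} else {
			fmt.Printf("%s is not started yet, keep waiting\n", props["MD_DEVICE"])
		}
	}
}
```

The point of checking the property instead of reacting to the "add" event is that a partially assembled array can already exist as a block device while it is still unable to serve I/O.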
Thank you for the logs. It looks like there is a short period of time when the raid1 array exists but is not fully assembled yet. Instead of relying on the udev "add" event booster should use Here is a proposed fix for the problem that has been pushed to
I have rebuilt based on 3fd9444 (HEAD of Attached is the
It is great to hear that it fixed this issue! Thank you very much for discovering and debugging this problem @richard-cms! I am going to test the code on my machines and if everything is OK I'll push it to
I'm using booster 0.6-1 from the archlinux repositories. My root is a 2-disk RAID1 on relatively slow HDDs. I am not able to use booster to boot because I believe that booster is not waiting for the RAID1 to fully finish assembly before attempting to mount. My system does alternatively work with mkinitcpio and the systemd + mdadm_udev hooks.

When attempting to boot, I get an "unable to read superblock" error. If I wait for the mount_timeout to elapse I can manually mount the RAID device, which is already present, without error.

An abbreviated dmesg log is here (I can attach full log if requested). Notice that the "unable to read superblock" error occurs moments before the raid device comes fully online:

(I'm also very surprised by the tainting kernel line but that seems unrelated)

My /etc/booster.yaml is

I'm using systemd-boot and my boot entry is:

My /etc/mdadm.conf is just