zed aborts after assertion failure in udev_device_get_sysattr_value #16705

Closed
Uglymotha opened this issue Oct 30, 2024 · 1 comment · Fixed by #16717
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@Uglymotha
Contributor

System information

Distribution Name | custom linux
Distribution Version | n/a
Kernel Version | 6.11.5
Architecture | x86_64
OpenZFS Version | 2.2.6

zed aborts (SIGABRT) after an assertion failure in libudev:
Oct 29 16:57:07 rdsan01 zed[18154]: Assertion 'udev_device' failed at src/libudev/libudev-device.c:742, function udev_device_get_sysattr_value(). Aborting.
Oct 29 16:57:07 rdsan01 systemd[1]: zfs-zed.service: Main process exited, code=dumped, status=6/ABRT
Oct 29 16:57:07 rdsan01 systemd[1]: zfs-zed.service: Failed with result 'core-dump'.
Oct 29 16:57:07 rdsan01 systemd[1]: zfs-zed.service: Scheduled restart job, restart counter is at 7.
Oct 29 16:57:07 rdsan01 systemd[1]: zfs-zed.service: Start request repeated too quickly.
Oct 29 16:57:07 rdsan01 systemd[1]: zfs-zed.service: Failed with result 'core-dump'.

Describe how to reproduce the problem

This happens during udev triggering (udevadm trigger -s block).

Include any warning/errors/backtraces from the system logs

Process 30394 (zed) of user 0 dumped core.

Module libcap.so.2 without build-id.
Module libresolv.so.2 without build-id.
Module libkeyutils.so.1 without build-id.
Module libkrb5support.so.0 without build-id.
Module libgmp.so.10 without build-id.
Module ld-linux-x86-64.so.2 without build-id.
Module libuuid.so.1 without build-id.
Module libudev.so.1 without build-id.
Module libz.so.1 without build-id.
Module libgcc_s.so.1 without build-id.
Module libc.so.6 without build-id.
Module libunwind.so.8 without build-id.
Module libcom_err.so.2 without build-id.
Module libk5crypto.so.3 without build-id.
Module libkrb5.so.3 without build-id.
Module libgssapi_krb5.so.2 without build-id.
Module libtirpc.so.3 without build-id.
Module libnvpair.so.3 without build-id.
Module libcrypto.so.3 without build-id.
Module libm.so.6 without build-id.
Module libuutil.so.3 without build-id.
Module libblkid.so.1 without build-id.
Module libzfs_core.so.3 without build-id.
Module libzfs.so.4 without build-id.
Module zed without build-id.
Stack trace of thread 31364:
#0 0x00007f17c40e9e7c __pthread_kill_implementation (libc.so.6 + 0x8de7c)
#1 0x00007f17c409b3b2 raise (libc.so.6 + 0x3f3b2)
#2 0x00007f17c40844ad abort (libc.so.6 + 0x284ad)
#3 0x00007f17c3fca995 log_assert_failed.cold (libudev.so.1 + 0x8995)
#4 0x00007f17c3ff0077 log_assert_failed_return (libudev.so.1 + 0x2e077)
#5 0x00007f17c3fcbc9f udev_device_get_sysattr_value (libudev.so.1 + 0x9c9f)
#6 0x0000561ddc78648e zed_udev_monitor (zed + 0xc48e)
#7 0x00007f17c40e81b2 start_thread (libc.so.6 + 0x8c1b2)
#8 0x00007f17c4162288 __clone3 (libc.so.6 + 0x106288)

Stack trace of thread 30394:
#0 0x00007f17c415dfdb ioctl (libc.so.6 + 0x101fdb)
#1 0x00007f17c4b2ca2c zpool_events_next (libzfs.so.4 + 0x45a2c)
#2 0x0000561ddc786e7b zed_event_service (zed + 0xce7b)
#3 0x0000561ddc784bd8 main (zed + 0xabd8)
#4 0x00007f17c4085d7a __libc_start_call_main (libc.so.6 + 0x29d7a)
#5 0x00007f17c4085e35 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x29e35)
#6 0x0000561ddc784561 _start (zed + 0xa561)

Stack trace of thread 31363:
#0 0x00007f17c415dfdb ioctl (libc.so.6 + 0x101fdb)
#1 0x00007f17c4b133dd zpool_refresh_stats (libzfs.so.4 + 0x2c3dd)
#2 0x00007f17c4b26b65 zpool_open_silent (libzfs.so.4 + 0x3fb65)
#3 0x00007f17c4b136d0 zpool_iter (libzfs.so.4 + 0x2c6d0)
#4 0x0000561ddc78d1a1 zfs_slm_event (zed + 0x131a1)
#5 0x0000561ddc78b09b zfs_agent_consumer_thread (zed + 0x1109b)
#6 0x00007f17c40e81b2 start_thread (libc.so.6 + 0x8c1b2)
#7 0x00007f17c4162288 __clone3 (libc.so.6 + 0x106288)
ELF object binary architecture: AMD x86-64
core.zed.0.e9cc196a28654a98a7139ee0d030939f.30394.1730291497000000.zip

@Uglymotha Uglymotha added the Type: Defect Incorrect behavior (e.g. crash, hang) label Oct 30, 2024
Uglymotha pushed a commit to Uglymotha/zfs that referenced this issue Nov 3, 2024
Fixes: openzfs#16705

Not all udev devices have parent devices.
Calling udev_device_get_ functions yield an assertion error
if called with a NULL pointer.

Changes to be committed:
	modified:   cmd/zed/zed_disk_event.c

Signed-off-by: Sietse <[email protected]>
@Uglymotha
Contributor Author

mkdir /tmp/a
cd /tmp/a
xz -dc /boot/ugly-linux-main/initrd-6.11-ugly-linux-main |cpio -di

find . |cpio -H newc -o |xz -T0 --check=crc32 >/boot/ugly-linux-main/initrd-6.11-ugly-linux-main
systemctl reboot

texinfo libltdl-dev tk pp (libperl.so -> aarch64-linux-gnu/libperl.so.5...) gawk lzip build-essential bison flex

Found the culprit in dev_event_nvlist(struct udev_device *dev):

	/*
	 * If the device has a parent, then get the parent block
	 * device's size as well. For example, /dev/sda1's parent
	 * is /dev/sda.
	 */
	struct udev_device *parent_dev = udev_device_get_parent(dev);
	if ((value = udev_device_get_sysattr_value(parent_dev, "size"))
	    != NULL) {
		uint64_t numval = DEV_BSIZE;

		numval *= strtoull(value, NULL, 10);
		(void) nvlist_add_uint64(nvl, DEV_PARENT_SIZE, numval);
	}

In certain cases, such as dm-crypt plain (CRYPT-PLAIN) devices, there is no parent, so parent_dev is NULL. Adding a NULL check to the condition:

	if (parent_dev != NULL &&
	    (value = udev_device_get_sysattr_value(parent_dev, "size"))
	    != NULL) {

fixes the issue. I will submit a PR for this.
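
For readers without a libudev build at hand, the guard can be sketched with stand-in types. Everything named mock_* below, along with parent_size_attr, is hypothetical and exists only for illustration; it is not the libudev API and not the code in cmd/zed/zed_disk_event.c. The mock getter asserts on a NULL device, mimicking the behavior reported in this issue.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct udev_device; not the libudev type. */
struct mock_device {
	const char *size_attr;		/* sysfs "size" value, or NULL */
	struct mock_device *parent;	/* NULL when there is no parent */
};

/* Stand-in for udev_device_get_parent(): may return NULL. */
static struct mock_device *
mock_get_parent(struct mock_device *dev)
{
	return (dev->parent);
}

/*
 * Stand-in for udev_device_get_sysattr_value(). Like the libudev build
 * in this report, it aborts through assert() when handed a NULL device.
 */
static const char *
mock_get_sysattr_value(struct mock_device *dev, const char *attr)
{
	assert(dev != NULL);
	return (strcmp(attr, "size") == 0 ? dev->size_attr : NULL);
}

/*
 * Guarded lookup: returns the parent's "size" attribute, or NULL when
 * the device has no parent or the parent has no such attribute. The
 * NULL check short-circuits before the asserting getter is reached.
 */
static const char *
parent_size_attr(struct mock_device *dev)
{
	struct mock_device *parent_dev = mock_get_parent(dev);
	const char *value;

	if (parent_dev != NULL &&
	    (value = mock_get_sysattr_value(parent_dev, "size")) != NULL)
		return (value);
	return (NULL);
}
```

With a partition whose parent is a whole disk, parent_size_attr returns the disk's size string. With a parentless device it returns NULL instead of tripping the assertion.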

However, my troubleshooting raised a new question. dm-crypt plain (CRYPT-PLAIN) devices seem to behave much like multipath devices: first an add event is received for the device, followed by a change event with the correct information (see the log below). Should this EC_DEV_STATUS event be handled as an EC_DEV_ADD, just like for multipath devices?
Nov 3 18:04:02 santest zed[2553]: zed_udev_monitor: 0x7fd050002340, add, /dev/dm-4, disk
Nov 3 18:04:02 santest zed[2553]: zed_udev_monitor: /dev/dm-4 no devid source

Nov 3 18:04:02 santest zed[2553]: zed_udev_monitor: 0x7fd0500056d0, change, /dev/dm-4, disk
Nov 3 18:04:02 santest zed[2553]: #011class: EC_dev_status
Nov 3 18:04:02 santest zed[2553]: #011subclass: dev_dle
Nov 3 18:04:02 santest zed[2553]: #011dev_name: /dev/dm-4
Nov 3 18:04:02 santest zed[2553]: #011path: /devices/virtual/block/dm-4
Nov 3 18:04:02 santest zed[2553]: #011devid: dm-uuid-CRYPT-PLAIN-storage1
Nov 3 18:04:02 santest zed[2553]: #011phys_path: /dev/disk/by-uuid/3533779146875541629
Nov 3 18:04:02 santest zed[2553]: #011dev_size: 17179869184
Nov 3 18:04:02 santest zed[2553]: #011pool_guid: 3533779146875541629
Nov 3 18:04:02 santest zed[2553]: #011vdev_guid: 11766088279060322789
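
As a side note on the size arithmetic in the snippet quoted earlier (DEV_BSIZE multiplied by the sysfs "size" attribute), here is a minimal sketch of that conversion. It assumes the attribute is expressed in 512-byte units, which is how sysfs reports block device sizes. SECTOR_BYTES and sysfs_size_to_bytes are hypothetical names, not identifiers from zed. The sector count used in the usage note is back-computed from the dev_size in the log and does not appear in the report itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 512-byte units, matching the sysfs "size" attribute. */
#define	SECTOR_BYTES	512ULL

/* Convert a sysfs "size" string (a sector count) to a size in bytes. */
static uint64_t
sysfs_size_to_bytes(const char *value)
{
	assert(value != NULL);
	return (SECTOR_BYTES * strtoull(value, NULL, 10));
}
```

Under that assumption, a "size" attribute of 33554432 sectors corresponds to 17179869184 bytes (16 GiB), which matches the dev_size logged for /dev/dm-4.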

Uglymotha pushed a commit to Uglymotha/zfs that referenced this issue Nov 4, 2024
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Nov 5, 2024
Not all udev devices have parent devices.
Calling udev_device_get_ functions yield an assertion error
if called with a NULL pointer.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Sietse <[email protected]>
Co-authored-by: Sietse <[email protected]>
Closes openzfs#16705 
Closes openzfs#16717
ixhamza pushed a commit to truenas/zfs that referenced this issue Nov 11, 2024
ptr1337 pushed a commit to CachyOS/zfs that referenced this issue Nov 14, 2024
lundman pushed a commit to openzfsonwindows/openzfs that referenced this issue Jan 26, 2025