Hangs in ztest due to /dev/random entropy exhaustion #7017

Closed
dweeezil opened this issue Jan 8, 2018 · 11 comments
Labels
Component: Test Suite Indicates an issue with the test framework or a test case

Comments

@dweeezil
Contributor

dweeezil commented Jan 8, 2018

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Ubuntu |
| Distribution Version | 16.04 |
| Linux Kernel | 4.13.10 (upstream) |
| Architecture | x86_64 |
| ZFS Version | recent master (390d679) |
| SPL Version | recent master (c9821f1) |

Describe the problem you're observing

Ztest can hang for long periods of time due to exhaustion of entropy in /dev/random caused by frequent creation of encrypted datasets.
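
One way to check whether a hang like this coincides with a drained pool is to read the kernel's entropy estimate from /proc/sys/kernel/random/entropy_avail. The sketch below is only an illustration: the helper name `read_entropy_estimate` is hypothetical and is not part of ZFS or ztest, and the file is Linux-specific.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of ZFS): read the kernel's entropy
 * estimate, in bits, from the given procfs path.
 * Returns the estimate, or -1 if the file cannot be opened or parsed.
 */
static int
read_entropy_estimate(const char *path)
{
	FILE *fp;
	int bits = -1;

	assert(path != NULL);

	fp = fopen(path, "r");
	if (fp == NULL)
		return (-1);

	/* The file holds a single decimal integer. */
	if (fscanf(fp, "%d", &bits) != 1)
		bits = -1;

	(void) fclose(fp);
	return (bits);
}
```

Calling `read_entropy_estimate("/proc/sys/kernel/random/entropy_avail")` while ztest appears stuck should, on a kernel like the 4.13 one reported here, show a low value if entropy exhaustion is the cause.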

Describe how to reproduce the problem

Run ztest a bunch of times.

Include any warning/errors/backtraces from the system logs

(gdb) thread 917
[Switching to thread 917 (Thread 0x7f5720a6c700 (LWP 17104))]
#0  0x00007f57acf3f51d in read () at ../sysdeps/unix/syscall-template.S:84
84	../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0  0x00007f57acf3f51d in read () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007f57ad496846 in read (__nbytes=7, __buf=0x7f5720a68989, __fd=5) at /usr/include/x86_64-linux-gnu/bits/unistd.h:44
#2  random_get_bytes_common (ptr=0x7f5720a68989 "", 
    ptr@entry=0x7f5720a68970 "pi\v\263\"> \311P\333v[\263\245\020\a`\274\354е\020\063П", len=len@entry=32, fd=5) at kernel.c:938
#3  0x00007f57ad49852b in random_get_bytes (
    ptr=ptr@entry=0x7f5720a68970 "pi\v\263\"> \311P\333v[\263\245\020\a`\274\354е\020\063П", len=len@entry=32) at kernel.c:950
#4  0x00007f57ad5ac263 in zio_crypt_key_init (crypt=crypt@entry=5, key=key@entry=0x7f5720a68960) at ../../module/zfs/zio_crypt.c:233
#5  0x00007f57ad4e8bf0 in dsl_crypto_key_create_sync (crypt=5, wkey=<optimized out>, tx=tx@entry=0x7f55e4024330)
    at ../../module/zfs/dsl_crypt.c:2347
#6  0x00007f57ad4e8e3f in dsl_dataset_create_crypt_sync (dsobj=dsobj@entry=96, dd=dd@entry=0x7f55e40255b0, 
    origin=origin@entry=0x212b620, dcp=dcp@entry=0x221a760, tx=tx@entry=0x7f55e4024330) at ../../module/zfs/dsl_crypt.c:1850
#7  0x00007f57ad4d7a15 in dsl_dataset_create_sync_dd (dd=0x7f55e40255b0, origin=0x212b620, origin@entry=0x0, dcp=dcp@entry=0x221a760, 
    flags=flags@entry=0, tx=tx@entry=0x7f55e4024330) at ../../module/zfs/dsl_dataset.c:974
#8  0x00007f57ad4d7cfb in dsl_dataset_create_sync (pdd=<optimized out>, lastname=0x7ffffd2b1be6 "ds_5", origin=origin@entry=0x0, 
    flags=0, cr=0x0, dcp=0x221a760, tx=0x7f55e4024330) at ../../module/zfs/dsl_dataset.c:1026
#9  0x00007f57ad4c0f48 in dmu_objset_create_sync (arg=0x7ffffd2b1b30, tx=0x7f55e4024330) at ../../module/zfs/dmu_objset.c:1100
#10 0x00007f57ad4f5dd2 in dsl_sync_task_sync (dst=0x7ffffd2b1a60, tx=tx@entry=0x7f55e4024330) at ../../module/zfs/dsl_synctask.c:182
#11 0x00007f57ad4eb20b in dsl_pool_sync (dp=dp@entry=0x20f0d00, txg=txg@entry=49) at ../../module/zfs/dsl_pool.c:706
#12 0x00007f57ad50d65a in spa_sync (spa=spa@entry=0x20d85b0, txg=txg@entry=49) at ../../module/zfs/spa.c:6783
#13 0x00007f57ad51ddf0 in txg_sync_thread (arg=0x20f0d00) at ../../module/zfs/txg.c:547
#14 0x00007f57acf366ba in start_thread (arg=0x7f5720a6c700) at pthread_create.c:333
#15 0x00007f57acc6c3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) 
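
In frame #2 the buffer pointer has advanced from 0x7f5720a68970 to 0x7f5720a68989 (25 bytes) and frame #1 is asking for the remaining 7 of the 32 requested bytes, so the thread is blocked partway through filling the key. The following is a minimal sketch of that kind of read loop, not the actual code in kernel.c; the names `fill_from_fd` and `fill_from_path` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Fill buf with len bytes from an already-open random device.
 * read() may return fewer bytes than requested, so loop until the
 * buffer is full. On a blocking device each read() call can stall.
 * Returns 0 on success, -1 on error or unexpected end of file.
 */
static int
fill_from_fd(int fd, unsigned char *buf, size_t len)
{
	size_t done = 0;

	assert(buf != NULL);

	while (done < len) {
		ssize_t n = read(fd, buf + done, len - done);

		if (n > 0)
			done += (size_t)n;
		else if (n == 0)
			return (-1);		/* unexpected EOF */
		else if (errno != EINTR)
			return (-1);		/* real error */
		/* EINTR: retry the read */
	}
	return (0);
}

/* Hypothetical convenience wrapper: open path, fill buf, close. */
static int
fill_from_path(const char *path, unsigned char *buf, size_t len)
{
	int fd = open(path, O_RDONLY);
	int rc;

	if (fd == -1)
		return (-1);
	rc = fill_from_fd(fd, buf, len);
	(void) close(fd);
	return (rc);
}
```

With "/dev/urandom" as the path, a 32-byte request like the one in the trace returns without waiting on the entropy estimate; with "/dev/random" on this kernel the same loop can sit in read() as shown above.
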
@behlendorf
Contributor

For this very reason zloop.sh execs rngd -f -r /dev/urandom in the background while ztest is running to keep the entropy pool full. This is admittedly a bit of a hack but it does effectively prevent the issue. Better solutions welcome.

@behlendorf behlendorf added the Component: Test Suite label Jan 10, 2018
@dweeezil
Contributor Author

@behlendorf I hadn't noticed that, but it doesn't help when running ztest manually. Maybe we should just use /dev/urandom in userland.

@behlendorf
Contributor

I'm not sure we can get away with that trick here, let's get @tcaputi's thoughts. What do you think, can we safely use random_get_pseudo_bytes() here instead of random_get_bytes()? How random do these values really need to be?

@tcaputi
Contributor

tcaputi commented Jan 10, 2018

The interesting thing I noticed is that the kernel's RNG won't block under any circumstances. So I have actually wondered for a while whether random_get_bytes() should just alias to random_get_pseudo_bytes() in the first place.

@tcaputi
Contributor

tcaputi commented Jan 11, 2018

From the archwiki page on haveged:

Unless you have a specific reason to not trust any hardware random number generator on your system, you should try to use them with the rng-tools first and if it turns out not to be enough (or if you do not have a hardware random number generator available), then use Haveged.

@behlendorf
Contributor

I've proposed #7036 as a simple fix. I'm reluctant to always have random_get_bytes() use /dev/urandom in user space, but I don't see any issue with making that the default for ztest which is what the proposed change does.
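
The idea of keeping /dev/random as the general default while letting a test program opt into /dev/urandom could be sketched roughly as follows. This is an assumption-laden illustration and not the patch in #7036: every name here (`strong_random_path`, `strong_random_fd`, `use_nonblocking_random`, `random_source_open`, `random_source_close`) is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical names; the default stays the blocking device. */
static const char *strong_random_path = "/dev/random";
static int strong_random_fd = -1;

/* A test program would call this before opening the source. */
static void
use_nonblocking_random(void)
{
	strong_random_path = "/dev/urandom";
}

/* Open the currently selected device. Returns 0 on success, -1 on error. */
static int
random_source_open(void)
{
	assert(strong_random_fd == -1);	/* guard against double open */
	strong_random_fd = open(strong_random_path, O_RDONLY);
	return (strong_random_fd == -1 ? -1 : 0);
}

/* Close the device and reset the descriptor. */
static void
random_source_close(void)
{
	if (strong_random_fd != -1) {
		(void) close(strong_random_fd);
		strong_random_fd = -1;
	}
}
```

Only a program that calls `use_nonblocking_random()` before `random_source_open()` gets the non-blocking device, so other user space consumers keep the stricter default.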

@tcaputi
Contributor

tcaputi commented Jan 11, 2018

@ptx0 I am not sure if they have access to hardware RNGs, but rngd (from rng-tools) seems to give it enough entropy to run well.

@behlendorf Should #7036 also remove the rngd requirement and the calls to it from zloop.sh?

@behlendorf
Contributor

@tcaputi I did remove the calls from zloop.sh, but it looks like I forgot to remove the dependency from the spec file. Let me add that.

@tcaputi
Contributor

tcaputi commented Jan 11, 2018

LGTM

prometheanfire pushed a commit to prometheanfire/zfs that referenced this issue Jan 17, 2018
For ztest, which is solely for testing, using a pseudo random
is entirely reasonable.  Using /dev/urandom ensures the system
entropy pool doesn't get depleted thus stalling the testing.
This is a particular problem when testing in VMs.

Reviewed-by: Tim Chase <[email protected]>
Reviewed by: Thomas Caputi <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#7017 
Closes openzfs#7036
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Jan 18, 2018
Nasf-Fan pushed a commit to Nasf-Fan/zfs that referenced this issue Jan 29, 2018
Nasf-Fan pushed a commit to Nasf-Fan/zfs that referenced this issue Feb 13, 2018