Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallel zpool import hangs #16172

Closed
asomers opened this issue May 7, 2024 · 4 comments
Closed

Parallel zpool import hangs #16172

asomers opened this issue May 7, 2024 · 4 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@asomers
Copy link
Contributor

asomers commented May 7, 2024

System information

Type Version/Name
Distribution Name FreeBSD
Distribution Version 15.0-CURRENT
Kernel Version git cf04a777 (from Apr 23, 2024)
Architecture amd64
OpenZFS Version zfs-2.2.99-456-FreeBSD_g1f940de07

Describe the problem you're observing

When attempting to import six pools in parallel, the zpool import command hangs. It seems to import all pools, but fail to mount datasets for some of them. It has hanged on six attempts so far out of seven. (The good news is that the one time it didn't hang, it ran twice as fast as before the parallel pool import feature.)

Describe how to reproduce the problem

I created six zpools, each consisting of a 4-disk RAIDZ2 array. All disks were 2.5" SAS 1.8TB SEAGATE ST2000NX0263. Then I created 1024 child file systems on each pool. All children are mounted, but they contain no files. To reproduce the hang, I just run zpool import -a.

Include any warning/errors/backtraces from the system logs

There are no errors in any logs.

Here are the stack traces from the hung zpool process:

Thread 1
#0  _umtx_op_err () at /usr/home/somers/src/freebsd.org/src/lib/libsys/amd64/_umtx_op_err.S:38
#1  0x0000096a3a705f22 in _thr_umtx_timedwait_uint (mtx=0x96a3a70a2a8, id=id@entry=0, clockid=<optimized out>, abstime=<optimized out>, shared=15, shared@entry=0)
    at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_umtx.c:223
#2  0x0000096a3a6fc5a9 in _thr_sleep (curthread=curthread@entry=0xc0542e12000, clockid=0, abstime=abstime@entry=0x0) at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_kern.c:196
#3  0x0000096a3a6f7b57 in cond_wait_user (cvp=0xc0542e536c0, mp=0xc0543400108, abstime=0x0, cancel=1) at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_cond.c:318
#4  cond_wait_common (cond=cond@entry=0xc0542e87178, mutex=mutex@entry=0xc0542e87160, abstime=abstime@entry=0x0, cancel=cancel@entry=1)
    at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_cond.c:378
#5  0x0000096a3a6f7dd1 in __thr_cond_wait (cond=0x96a3a70a2a8, cond@entry=0xc0542e87178, mutex=0xf, mutex@entry=0xc0542e87160)
    at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_cond.c:393
#6  0x0000096a371cec7f in tpool_wait (tpool=tpool@entry=0xc0542e87150) at /usr/home/somers/src/freebsd.org/src/sys/contrib/openzfs/lib/libtpool/thread_pool.c:529
#7  0x0000096211ec5f41 in import_pools (pools=<optimized out>, pools@entry=0xc0549e72000, props=0x0, mntopts=mntopts@entry=0x0, flags=flags@entry=0, orig_name=0x0, new_name=0x0, 
    import=0x96a32008bd0) at /usr/home/somers/src/freebsd.org/src/sys/contrib/openzfs/cmd/zpool/zpool_main.c:3585
#8  0x0000096211eba8e5 in zpool_do_import (argc=0, argv=<optimized out>) at /usr/home/somers/src/freebsd.org/src/sys/contrib/openzfs/cmd/zpool/zpool_main.c:4090
#9  0x0000096211eb4534 in main (argc=3, argv=0x96a3200a338) at /usr/home/somers/src/freebsd.org/src/sys/contrib/openzfs/cmd/zpool/zpool_main.c:11621
Thread 2
#0  _umtx_op_err () at /usr/home/somers/src/freebsd.org/src/lib/libsys/amd64/_umtx_op_err.S:38
#1  0x0000096a3a705f22 in _thr_umtx_timedwait_uint (mtx=0xc0543422248, id=id@entry=0, clockid=<optimized out>, abstime=<optimized out>, shared=15, shared@entry=0)
    at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_umtx.c:223
#2  0x0000096a3a6fc5a9 in _thr_sleep (curthread=curthread@entry=0xc0542eacf00, clockid=0, abstime=abstime@entry=0x0) at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_kern.c:196
#3  0x0000096a3a6f7b57 in cond_wait_user (cvp=0xc0542e536a0, mp=0xc0543400108, abstime=0x0, cancel=1) at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_cond.c:318
#4  cond_wait_common (cond=cond@entry=0xc0542e87170, mutex=mutex@entry=0xc0542e87160, abstime=abstime@entry=0x0, cancel=cancel@entry=1)
    at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_cond.c:378
#5  0x0000096a3a6f7dd1 in __thr_cond_wait (cond=0xc0543422248, cond@entry=0xc0542e87170, mutex=0xf, mutex@entry=0xc0542e87160)
    at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_cond.c:393
#6  0x0000096a371cef6b in tpool_worker (arg=0xc0542e1a090) at /usr/home/somers/src/freebsd.org/src/sys/contrib/openzfs/lib/libtpool/thread_pool.c:150
#7  0x0000096a3a6f8cb5 in thread_start (curthread=0xc0542eacf00) at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_create.c:289
#8  0x0000000000000000 in ?? ()
Thread 3
#0  _umtx_op_err () at /usr/home/somers/src/freebsd.org/src/lib/libsys/amd64/_umtx_op_err.S:38
#1  0x0000096a3a705f22 in _thr_umtx_timedwait_uint (mtx=0xc05434221b8, id=id@entry=0, clockid=<optimized out>, abstime=<optimized out>, shared=15, shared@entry=0)
    at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_umtx.c:223
#2  0x0000096a3a6fc5a9 in _thr_sleep (curthread=curthread@entry=0xc0542eaa500, clockid=0, abstime=abstime@entry=0x0) at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_kern.c:196
#3  0x0000096a3a6f7b57 in cond_wait_user (cvp=0xc0542e536a0, mp=0xc0543400108, abstime=0x0, cancel=1) at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_cond.c:318
#4  cond_wait_common (cond=cond@entry=0xc0542e87170, mutex=mutex@entry=0xc0542e87160, abstime=abstime@entry=0x0, cancel=cancel@entry=1)
    at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_cond.c:378
#5  0x0000096a3a6f7dd1 in __thr_cond_wait (cond=0xc05434221b8, cond@entry=0xc0542e87170, mutex=0xf, mutex@entry=0xc0542e87160)
    at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_cond.c:393
#6  0x0000096a371cef6b in tpool_worker (arg=0xc0542e1a0f0) at /usr/home/somers/src/freebsd.org/src/sys/contrib/openzfs/lib/libtpool/thread_pool.c:150
#7  0x0000096a3a6f8cb5 in thread_start (curthread=0xc0542eaa500) at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_create.c:289
#8  0x0000000000000000 in ?? ()
Thread 4
#0  _umtx_op_err () at /usr/home/somers/src/freebsd.org/src/lib/libsys/amd64/_umtx_op_err.S:38
#1  0x0000096a3a705f22 in _thr_umtx_timedwait_uint (mtx=0xc0543422218, id=id@entry=0, clockid=<optimized out>, abstime=<optimized out>, shared=15, shared@entry=0)
    at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_umtx.c:223
#2  0x0000096a3a6fc5a9 in _thr_sleep (curthread=curthread@entry=0xc0542eac100, clockid=0, abstime=abstime@entry=0x0) at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_kern.c:196
#3  0x0000096a3a6f7b57 in cond_wait_user (cvp=0xc0565cbb060, mp=0xc0543400508, abstime=0x0, cancel=1) at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_cond.c:318
#4  cond_wait_common (cond=cond@entry=0xc05442523a8, mutex=mutex@entry=0xc0544252390, abstime=abstime@entry=0x0, cancel=cancel@entry=1)
    at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_cond.c:378
#5  0x0000096a3a6f7dd1 in __thr_cond_wait (cond=0xc0543422218, cond@entry=0xc05442523a8, mutex=0xf, mutex@entry=0xc0544252390)
    at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_cond.c:393
#6  0x0000096a371cec7f in tpool_wait (tpool=tpool@entry=0xc0544252380) at /usr/home/somers/src/freebsd.org/src/sys/contrib/openzfs/lib/libtpool/thread_pool.c:529
#7  0x0000096a354b85d6 in zfs_foreach_mountpoint (hdl=0xc0542e3e000, handles=0xc0565c4c540, num_handles=1025, func=0x96a354b8d20 <zfs_mount_one>, data=<optimized out>, 
    data@entry=0x96a4e0c9df0, parallel=parallel@entry=B_TRUE) at /usr/home/somers/src/freebsd.org/src/sys/contrib/openzfs/lib/libzfs/libzfs_mount.c:1267
#8  0x0000096a354b8b33 in zpool_enable_datasets (zhp=zhp@entry=0xc054425d7c0, mntopts=mntopts@entry=0x0, flags=flags@entry=0)
    at /usr/home/somers/src/freebsd.org/src/sys/contrib/openzfs/lib/libzfs/libzfs_mount.c:1301
#9  0x0000096211ec6e6f in do_import (config=<optimized out>, newname=newname@entry=0x0, mntopts=0x0, props=0x0, flags=0)
    at /usr/home/somers/src/freebsd.org/src/sys/contrib/openzfs/cmd/zpool/zpool_main.c:3445
#10 0x0000096211ec631f in do_import_task (arg=0xc0542e1a120) at /usr/home/somers/src/freebsd.org/src/sys/contrib/openzfs/cmd/zpool/zpool_main.c:3471
#11 0x0000096a371cf0b3 in tpool_worker (arg=0xc0542e1a120) at /usr/home/somers/src/freebsd.org/src/sys/contrib/openzfs/lib/libtpool/thread_pool.c:199
#12 0x0000096a3a6f8cb5 in thread_start (curthread=0xc0542eac100) at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_create.c:289
#13 0x0000000000000000 in ?? ()
Thread 5
#0  _umtx_op_err () at /usr/home/somers/src/freebsd.org/src/lib/libsys/amd64/_umtx_op_err.S:38
#1  0x0000096a3a705f22 in _thr_umtx_timedwait_uint (mtx=0xc05434220c8, id=id@entry=0, clockid=<optimized out>, abstime=<optimized out>, shared=15, shared@entry=0)
    at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_umtx.c:223
#2  0x0000096a3a6fc5a9 in _thr_sleep (curthread=curthread@entry=0xc0542e15f00, clockid=0, abstime=abstime@entry=0x0) at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_kern.c:196
#3  0x0000096a3a6f7b57 in cond_wait_user (cvp=0xc0542e536a0, mp=0xc0543400108, abstime=0x0, cancel=1) at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_cond.c:318
#4  cond_wait_common (cond=cond@entry=0xc0542e87170, mutex=mutex@entry=0xc0542e87160, abstime=abstime@entry=0x0, cancel=cancel@entry=1)
    at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_cond.c:378
#5  0x0000096a3a6f7dd1 in __thr_cond_wait (cond=0xc05434220c8, cond@entry=0xc0542e87170, mutex=0xf, mutex@entry=0xc0542e87160)
    at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_cond.c:393
#6  0x0000096a371cef6b in tpool_worker (arg=0xc0542e1a150) at /usr/home/somers/src/freebsd.org/src/sys/contrib/openzfs/lib/libtpool/thread_pool.c:150
#7  0x0000096a3a6f8cb5 in thread_start (curthread=0xc0542e15f00) at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_create.c:289
#8  0x0000000000000000 in ?? ()
Thread 6
#0  _umtx_op_err () at /usr/home/somers/src/freebsd.org/src/lib/libsys/amd64/_umtx_op_err.S:38
#1  0x0000096a3a705f22 in _thr_umtx_timedwait_uint (mtx=0xc05434220f8, id=id@entry=0, clockid=<optimized out>, abstime=<optimized out>, shared=15, shared@entry=0)
    at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_umtx.c:223
#2  0x0000096a3a6fc5a9 in _thr_sleep (curthread=curthread@entry=0xc0542e16d00, clockid=0, abstime=abstime@entry=0x0) at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_kern.c:196
#3  0x0000096a3a6f7b57 in cond_wait_user (cvp=0xc0542e536a0, mp=0xc0543400108, abstime=0x0, cancel=1) at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_cond.c:318
#4  cond_wait_common (cond=cond@entry=0xc0542e87170, mutex=mutex@entry=0xc0542e87160, abstime=abstime@entry=0x0, cancel=cancel@entry=1)
    at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_cond.c:378
#5  0x0000096a3a6f7dd1 in __thr_cond_wait (cond=0xc05434220f8, cond@entry=0xc0542e87170, mutex=0xf, mutex@entry=0xc0542e87160)
    at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_cond.c:393
#6  0x0000096a371cef6b in tpool_worker (arg=0xc0542e1a270) at /usr/home/somers/src/freebsd.org/src/sys/contrib/openzfs/lib/libtpool/thread_pool.c:150
#7  0x0000096a3a6f8cb5 in thread_start (curthread=0xc0542e16d00) at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_create.c:289
#8  0x0000000000000000 in ?? ()
Thread 7
#0  _umtx_op_err () at /usr/home/somers/src/freebsd.org/src/lib/libsys/amd64/_umtx_op_err.S:38
#1  0x0000096a3a705f22 in _thr_umtx_timedwait_uint (mtx=0xc0543422278, id=id@entry=0, clockid=<optimized out>, abstime=<optimized out>, shared=15, shared@entry=0)
    at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_umtx.c:223
#2  0x0000096a3a6fc5a9 in _thr_sleep (curthread=curthread@entry=0xc0542eadd00, clockid=0, abstime=abstime@entry=0x0) at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_kern.c:196
#3  0x0000096a3a6f7b57 in cond_wait_user (cvp=0xc0542e536a0, mp=0xc0543400108, abstime=0x0, cancel=1) at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_cond.c:318
#4  cond_wait_common (cond=cond@entry=0xc0542e87170, mutex=mutex@entry=0xc0542e87160, abstime=abstime@entry=0x0, cancel=cancel@entry=1)
    at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_cond.c:378
#5  0x0000096a3a6f7dd1 in __thr_cond_wait (cond=0xc0543422278, cond@entry=0xc0542e87170, mutex=0xf, mutex@entry=0xc0542e87160)
    at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_cond.c:393
#6  0x0000096a371cef6b in tpool_worker (arg=0xc0542e1a2a0) at /usr/home/somers/src/freebsd.org/src/sys/contrib/openzfs/lib/libtpool/thread_pool.c:150
#7  0x0000096a3a6f8cb5 in thread_start (curthread=0xc0542eadd00) at /usr/home/somers/src/freebsd.org/src/lib/libthr/thread/thr_create.c:289
#8  0x0000000000000000 in ?? ()
@asomers asomers added the Type: Defect Incorrect behavior (e.g. crash, hang) label May 7, 2024
@asomers
Copy link
Contributor Author

asomers commented May 7, 2024

@grwilson I plan to work on this bug this week. But to get me started, do you have any good guesses about what could be wrong?

@asomers
Copy link
Contributor Author

asomers commented May 7, 2024

Update: I've identified the root cause of the bug. The process is running into a thread limit. In my test setup, I could conceivably need 3079 threads to fully populate both the thread pool created in import_pools and the thread pools created for each zpool in zfs_foreach_mountpoint via zpool_enable_datasets. But by default, FreeBSD only allows 1500 threads per process. If I raise the limit to 3079, then the parallel pool import finishes successfully.

Digging deeper, I see that the tpool_dispatch function in lib/libtpool/thread_pool.c doesn't correctly handle errors from create_worker. If create_worker fails, then the work request will never get scheduled, yet tpool_dispatch will still return 0. That's probably fine if at least one other worker is currently running, because that worker will eventually finish its current work request and pick up another. But if NO worker is running and a new one cannot be started, then nothing will ever pick up the work request. And unless the caller tries to schedule a new work request, nothing will ever create another worker, either.

As a solution, I suggest:

  • if tpool_dispatch cannot start a single worker, it should execute the work request synchronously instead.
  • We should lower mount_tp_nthr. I suggest that it should become an argument to zfs_foreach_mountpoint.

@amotin
Copy link
Member

amotin commented May 7, 2024

Not too long ago I've hit/fixed similar problem when we created enormous amount threads to read spacemaps, that broke zdb on large FreeBSD systems due to the same inability to create zillions of threads. We should be more careful of number of threads we create.

Direct execution may not work if different threads are used to decouple locks or some other resource.

@asomers
Copy link
Contributor Author

asomers commented May 7, 2024

Direct execution may not work if different threads are used to decouple locks or some other resource.

My only other ideas aren't perfect either:

  • Use a delay loop within tpool_dispatch. But that won't work if no other threads are making progress.
  • Return an error from tpool_dispatch (and modify callers to actually check for errors). That way callers would be able to fallback to direct execution, and they ought to know whether that would work.
  • Use a global variable to forbid the construction of nested thread pools. Then rework the import process so that import_pools and zfs_foreach_mountpoint can share a single thread pool.

Do you favor any of those, @amotin ?

asomers added a commit to asomers/zfs that referenced this issue May 8, 2024
During parallel zpool import, /sbin/zpool will create a separate thread
pool for each pool, used to mount that pool's datasets.  If the total
thread count exceed's the system's limit on threads per process, then
tpool_dispatch may fail.  If it does, directly execute the mount
operation instead.

Sponsored by:	Axcient
Signed-off-by:	Alan Somers <[email protected]>
Fixes openzfs#16172
asomers added a commit to asomers/zfs that referenced this issue May 8, 2024
During parallel zpool import, /sbin/zpool will create a separate thread
pool for each pool, used to mount that pool's datasets.  If the total
thread count exceed's the system's limit on threads per process, then
tpool_dispatch may fail.  If it does, directly execute the mount
operation instead.

Sponsored by:	Axcient
Signed-off-by:	Alan Somers <[email protected]>
Fixes openzfs#16172
lundman pushed a commit to openzfsonwindows/openzfs that referenced this issue Sep 4, 2024
During parallel zpool import, /sbin/zpool will create a separate thread
pool for each pool, used to mount that pool's datasets.  If the total
thread count exceed's the system's limit on threads per process, then
tpool_dispatch may fail.  If it does, directly execute the mount
operation instead.

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Wilson <[email protected]>
Signed-off-by: Alan Somers <[email protected]>
Closes openzfs#16178
Fixes openzfs#16172
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants