"internal error: out of memory" with parallel zpool import #16405
Labels: Type: Defect (Incorrect behavior, e.g. crash, hang)
Comments
Progress update:
The stack trace at the point where the corruption is detected looks like this:
I've found the root cause. It's at line 798 in commit 8d4ad5a:
zpool_import_props frees the property list, even though other threads might be using it.
Commit d1807f1 introduced a few other const-removing casts, and those are all potential sources of similar bugs. They should be audited. I'm currently testing a patch, and it's working so far.
asomers added a commit to asomers/zfs that referenced this issue on Aug 6, 2024
When importing multiple pools, the nvlist of properties given with "-o" is shared amongst the several threads. So no thread should modify it. Previously, in the course of validating the cachefile property, the zpool_valid_proplist function would temporarily modify the value, and then change it back. Now it will operate on a clone of the value.

Sponsored by: Axcient
Fixes openzfs#16405
Signed-off-by: Alan Somers <[email protected]>
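The commit message describes the fix pattern: validate a clone of the shared value instead of editing it in place and restoring it. The C sketch below illustrates only that pattern on a plain string; it is not the libzfs code. It assumes, for illustration, that cachefile validation truncates the path at its last '/' to check that the parent directory exists. The helper name cachefile_dir_exists is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper, not the libzfs API: return 1 if the directory that
 * would contain the cachefile `path` exists, 0 if it does not, and -1 if
 * memory allocation fails. `path` is never written to, so any number of
 * import threads may pass the same shared string concurrently.
 */
static int
cachefile_dir_exists(const char *path)
{
	assert(path != NULL);

	/* Operate on a private clone rather than on the shared value. */
	char *copy = strdup(path);
	if (copy == NULL)
		return (-1);

	char *slash = strrchr(copy, '/');
	if (slash == NULL) {
		/* No directory component at all: treat as invalid. */
		free(copy);
		return (0);
	}
	*slash = '\0';	/* truncates the clone only, never the caller's string */

	/* "/zpool.cache" leaves an empty string, meaning the root directory. */
	const char *dir = (copy[0] == '\0') ? "/" : copy;
	struct stat sb;
	int ok = (stat(dir, &sb) == 0 && S_ISDIR(sb.st_mode));

	free(copy);
	return (ok);
}
```

Because the parameter is const char * and no cast removes the const, the compiler would reject any attempt to write the temporary '\0' into the shared string, which is the class of bug the const-removing casts mentioned in the root-cause comment can hide.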
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue on Aug 7, 2024
lundman pushed a commit to openzfsonwindows/openzfs that referenced this issue on Sep 4, 2024
…penzfs#16419)

When importing multiple pools, the nvlist of properties given with "-o" is shared amongst the several threads. So no thread should modify it. Previously, in the course of validating the cachefile property, the zpool_valid_proplist function would temporarily modify the value, and then change it back. Now it will operate on a clone of the value.

Sponsored by: Axcient
Fixes openzfs#16405
Signed-off-by: Alan Somers <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Allan Jude <[email protected]>
Reviewed-by: George Wilson <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
System information
Describe the problem you're observing
When I try to import multiple large encrypted zpools, I sometimes see the error "internal error: out of memory" and some pools don't get imported.
Describe how to reproduce the problem
Create 4 zpools, each composed of several 4-disk raidz2 groups, totalling about 200 disks. Use these options:
zpool create -o autoreplace=on -O atime=off -O setuid=off -O checksum=fletcher4 -O secondarycache=metadata -o cachefile=/var/cache/zpool.cache -O encryption=aes-256-gcm -O keyformat=passphrase -O pbkdf2iters=100000
Then import them with a command like this:
yes <PASSWORD> | zpool import -al
Include any warning/errors/backtraces from the system logs
None
Analysis
By patching the source, I've determined that the error message is incorrect: this bug has nothing to do with memory. The actual problem is that nvlist_unpack returns EOPNOTSUPP from zcmd_read_dst_nvlist, in a stack like this:
My theory is that the packed nvlist returned by the kernel is getting corrupted somehow. I plan to continue debugging the problem.
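The analysis above indicates that an nvlist_unpack failure (here EOPNOTSUPP) ends up reported to the user as "internal error: out of memory". A minimal sketch of error reporting that keeps the real cause is shown below. The helper format_unpack_error is hypothetical and is not part of libzfs; it only assumes that the caller passes in the nonzero error code returned by nvlist_unpack().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not the libzfs API: build the message for a failed
 * nvlist_unpack() call from its actual return code, so that an error such
 * as EOPNOTSUPP is no longer reported as "out of memory".
 */
static void
format_unpack_error(int err, char *buf, size_t buflen)
{
	assert(buf != NULL && buflen > 0);

	if (err == ENOMEM) {
		/* Only a genuine allocation failure gets the memory message. */
		snprintf(buf, buflen, "internal error: out of memory");
	} else {
		/* Anything else, e.g. a corrupted packed nvlist, says so. */
		snprintf(buf, buflen,
		    "internal error: unable to unpack nvlist: %s",
		    strerror(err));
	}
}
```

The exact strerror() text differs between platforms, but with a message like this the misleading "out of memory" report would have pointed straight at the unpack failure.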