-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Linux: Cross-platform user namespace xattrs compat #8
Conversation
c91f01d
to
1ceaf3b
Compare
d398b05
to
c225f95
Compare
1ceaf3b
to
9207dc5
Compare
0a1b7f6
to
6db8d32
Compare
a4d5b0f
to
7af6852
Compare
7af6852
to
8e040c7
Compare
|
ce7e6ef
to
c81ed04
Compare
c81ed04
to
ea1463c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to make a quip about the abi files having your $HOME directory embedded on them, but then the next commit fixed that right up 😁 . This LGTM 👍🏼
if (error == 0) | ||
error = zfs_listextattr_dir(ap, attrprefix); | ||
|
||
boolean_t compat = !!(zfsvfs->z_flags & ZSB_XATTR_COMPAT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw this double negation pattern throughout -- I take it this is by design, but why ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a short hand way of converting the bit field to a boolean value, !!(0b1010 & 0b1000)
-> !!0b1000
-> !0b0
-> 0b1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏼
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some consider those as Linux'isms. Discouraged in FreeBSD code, recommending != 0 instead.
I'm working on a few last fixups and squashing the commits, then if nobody objects I'll be ready to merge this. |
ZFS on Linux originally implemented xattr namespaces in a way that is incompatible with other operating systems. On illumos, xattrs do not have namespaces. Every xattr name is visible. FreeBSD has two universally defined namespaces: EXTATTR_NAMESPACE_USER and EXTATTR_NAMESPACE_SYSTEM. The system namespace is used for protected FreeBSD-specific attributes such as MAC labels and pnfs state. These attributes have the namespace string "freebsd:system:" prefixed to the name in the encoding scheme used by ZFS. The user namespace is used for general purpose user attributes and obeys normal access control mechanisms. These attributes have no namespace string prefixed, so xattrs written on illumos are accessible in the user namespace on FreeBSD, and xattrs written to the user namespace on FreeBSD are accessible by the same name on illumos. Linux has several xattr namespaces. On Linux, ZFS encodes the namespace in the xattr name for every namespace, including the user namespace. As a consequence, an xattr in the user namespace with the name "foo" is stored by ZFS with the name "user.foo" and therefore appears on FreeBSD and illumos to have the name "user.foo" rather than "foo". Conversely, none of the xattrs written on FreeBSD or illumos are accessible on Linux unless the name happens to be prefixed with one of the Linux xattr namespaces, in which case the namespace is stripped from the name. This makes xattrs entirely incompatible between Linux and other platforms. We want to make the encoding of user namespace xattrs compatible across platforms. A critical requirement of this compatibility is for xattrs from existing pools from FreeBSD and illumos to be accessible by the same names in the user namespace on Linux. It is also necessary that existing pools with xattrs written by Linux retain access to those xattrs by the same names on Linux. Making user namespace xattrs from Linux accessible by the correct names on other platforms is important. The handling of other namespaces is not required to be consistent. Add a fallback mechanism for listing and getting xattrs to treat xattrs as being in the user namespace if they do not match a known prefix. When setting user namespace xattrs, do not prefix the namespace to the name. If the xattr is already present with the namespace prefix, remove it so only the non-prefixed version persists. This ensures other platforms will be able to read the xattr with the correct name. Do not allow setting or getting xattrs with a name that is prefixed with one of the namespace names used by ZFS on supported platforms. Make xattr namespace compatibility dependent on a new feature. New pools will use the compatible namespace encoding by default, and existing pools will continue using the old Linux-specific encoding until the feature is enabled. Allow choosing between cross-platform compatability and legacy Linux compatibility with a per-dataset property. This facilitates replication between hosts with different compatibility needs. TODO: * New tests should be added. * Performance optimizations should be investigated. Signed-off-by: Ryan Moeller <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Under certain loads, the following panic is hit: panic: page fault KDB: stack backtrace: #0 0xffffffff805db025 at kdb_backtrace+0x65 #1 0xffffffff8058e86f at vpanic+0x17f #2 0xffffffff8058e6e3 at panic+0x43 #3 0xffffffff808adc15 at trap_fatal+0x385 #4 0xffffffff808adc6f at trap_pfault+0x4f #5 0xffffffff80886da8 at calltrap+0x8 #6 0xffffffff80669186 at vgonel+0x186 #7 0xffffffff80669841 at vgone+0x31 #8 0xffffffff8065806d at vfs_hash_insert+0x26d #9 0xffffffff81a39069 at sfs_vgetx+0x149 #10 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4 #11 0xffffffff8065a28c at lookup+0x45c #12 0xffffffff806594b9 at namei+0x259 #13 0xffffffff80676a33 at kern_statat+0xf3 #14 0xffffffff8067712f at sys_fstatat+0x2f #15 0xffffffff808ae50c at amd64_syscall+0x10c #16 0xffffffff808876bb at fast_syscall_common+0xf8 The page fault occurs because vgonel() will call VOP_CLOSE() for active vnodes. For this reason, define vop_close for zfsctl_ops_snapshot. While here, define vop_open for consistency. After adding the necessary vop, the bug progresses to the following panic: panic: VERIFY3(vrecycle(vp) == 1) failed (0 == 1) cpuid = 17 KDB: stack backtrace: #0 0xffffffff805e29c5 at kdb_backtrace+0x65 #1 0xffffffff8059620f at vpanic+0x17f #2 0xffffffff81a27f4a at spl_panic+0x3a #3 0xffffffff81a3a4d0 at zfsctl_snapshot_inactive+0x40 #4 0xffffffff8066fdee at vinactivef+0xde #5 0xffffffff80670b8a at vgonel+0x1ea #6 0xffffffff806711e1 at vgone+0x31 #7 0xffffffff8065fa0d at vfs_hash_insert+0x26d #8 0xffffffff81a39069 at sfs_vgetx+0x149 #9 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4 #10 0xffffffff80661c2c at lookup+0x45c #11 0xffffffff80660e59 at namei+0x259 #12 0xffffffff8067e3d3 at kern_statat+0xf3 #13 0xffffffff8067eacf at sys_fstatat+0x2f #14 0xffffffff808b5ecc at amd64_syscall+0x10c #15 0xffffffff8088f07b at fast_syscall_common+0xf8 This is caused by a race condition that can occur when allocating a new vnode and adding that vnode to the vfs hash. If the newly created vnode loses the race when being inserted into the vfs hash, it will not be recycled as its usecount is greater than zero, hitting the above assertion. Fix this by dropping the assertion. FreeBSD-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252700 Reviewed-by: Andriy Gapon <[email protected]> Reviewed-by: Mateusz Guzik <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Rob Wing <[email protected]> Co-authored-by: Rob Wing <[email protected]> Submitted-by: Klara, Inc. Sponsored-by: rsync.net Closes openzfs#14501
Under certain loads, the following panic is hit: panic: page fault KDB: stack backtrace: #0 0xffffffff805db025 at kdb_backtrace+0x65 #1 0xffffffff8058e86f at vpanic+0x17f #2 0xffffffff8058e6e3 at panic+0x43 #3 0xffffffff808adc15 at trap_fatal+0x385 #4 0xffffffff808adc6f at trap_pfault+0x4f #5 0xffffffff80886da8 at calltrap+0x8 #6 0xffffffff80669186 at vgonel+0x186 #7 0xffffffff80669841 at vgone+0x31 #8 0xffffffff8065806d at vfs_hash_insert+0x26d #9 0xffffffff81a39069 at sfs_vgetx+0x149 #10 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4 #11 0xffffffff8065a28c at lookup+0x45c #12 0xffffffff806594b9 at namei+0x259 #13 0xffffffff80676a33 at kern_statat+0xf3 #14 0xffffffff8067712f at sys_fstatat+0x2f #15 0xffffffff808ae50c at amd64_syscall+0x10c #16 0xffffffff808876bb at fast_syscall_common+0xf8 The page fault occurs because vgonel() will call VOP_CLOSE() for active vnodes. For this reason, define vop_close for zfsctl_ops_snapshot. While here, define vop_open for consistency. After adding the necessary vop, the bug progresses to the following panic: panic: VERIFY3(vrecycle(vp) == 1) failed (0 == 1) cpuid = 17 KDB: stack backtrace: #0 0xffffffff805e29c5 at kdb_backtrace+0x65 #1 0xffffffff8059620f at vpanic+0x17f #2 0xffffffff81a27f4a at spl_panic+0x3a #3 0xffffffff81a3a4d0 at zfsctl_snapshot_inactive+0x40 #4 0xffffffff8066fdee at vinactivef+0xde #5 0xffffffff80670b8a at vgonel+0x1ea #6 0xffffffff806711e1 at vgone+0x31 #7 0xffffffff8065fa0d at vfs_hash_insert+0x26d #8 0xffffffff81a39069 at sfs_vgetx+0x149 #9 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4 #10 0xffffffff80661c2c at lookup+0x45c #11 0xffffffff80660e59 at namei+0x259 #12 0xffffffff8067e3d3 at kern_statat+0xf3 #13 0xffffffff8067eacf at sys_fstatat+0x2f #14 0xffffffff808b5ecc at amd64_syscall+0x10c #15 0xffffffff8088f07b at fast_syscall_common+0xf8 This is caused by a race condition that can occur when allocating a new vnode and adding that vnode to the vfs hash. If the newly created vnode loses the race when being inserted into the vfs hash, it will not be recycled as its usecount is greater than zero, hitting the above assertion. Fix this by dropping the assertion. FreeBSD-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252700 Reviewed-by: Andriy Gapon <[email protected]> Reviewed-by: Mateusz Guzik <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Rob Wing <[email protected]> Co-authored-by: Rob Wing <[email protected]> Submitted-by: Klara, Inc. Sponsored-by: rsync.net Closes openzfs#14501
ZFS on Linux originally implemented xattr namespaces in a way that is
incompatible with other operating systems. On illumos, xattrs do not
have namespaces. Every xattr name is visible. FreeBSD has two
universally defined namespaces: EXTATTR_NAMESPACE_USER and
EXTATTR_NAMESPACE_SYSTEM. The system namespace is used for protected
FreeBSD-specific attributes such as MAC labels and pnfs state. These
attributes have the namespace string "freebsd:system:" prefixed to the
name in the encoding scheme used by ZFS. The user namespace is used
for general purpose user attributes and obeys normal access control
mechanisms. These attributes have no namespace string prefixed, so
xattrs written on illumos are accessible in the user namespace on
FreeBSD, and xattrs written to the user namespace on FreeBSD are
accessible by the same name on illumos.
Linux has several xattr namespaces. For Linux, ZFS encodes the namespace
in the xattr name for every namespace, including the user namespace. As a
consequence, an xattr in the user namespace with the name "foo" is stored
by ZFS with the name "user.foo" and therefore appears on FreeBSD and
illumos to have the name "user.foo" rather than "foo". Conversely, none of the
xattrs written on FreeBSD or illumos are accessible on Linux unless the name
happens to be prefixed with one of the Linux xattr namespaces, in which
case the namespace is stripped from the name. This makes xattrs
entirely incompatible between Linux and other platforms.
We want to make the encoding of user namespace xattrs compatible across
platforms. A critical requirement of this compatibility is for xattrs
from existing pools from FreeBSD and illumos to be accessible by the
same names in the user namespace on Linux. It is also necessary that
existing pools with xattrs written by Linux retain access to those
xattrs by the same names on Linux. Making user namespace xattrs from
Linux accessible by the correct names on other platforms is important.
The handling of other namespaces is not required to be consistent.
Add a fallback mechanism for listing and getting xattrs to treat xattrs
as being in the user namespace if they do not match a known prefix.
When setting user namespace xattrs, do not prefix the namespace to the
name. If the xattr is already present with the namespace prefix,
remove it so only the non-prefixed version persists. This ensures
other platforms will be able to read the xattr with the correct name.
Explicitly ignore freebsd:system namespace xattrs.
TODO:
If xattrs with the user namespace prefix are already present, they
will not be automatically fixed. Any existing xattrs must be
manually rewritten on Linux for the name to be correct on FreeBSD.
This also means that files may have a mix of incompatible and
compatible names. Other platforms could strip the Linux user
namespace prefix from xattr names so they are presented correctly.
The newly written xattrs will no longer be visible on previous
versions of ZFS on Linux. This behavior needs to be made optional
with a feature flag and possibly a per-dataset property.
There is no attempt to handle xattr names that clash with a namespace
prefix. If you write an xattr named "user.foo" to the user namespace
on FreeBSD, the "user." prefix will be stripped on Linux. This was
partially the case already, except now the stripped name will also
replace the prefixed name when updating the xattr. Likewise, setting
an xattr to the user namespace using a name with the prefix of
another namespace may cause the xattr to be manipulated in the other
namespace. This is potentially a security issue. Such names must be
forbidden.
New tests should be added when the functionality is complete.
Documentation will be needed.