Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linux: Cross-platform user namespace xattrs compat #8

Merged
2 commits merged into from
Jun 29, 2021

Conversation

ghost
Copy link

@ghost ghost commented Apr 14, 2021

ZFS on Linux originally implemented xattr namespaces in a way that is
incompatible with other operating systems. On illumos, xattrs do not
have namespaces. Every xattr name is visible. FreeBSD has two
universally defined namespaces: EXTATTR_NAMESPACE_USER and
EXTATTR_NAMESPACE_SYSTEM. The system namespace is used for protected
FreeBSD-specific attributes such as MAC labels and pnfs state. These
attributes have the namespace string "freebsd:system:" prefixed to the
name in the encoding scheme used by ZFS. The user namespace is used
for general purpose user attributes and obeys normal access control
mechanisms. These attributes have no namespace string prefixed, so
xattrs written on illumos are accessible in the user namespace on
FreeBSD, and xattrs written to the user namespace on FreeBSD are
accessible by the same name on illumos.

Linux has several xattr namespaces. For Linux, ZFS encodes the namespace
in the xattr name for every namespace, including the user namespace. As a
consequence, an xattr in the user namespace with the name "foo" is stored
by ZFS with the name "user.foo" and therefore appears on FreeBSD and
illumos to have the name "user.foo" rather than "foo". Conversely, none of the
xattrs written on FreeBSD or illumos are accessible on Linux unless the name
happens to be prefixed with one of the Linux xattr namespaces, in which
case the namespace is stripped from the name. This makes xattrs
entirely incompatible between Linux and other platforms.

We want to make the encoding of user namespace xattrs compatible across
platforms. A critical requirement of this compatibility is for xattrs
from existing pools from FreeBSD and illumos to be accessible by the
same names in the user namespace on Linux. It is also necessary that
existing pools with xattrs written by Linux retain access to those
xattrs by the same names on Linux. Making user namespace xattrs from
Linux accessible by the correct names on other platforms is important.
The handling of other namespaces is not required to be consistent.

Add a fallback mechanism for listing and getting xattrs to treat xattrs
as being in the user namespace if they do not match a known prefix.

When setting user namespace xattrs, do not prefix the namespace to the
name. If the xattr is already present with the namespace prefix,
remove it so only the non-prefixed version persists. This ensures
other platforms will be able to read the xattr with the correct name.

Explicitly ignore freebsd:system namespace xattrs.

TODO:

  • If xattrs with the user namespace prefix are already present, they
    will not be automatically fixed. Any existing xattrs must be
    manually rewritten on Linux for the name to be correct on FreeBSD.
    This also means that files may have a mix of incompatible and
    compatible names. Other platforms could strip the Linux user
    namespace prefix from xattr names so they are presented correctly.

  • The newly written xattrs will no longer be visible on previous
    versions of ZFS on Linux. This behavior needs to be made optional
    with a feature flag and possibly a per-dataset property.

  • There is no attempt to handle xattr names that clash with a namespace
    prefix. If you write an xattr named "user.foo" to the user namespace
    on FreeBSD, the "user." prefix will be stripped on Linux. This was
    partially the case already, except now the stripped name will also
    replace the prefixed name when updating the xattr. Likewise, setting
    an xattr to the user namespace using a name with the prefix of
    another namespace may cause the xattr to be manipulated in the other
    namespace. This is potentially a security issue. Such names must be
    forbidden.

  • New tests should be added when the functionality is complete.

  • Documentation will be needed.

@ghost ghost added the WIP Work In Progress label Apr 14, 2021
@ghost ghost requested a review from anodos325 April 14, 2021 14:49
@ghost ghost force-pushed the xattr-compat branch from 5069856 to 79b4e0f Compare April 14, 2021 18:25
@ghost ghost force-pushed the truenas/zfs-2.1-release branch 2 times, most recently from c91f01d to 1ceaf3b Compare April 15, 2021 17:08
@ghost ghost force-pushed the xattr-compat branch 4 times, most recently from d398b05 to c225f95 Compare April 16, 2021 15:44
@ghost ghost force-pushed the truenas/zfs-2.1-release branch from 1ceaf3b to 9207dc5 Compare April 23, 2021 10:06
@ghost ghost force-pushed the xattr-compat branch 2 times, most recently from 0a1b7f6 to 6db8d32 Compare April 23, 2021 21:11
@ghost ghost force-pushed the truenas/zfs-2.1-release branch 2 times, most recently from a4d5b0f to 7af6852 Compare May 11, 2021 20:57
@ghost ghost force-pushed the xattr-compat branch from 6db8d32 to a8d2964 Compare May 12, 2021 20:51
@ghost ghost force-pushed the truenas/zfs-2.1-release branch from 7af6852 to 8e040c7 Compare May 14, 2021 15:21
@ghost ghost force-pushed the xattr-compat branch from a8d2964 to 970ce0e Compare May 14, 2021 15:57
@ghost ghost force-pushed the xattr-compat branch from 970ce0e to 81740a1 Compare May 24, 2021 16:22
@ghost
Copy link
Author

ghost commented May 24, 2021

  • Rebased

@ghost ghost force-pushed the truenas/zfs-2.1-release branch from ce7e6ef to c81ed04 Compare May 27, 2021 18:25
@ghost ghost force-pushed the xattr-compat branch from 81740a1 to c6182ee Compare May 28, 2021 15:46
@ghost ghost force-pushed the truenas/zfs-2.1-release branch from c81ed04 to ea1463c Compare May 28, 2021 18:42
@ghost ghost force-pushed the xattr-compat branch from c6182ee to b9d3a79 Compare May 28, 2021 18:44
@ghost ghost force-pushed the xattr-compat branch from b9d3a79 to 192e58b Compare June 21, 2021 20:09
@ghost ghost removed the WIP Work In Progress label Jun 22, 2021
@ghost ghost marked this pull request as ready for review June 22, 2021 18:04
@ghost ghost requested a review from rick-mesta June 22, 2021 18:04
Copy link

@rick-mesta rick-mesta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to make a quip about the abi files having your $HOME directory embedded on them, but then the next commit fixed that right up 😁 . This LGTM 👍🏼

if (error == 0)
error = zfs_listextattr_dir(ap, attrprefix);

boolean_t compat = !!(zfsvfs->z_flags & ZSB_XATTR_COMPAT);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw this double negation pattern throughout -- I take it this is by design, but why ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a short hand way of converting the bit field to a boolean value, !!(0b1010 & 0b1000) -> !!0b1000 -> !0b0 -> 0b1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏼

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some consider those as Linux'isms. Discouraged in FreeBSD code, recommending != 0 instead.

@ghost
Copy link
Author

ghost commented Jun 25, 2021

I'm working on a few last fixups and squashing the commits, then if nobody objects I'll be ready to merge this.

Ryan Moeller added 2 commits June 25, 2021 19:44
ZFS on Linux originally implemented xattr namespaces in a way that is
incompatible with other operating systems.  On illumos, xattrs do not
have namespaces.  Every xattr name is visible.  FreeBSD has two
universally defined namespaces: EXTATTR_NAMESPACE_USER and
EXTATTR_NAMESPACE_SYSTEM.  The system namespace is used for protected
FreeBSD-specific attributes such as MAC labels and pnfs state.  These
attributes have the namespace string "freebsd:system:" prefixed to the
name in the encoding scheme used by ZFS.  The user namespace is used
for general purpose user attributes and obeys normal access control
mechanisms.  These attributes have no namespace string prefixed, so
xattrs written on illumos are accessible in the user namespace on
FreeBSD, and xattrs written to the user namespace on FreeBSD are
accessible by the same name on illumos.

Linux has several xattr namespaces.  On Linux, ZFS encodes the
namespace in the xattr name for every namespace, including the user
namespace.  As a consequence, an xattr in the user namespace with the
name "foo" is stored by ZFS with the name "user.foo" and therefore
appears on FreeBSD and illumos to have the name "user.foo" rather than
"foo".  Conversely, none of the xattrs written on FreeBSD or illumos
are accessible on Linux unless the name happens to be prefixed with one
of the Linux xattr namespaces, in which case the namespace is stripped
from the name.  This makes xattrs entirely incompatible between Linux
and other platforms.

We want to make the encoding of user namespace xattrs compatible across
platforms.  A critical requirement of this compatibility is for xattrs
from existing pools from FreeBSD and illumos to be accessible by the
same names in the user namespace on Linux.  It is also necessary that
existing pools with xattrs written by Linux retain access to those
xattrs by the same names on Linux.  Making user namespace xattrs from
Linux accessible by the correct names on other platforms is important.
The handling of other namespaces is not required to be consistent.

Add a fallback mechanism for listing and getting xattrs to treat xattrs
as being in the user namespace if they do not match a known prefix.

When setting user namespace xattrs, do not prefix the namespace to the
name.  If the xattr is already present with the namespace prefix,
remove it so only the non-prefixed version persists.  This ensures
other platforms will be able to read the xattr with the correct name.

Do not allow setting or getting xattrs with a name that is prefixed
with one of the namespace names used by ZFS on supported platforms.

Make xattr namespace compatibility dependent on a new feature.  New
pools will use the compatible namespace encoding by default, and
existing pools will continue using the old Linux-specific encoding
until the feature is enabled.

Allow choosing between cross-platform compatability and legacy Linux
compatibility with a per-dataset property.  This facilitates
replication between hosts with different compatibility needs.

TODO:

* New tests should be added.
* Performance optimizations should be investigated.

Signed-off-by: Ryan Moeller <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
@ghost ghost force-pushed the xattr-compat branch from d9330a4 to 5283f7a Compare June 26, 2021 00:32
@ghost ghost merged commit 1cd6705 into truenas/zfs-2.1-release Jun 29, 2021
@ghost ghost deleted the xattr-compat branch June 29, 2021 17:31
ixhamza pushed a commit that referenced this pull request Mar 20, 2023
Under certain loads, the following panic is hit:

    panic: page fault
    KDB: stack backtrace:
    #0 0xffffffff805db025 at kdb_backtrace+0x65
    #1 0xffffffff8058e86f at vpanic+0x17f
    #2 0xffffffff8058e6e3 at panic+0x43
    #3 0xffffffff808adc15 at trap_fatal+0x385
    #4 0xffffffff808adc6f at trap_pfault+0x4f
    #5 0xffffffff80886da8 at calltrap+0x8
    #6 0xffffffff80669186 at vgonel+0x186
    #7 0xffffffff80669841 at vgone+0x31
    #8 0xffffffff8065806d at vfs_hash_insert+0x26d
    #9 0xffffffff81a39069 at sfs_vgetx+0x149
    #10 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
    #11 0xffffffff8065a28c at lookup+0x45c
    #12 0xffffffff806594b9 at namei+0x259
    #13 0xffffffff80676a33 at kern_statat+0xf3
    #14 0xffffffff8067712f at sys_fstatat+0x2f
    #15 0xffffffff808ae50c at amd64_syscall+0x10c
    #16 0xffffffff808876bb at fast_syscall_common+0xf8

The page fault occurs because vgonel() will call VOP_CLOSE() for active
vnodes. For this reason, define vop_close for zfsctl_ops_snapshot. While
here, define vop_open for consistency.

After adding the necessary vop, the bug progresses to the following
panic:

    panic: VERIFY3(vrecycle(vp) == 1) failed (0 == 1)
    cpuid = 17
    KDB: stack backtrace:
    #0 0xffffffff805e29c5 at kdb_backtrace+0x65
    #1 0xffffffff8059620f at vpanic+0x17f
    #2 0xffffffff81a27f4a at spl_panic+0x3a
    #3 0xffffffff81a3a4d0 at zfsctl_snapshot_inactive+0x40
    #4 0xffffffff8066fdee at vinactivef+0xde
    #5 0xffffffff80670b8a at vgonel+0x1ea
    #6 0xffffffff806711e1 at vgone+0x31
    #7 0xffffffff8065fa0d at vfs_hash_insert+0x26d
    #8 0xffffffff81a39069 at sfs_vgetx+0x149
    #9 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
    #10 0xffffffff80661c2c at lookup+0x45c
    #11 0xffffffff80660e59 at namei+0x259
    #12 0xffffffff8067e3d3 at kern_statat+0xf3
    #13 0xffffffff8067eacf at sys_fstatat+0x2f
    #14 0xffffffff808b5ecc at amd64_syscall+0x10c
    #15 0xffffffff8088f07b at fast_syscall_common+0xf8

This is caused by a race condition that can occur when allocating a new
vnode and adding that vnode to the vfs hash. If the newly created vnode
loses the race when being inserted into the vfs hash, it will not be
recycled as its usecount is greater than zero, hitting the above
assertion.

Fix this by dropping the assertion.

FreeBSD-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252700
Reviewed-by: Andriy Gapon <[email protected]>
Reviewed-by: Mateusz Guzik <[email protected]>
Reviewed-by: Alek Pinchuk <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Rob Wing <[email protected]>
Co-authored-by: Rob Wing <[email protected]>
Submitted-by: Klara, Inc.
Sponsored-by: rsync.net
Closes openzfs#14501
ixhamza pushed a commit that referenced this pull request Jun 12, 2023
Under certain loads, the following panic is hit:

    panic: page fault
    KDB: stack backtrace:
    #0 0xffffffff805db025 at kdb_backtrace+0x65
    #1 0xffffffff8058e86f at vpanic+0x17f
    #2 0xffffffff8058e6e3 at panic+0x43
    #3 0xffffffff808adc15 at trap_fatal+0x385
    #4 0xffffffff808adc6f at trap_pfault+0x4f
    #5 0xffffffff80886da8 at calltrap+0x8
    #6 0xffffffff80669186 at vgonel+0x186
    #7 0xffffffff80669841 at vgone+0x31
    #8 0xffffffff8065806d at vfs_hash_insert+0x26d
    #9 0xffffffff81a39069 at sfs_vgetx+0x149
    #10 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
    #11 0xffffffff8065a28c at lookup+0x45c
    #12 0xffffffff806594b9 at namei+0x259
    #13 0xffffffff80676a33 at kern_statat+0xf3
    #14 0xffffffff8067712f at sys_fstatat+0x2f
    #15 0xffffffff808ae50c at amd64_syscall+0x10c
    #16 0xffffffff808876bb at fast_syscall_common+0xf8

The page fault occurs because vgonel() will call VOP_CLOSE() for active
vnodes. For this reason, define vop_close for zfsctl_ops_snapshot. While
here, define vop_open for consistency.

After adding the necessary vop, the bug progresses to the following
panic:

    panic: VERIFY3(vrecycle(vp) == 1) failed (0 == 1)
    cpuid = 17
    KDB: stack backtrace:
    #0 0xffffffff805e29c5 at kdb_backtrace+0x65
    #1 0xffffffff8059620f at vpanic+0x17f
    #2 0xffffffff81a27f4a at spl_panic+0x3a
    #3 0xffffffff81a3a4d0 at zfsctl_snapshot_inactive+0x40
    #4 0xffffffff8066fdee at vinactivef+0xde
    #5 0xffffffff80670b8a at vgonel+0x1ea
    #6 0xffffffff806711e1 at vgone+0x31
    #7 0xffffffff8065fa0d at vfs_hash_insert+0x26d
    #8 0xffffffff81a39069 at sfs_vgetx+0x149
    #9 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
    #10 0xffffffff80661c2c at lookup+0x45c
    #11 0xffffffff80660e59 at namei+0x259
    #12 0xffffffff8067e3d3 at kern_statat+0xf3
    #13 0xffffffff8067eacf at sys_fstatat+0x2f
    #14 0xffffffff808b5ecc at amd64_syscall+0x10c
    #15 0xffffffff8088f07b at fast_syscall_common+0xf8

This is caused by a race condition that can occur when allocating a new
vnode and adding that vnode to the vfs hash. If the newly created vnode
loses the race when being inserted into the vfs hash, it will not be
recycled as its usecount is greater than zero, hitting the above
assertion.

Fix this by dropping the assertion.

FreeBSD-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252700
Reviewed-by: Andriy Gapon <[email protected]>
Reviewed-by: Mateusz Guzik <[email protected]>
Reviewed-by: Alek Pinchuk <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Rob Wing <[email protected]>
Co-authored-by: Rob Wing <[email protected]>
Submitted-by: Klara, Inc.
Sponsored-by: rsync.net
Closes openzfs#14501
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants