Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

listxattr() may return EFAULT #4150

Closed
nedbass opened this issue Dec 28, 2015 · 9 comments
Closed

listxattr() may return EFAULT #4150

nedbass opened this issue Dec 28, 2015 · 9 comments
Milestone

Comments

@nedbass
Copy link
Contributor

nedbass commented Dec 28, 2015

On a dataset with xattr=sa, storing xattr values with certain lengths causes the xattr data to be stored in a corrupted nvlist on disk. This makes the xattr data inaccessible as system calls such as listxattr() and lgetxattr() return EFAULT. I have traced the return code to the function nvs_xdr_nvpair(). The lengths in the script below were determined through experimentation and reproduce the problem reliably.

#!/bin/sh
randbase64()
{
        dd if=/dev/urandom bs=1 count=$1 2>/dev/null | openssl enc -a -A
}

file=$1
touch $file
setfattr -h -n user.1 -v `randbase64 5000` $file
setfattr -h -n user.2 -v `randbase64 20000` $file
setfattr -h -n user.3 -v `randbase64 20000` $file
setfattr -h -n user.4 -v `randbase64 6000` $file

echo 3 > /proc/sys/vm/drop_caches
strace -e trace=lgetxattr ls -l $file
strace -e trace=listxattr getfattr -m. -d $file

Here is output from an example run of the above script.

$ zfs create -o xattr=sa tank/f
$ ./efault_reproducer.sh /tank/f/z
lgetxattr("/tank/f/z", "security.selinux", 0xd93300, 255) = -1 EFAULT (Bad address)
ls: /tank/f/z: Bad address
-rw-r--r-- 1 root root 0 Dec 28 11:27 /tank/f/z
listxattr("/tank/f/z", (nil), 0)        = -1 EFAULT (Bad address)
getfattr: /tank/f/z: Bad address
@tuxoko
Copy link
Contributor

tuxoko commented Dec 28, 2015

@nedbass
Do you have selinux enabled?
I don't and I can't reproduce this.

@nedbass
Copy link
Contributor Author

nedbass commented Dec 28, 2015

SELinux is disabled on my system as well.

@tuxoko
Copy link
Contributor

tuxoko commented Dec 28, 2015

Actually, I found a different bug with your script.

If I execute this script 2 times in a roll with a same file name.
After the second run, xattr will go into dir.

And I would get this:

lgetxattr("qqqqq", "security.selinux", 0xadeec0, 255) = -1 ENODATA (No data available)
-rw-r--r-- 1 root root 0 Dec 28 15:37 qqqqq
+++ exited with 0 +++
listxattr("qqqqq", NULL, 0)             = 56
listxattr("qqqqq", "user.1\0user.2\0user.3\0user.4\0user"..., 256) = 56
# file: qqqqq
user.1=<base64>
user.1=<base64>
user.2=<base64>
user.2=<base64>
user.3=<base64>
user.3=<base64>
user.4=<base64>
user.4=<base64>

It seems the listxattr logic will see both spill block and dir, and return each entry twice. Or so I guessed.

@nedbass
Copy link
Contributor Author

nedbass commented Dec 29, 2015

It may depend on which kernel is running. I can reproduce the EFAULT error on a RHEL 6.7 VM (ami-5b8a781f on Amazon EC2) with the EPEL packages. But on my Ubuntu 12.04 desktop I get the behavior you see. It could be the same underlying bug. @behlendorf pointed out that zpl_xattr_set_sa() doesn't cleanly handle errors from zfs_sa_set_xattr(), and this could possibly lead to duplicate directory-based xattrs or malformed SA-based xattrs.

@tuxoko
Copy link
Contributor

tuxoko commented Dec 29, 2015

@nedbass
I think my bug is more fundamental than yours.
When set_xattr need to go into dir, whether it's because of the size, or because xattr been set to on. It would never modify the xattrs in sa, causing some weird behaviour.

For example:

$ sudo zfs set xattr=sa pp/fs0
$ touch zzz
$ setfattr -n user.test -v asdf zzz
$ sudo zfs set xattr=on pp/fs0
$ setfattr -x user.test zzz
setfattr: zzz: No such attribute
$ getfattr -d zzz
# file: zzz
user.test="asdf"

nedbass added a commit to nedbass/zfs that referenced this issue Dec 30, 2015
The function sa_update() accepts a 32-bit length parameter and
assigns it to a 16-bit field in sa_bulk_attr_t, potentially
truncating the passed-in value. This could lead to corrupt system
attribute (SA) records getting written to disk. Add a VERIFY to
sa_update() to detect cases where overflow would occur. The SA length
is limited to 16-bit values by the on-disk format defined by
sa_hdr_phys_t.

The function zfs_sa_set_xattr() is vulnerable to this bug if the
unpacked nvlist of xattrs is less than 64k in size but the packed
size is greater than 64k. Fix this by appropriately checking the
size of the packed nvlist before calling sa_update(). Add error
handling to zpl_xattr_set_sa() to keep the cached list of SA-based
xattrs consistent with the data on-disk.

Lastly, zfs_sa_set_xattr() calls dmu_tx_abort() on an assigned
transaction if sa_update() returns an error, but the DMU only allows
unassigned transactions to be aborted. Wrap the sa_update() call in a
VERIFY0, which is consistent practice with other callers.

Fixes openzfs#4150

Signed-off-by: Ned Bass <[email protected]>
nedbass added a commit to nedbass/zfs that referenced this issue Dec 30, 2015
The function sa_update() accepts a 32-bit length parameter and
assigns it to a 16-bit field in sa_bulk_attr_t, potentially
truncating the passed-in value. This could lead to corrupt system
attribute (SA) records getting written to disk. Add a VERIFY to
sa_update() to detect cases where overflow would occur. The SA length
is limited to 16-bit values by the on-disk format defined by
sa_hdr_phys_t.

The function zfs_sa_set_xattr() is vulnerable to this bug if the
unpacked nvlist of xattrs is less than 64k in size but the packed
size is greater than 64k. Fix this by appropriately checking the
size of the packed nvlist before calling sa_update(). Add error
handling to zpl_xattr_set_sa() to keep the cached list of SA-based
xattrs consistent with the data on-disk.

Lastly, zfs_sa_set_xattr() calls dmu_tx_abort() on an assigned
transaction if sa_update() returns an error, but the DMU only allows
unassigned transactions to be aborted. Wrap the sa_update() call in a
VERIFY0, which is consistent practice with other callers.

Fixes openzfs#4150

Signed-off-by: Ned Bass <[email protected]>
nedbass added a commit to nedbass/zfs that referenced this issue Dec 30, 2015
The function sa_update() accepts a 32-bit length parameter and
assigns it to a 16-bit field in sa_bulk_attr_t, potentially
truncating the passed-in value. This could lead to corrupt system
attribute (SA) records getting written to the pool. Add a VERIFY to
sa_update() to detect cases where overflow would occur. The SA length
is limited to 16-bit values by the on-disk format defined by
sa_hdr_phys_t.

The function zfs_sa_set_xattr() is vulnerable to this bug if the
unpacked nvlist of xattrs is less than 64k in size but the packed
size is greater than 64k. Fix this by appropriately checking the
size of the packed nvlist before calling sa_update(). Add error
handling to zpl_xattr_set_sa() to keep the cached list of SA-based
xattrs consistent with the data on disk.

Lastly, zfs_sa_set_xattr() calls dmu_tx_abort() on an assigned
transaction if sa_update() returns an error, but the DMU only allows
unassigned transactions to be aborted. Wrap the sa_update() call in a
VERIFY0, remove the transaction abort, and call dmu_tx_commit()
unconditionally. This is consistent practice with other callers
of sa_update().

Fixes openzfs#4150

Signed-off-by: Ned Bass <[email protected]>
nedbass added a commit to nedbass/zfs that referenced this issue Dec 30, 2015
The function sa_update() accepts a 32-bit length parameter and
assigns it to a 16-bit field in sa_bulk_attr_t, potentially
truncating the passed-in value. This could lead to corrupt system
attribute (SA) records getting written to the pool. Add a VERIFY to
sa_update() to detect cases where overflow would occur. The SA length
is limited to 16-bit values by the on-disk format defined by
sa_hdr_phys_t.

The function zfs_sa_set_xattr() is vulnerable to this bug if the
unpacked nvlist of xattrs is less than 64k in size but the packed
size is greater than 64k. Fix this by appropriately checking the
size of the packed nvlist before calling sa_update(). Add error
handling to zpl_xattr_set_sa() to keep the cached list of SA-based
xattrs consistent with the data on disk.

Lastly, zfs_sa_set_xattr() calls dmu_tx_abort() on an assigned
transaction if sa_update() returns an error, but the DMU only allows
unassigned transactions to be aborted. Wrap the sa_update() call in a
VERIFY0, remove the transaction abort, and call dmu_tx_commit()
unconditionally. This is consistent practice with other callers
of sa_update().

Fixes openzfs#4150

Signed-off-by: Ned Bass <[email protected]>
nedbass added a commit to nedbass/zfs that referenced this issue Dec 30, 2015
The function sa_update() accepts a 32-bit length parameter and
assigns it to a 16-bit field in sa_bulk_attr_t, potentially
truncating the passed-in value. This could lead to corrupt system
attribute (SA) records getting written to the pool. Add a VERIFY to
sa_update() to detect cases where overflow would occur. The SA length
is limited to 16-bit values by the on-disk format defined by
sa_hdr_phys_t.

The function zfs_sa_set_xattr() is vulnerable to this bug if the
unpacked nvlist of xattrs is less than 64k in size but the packed
size is greater than 64k. Fix this by appropriately checking the
size of the packed nvlist before calling sa_update(). Add error
handling to zpl_xattr_set_sa() to keep the cached list of SA-based
xattrs consistent with the data on disk.

Lastly, zfs_sa_set_xattr() calls dmu_tx_abort() on an assigned
transaction if sa_update() returns an error, but the DMU only allows
unassigned transactions to be aborted. Wrap the sa_update() call in a
VERIFY0, remove the transaction abort, and call dmu_tx_commit()
unconditionally. This is consistent practice with other callers
of sa_update().

Fixes openzfs#4150

Signed-off-by: Ned Bass <[email protected]>
nedbass added a commit to nedbass/zfs that referenced this issue Dec 30, 2015
The function sa_update() accepts a 32-bit length parameter and
assigns it to a 16-bit field in sa_bulk_attr_t, potentially
truncating the passed-in value. This could lead to corrupt system
attribute (SA) records getting written to the pool. Add a VERIFY to
sa_update() to detect cases where overflow would occur. The SA length
is limited to 16-bit values by the on-disk format defined by
sa_hdr_phys_t.

The function zfs_sa_set_xattr() is vulnerable to this bug if the
unpacked nvlist of xattrs is less than 64k in size but the packed
size is greater than 64k. Fix this by appropriately checking the
size of the packed nvlist before calling sa_update(). Add error
handling to zpl_xattr_set_sa() to keep the cached list of SA-based
xattrs consistent with the data on disk.

Lastly, zfs_sa_set_xattr() calls dmu_tx_abort() on an assigned
transaction if sa_update() returns an error, but the DMU only allows
unassigned transactions to be aborted. Wrap the sa_update() call in a
VERIFY0, remove the transaction abort, and call dmu_tx_commit()
unconditionally. This is consistent practice with other callers
of sa_update().

Fixes openzfs#4150

Signed-off-by: Ned Bass <[email protected]>
@nedbass
Copy link
Contributor Author

nedbass commented Dec 30, 2015

It seems the cached inode survives drop_caches on the Ubuntu kernels, so the getxattr request is serviced from the unpacked nvlist that hangs off the znode. That's why we don't see the problem there until the cached copy is dropped, for example when the filesystem is remounted.

@behlendorf
Copy link
Contributor

@nedbass that makes sense. The reclaim code is slightly different due to the different kernels and is in both bases simply best effort. The heart of the fix looks good to me but I've posted some review comments.

@nedbass
Copy link
Contributor Author

nedbass commented Dec 30, 2015

@tuxoko the fix in #4152 does avoid the duplicate directory-based xattrs if you run the script twice on the same file. But you're right, there are other fundamental problems when you mix xattr modes. These have been previously reported in #3472.

nedbass added a commit to nedbass/zfs that referenced this issue Dec 30, 2015
The function sa_update() accepts a 32-bit length parameter and
assigns it to a 16-bit field in sa_bulk_attr_t, potentially
truncating the passed-in value. This could lead to corrupt system
attribute (SA) records getting written to the pool. Add a VERIFY to
sa_update() to detect cases where overflow would occur. The SA length
is limited to 16-bit values by the on-disk format defined by
sa_hdr_phys_t.

The function zfs_sa_set_xattr() is vulnerable to this bug if the
unpacked nvlist of xattrs is less than 64k in size but the packed
size is greater than 64k. Fix this by appropriately checking the
size of the packed nvlist before calling sa_update(). Add error
handling to zpl_xattr_set_sa() to keep the cached list of SA-based
xattrs consistent with the data on disk.

Lastly, zfs_sa_set_xattr() calls dmu_tx_abort() on an assigned
transaction if sa_update() returns an error, but the DMU only allows
unassigned transactions to be aborted. Wrap the sa_update() call in a
VERIFY0, remove the transaction abort, and call dmu_tx_commit()
unconditionally. This is consistent practice with other callers
of sa_update().

Fixes openzfs#4150

Signed-off-by: Ned Bass <[email protected]>
@behlendorf behlendorf added this to the 0.6.5.4 milestone Dec 30, 2015
ryao pushed a commit to ryao/zfs that referenced this issue Jan 4, 2016
The function sa_update() accepts a 32-bit length parameter and
assigns it to a 16-bit field in sa_bulk_attr_t, potentially
truncating the passed-in value. This could lead to corrupt system
attribute (SA) records getting written to the pool. Add a VERIFY to
sa_update() to detect cases where overflow would occur. The SA length
is limited to 16-bit values by the on-disk format defined by
sa_hdr_phys_t.

The function zfs_sa_set_xattr() is vulnerable to this bug if the
unpacked nvlist of xattrs is less than 64k in size but the packed
size is greater than 64k. Fix this by appropriately checking the
size of the packed nvlist before calling sa_update(). Add error
handling to zpl_xattr_set_sa() to keep the cached list of SA-based
xattrs consistent with the data on disk.

Lastly, zfs_sa_set_xattr() calls dmu_tx_abort() on an assigned
transaction if sa_update() returns an error, but the DMU only allows
unassigned transactions to be aborted. Wrap the sa_update() call in a
VERIFY0, remove the transaction abort, and call dmu_tx_commit()
unconditionally. This is consistent practice with other callers
of sa_update().

Signed-off-by: Ned Bass <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Closes openzfs#4150
kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this issue Jan 8, 2016
The function sa_update() accepts a 32-bit length parameter and
assigns it to a 16-bit field in sa_bulk_attr_t, potentially
truncating the passed-in value. This could lead to corrupt system
attribute (SA) records getting written to the pool. Add a VERIFY to
sa_update() to detect cases where overflow would occur. The SA length
is limited to 16-bit values by the on-disk format defined by
sa_hdr_phys_t.

The function zfs_sa_set_xattr() is vulnerable to this bug if the
unpacked nvlist of xattrs is less than 64k in size but the packed
size is greater than 64k. Fix this by appropriately checking the
size of the packed nvlist before calling sa_update(). Add error
handling to zpl_xattr_set_sa() to keep the cached list of SA-based
xattrs consistent with the data on disk.

Lastly, zfs_sa_set_xattr() calls dmu_tx_abort() on an assigned
transaction if sa_update() returns an error, but the DMU only allows
unassigned transactions to be aborted. Wrap the sa_update() call in a
VERIFY0, remove the transaction abort, and call dmu_tx_commit()
unconditionally. This is consistent practice with other callers
of sa_update().

Signed-off-by: Ned Bass <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Closes openzfs#4150
tuxoko pushed a commit to tuxoko/zfs that referenced this issue Jan 12, 2016
The function sa_update() accepts a 32-bit length parameter and
assigns it to a 16-bit field in sa_bulk_attr_t, potentially
truncating the passed-in value. This could lead to corrupt system
attribute (SA) records getting written to the pool. Add a VERIFY to
sa_update() to detect cases where overflow would occur. The SA length
is limited to 16-bit values by the on-disk format defined by
sa_hdr_phys_t.

The function zfs_sa_set_xattr() is vulnerable to this bug if the
unpacked nvlist of xattrs is less than 64k in size but the packed
size is greater than 64k. Fix this by appropriately checking the
size of the packed nvlist before calling sa_update(). Add error
handling to zpl_xattr_set_sa() to keep the cached list of SA-based
xattrs consistent with the data on disk.

Lastly, zfs_sa_set_xattr() calls dmu_tx_abort() on an assigned
transaction if sa_update() returns an error, but the DMU only allows
unassigned transactions to be aborted. Wrap the sa_update() call in a
VERIFY0, remove the transaction abort, and call dmu_tx_commit()
unconditionally. This is consistent practice with other callers
of sa_update().

Signed-off-by: Ned Bass <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Closes openzfs#4150
goulvenriou pushed a commit to Alyseo/zfs that referenced this issue Jan 17, 2016
The function sa_update() accepts a 32-bit length parameter and
assigns it to a 16-bit field in sa_bulk_attr_t, potentially
truncating the passed-in value. This could lead to corrupt system
attribute (SA) records getting written to the pool. Add a VERIFY to
sa_update() to detect cases where overflow would occur. The SA length
is limited to 16-bit values by the on-disk format defined by
sa_hdr_phys_t.

The function zfs_sa_set_xattr() is vulnerable to this bug if the
unpacked nvlist of xattrs is less than 64k in size but the packed
size is greater than 64k. Fix this by appropriately checking the
size of the packed nvlist before calling sa_update(). Add error
handling to zpl_xattr_set_sa() to keep the cached list of SA-based
xattrs consistent with the data on disk.

Lastly, zfs_sa_set_xattr() calls dmu_tx_abort() on an assigned
transaction if sa_update() returns an error, but the DMU only allows
unassigned transactions to be aborted. Wrap the sa_update() call in a
VERIFY0, remove the transaction abort, and call dmu_tx_commit()
unconditionally. This is consistent practice with other callers
of sa_update().

Signed-off-by: Ned Bass <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Closes openzfs#4150
behlendorf pushed a commit that referenced this issue Jan 29, 2016
The function sa_update() accepts a 32-bit length parameter and
assigns it to a 16-bit field in sa_bulk_attr_t, potentially
truncating the passed-in value. This could lead to corrupt system
attribute (SA) records getting written to the pool. Add a VERIFY to
sa_update() to detect cases where overflow would occur. The SA length
is limited to 16-bit values by the on-disk format defined by
sa_hdr_phys_t.

The function zfs_sa_set_xattr() is vulnerable to this bug if the
unpacked nvlist of xattrs is less than 64k in size but the packed
size is greater than 64k. Fix this by appropriately checking the
size of the packed nvlist before calling sa_update(). Add error
handling to zpl_xattr_set_sa() to keep the cached list of SA-based
xattrs consistent with the data on disk.

Lastly, zfs_sa_set_xattr() calls dmu_tx_abort() on an assigned
transaction if sa_update() returns an error, but the DMU only allows
unassigned transactions to be aborted. Wrap the sa_update() call in a
VERIFY0, remove the transaction abort, and call dmu_tx_commit()
unconditionally. This is consistent practice with other callers
of sa_update().

Signed-off-by: Ned Bass <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Closes #4150
goulvenriou pushed a commit to Alyseo/zfs that referenced this issue Feb 4, 2016
The function sa_update() accepts a 32-bit length parameter and
assigns it to a 16-bit field in sa_bulk_attr_t, potentially
truncating the passed-in value. This could lead to corrupt system
attribute (SA) records getting written to the pool. Add a VERIFY to
sa_update() to detect cases where overflow would occur. The SA length
is limited to 16-bit values by the on-disk format defined by
sa_hdr_phys_t.

The function zfs_sa_set_xattr() is vulnerable to this bug if the
unpacked nvlist of xattrs is less than 64k in size but the packed
size is greater than 64k. Fix this by appropriately checking the
size of the packed nvlist before calling sa_update(). Add error
handling to zpl_xattr_set_sa() to keep the cached list of SA-based
xattrs consistent with the data on disk.

Lastly, zfs_sa_set_xattr() calls dmu_tx_abort() on an assigned
transaction if sa_update() returns an error, but the DMU only allows
unassigned transactions to be aborted. Wrap the sa_update() call in a
VERIFY0, remove the transaction abort, and call dmu_tx_commit()
unconditionally. This is consistent practice with other callers
of sa_update().

Signed-off-by: Ned Bass <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Closes openzfs#4150
@behlendorf behlendorf modified the milestones: 0.6.5.5, 0.6.5.4 Mar 23, 2016
@fractalram
Copy link

I'm seeing a similar issue here with running gluster atop 0.6.5.6. The issue first manifested itself when we saw a full filesystem on a df even though there were hardly any files there :

[root@gridcell-pri dist_vol]# df -h .
Filesystem Size Used Avail Use% Mounted on
frzpool/normal/dist_vol
16E 16E 0 100% /frzpool/normal/dist_vol

Then an ls -l on that dataset returned this :

[root@gridcell-pri dist_vol]# ls -l /frzpool/normal/dist_vol/
ls: /frzpool/normal/dist_vol/VMware-viclient-all-6.0.0-3016447.exe: Bad address
total 36028796984461430
drwxrwxrwx 2 root integralstor 23 Jun 6 15:18 dist_vol_dir
-rw-rw---- 2 nobody integralstor 358595696 Nov 2 2015 VMware-viclient-all-6.0.0-3016447.exe

[root@gridcell-pri dist_vol]# ls -di VMware-viclient-all-6.0.0-3016447.exe
17 VMware-viclient-all-6.0.0-3016447.exe

[root@gridcell-pri dist_vol]# zdb -dddd frzpool/normal/dist_vol 17
Dataset frzpool/normal/dist_vol [ZPL], ID 65, cr_txg 124, 16.0E, 87 objects, rootbp DVA[0]=<0:a89a6a000:2000> DVA[1]=<0:b01343c000:2000> [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=3850L/3850P fill=87 cksum=12141d9d1d:6480463b208:1290de10d0d78:269569da2569f2

Object  lvl   iblk   dblk  dsize  lsize   %full  type
    17    3    16K   128K  16.0E   342M  100.00  ZFS plain file
                                    244   bonus  System attributes
dnode flags: USED_BYTES USERUSED_ACCOUNTED SPILL_BLKPTR
dnode maxblkid: 2735
path    /.glusterfs/9a/67/9a67b620-76be-4d65-be43-d31fac960d8b
uid     99
gid     500
atime   Fri Jun  3 17:46:04 2016
mtime   Mon Nov  2 16:02:00 2015
ctime   Fri Jun  3 17:46:07 2016
crtime  Fri Jun  3 17:46:04 2016
gen 146
mode    100660
size    358595696
parent  19
links   2
pflags  40800000004

So I'm beginning to wonder if its a manifestation of issue https://github.com/zfsonlinux/zfs/issues/2700 or am I completely off track?!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants