-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Neither security xattrs nor posix ACL xattrs are atomically managed #2718
Comments
Regarding the issue of ZIL replay, this shouldn't be an issue with the current design because the xattr-creating functions should create their own replay entries. In other words, this shouldn't actually be a problem as the code sits right now. To that end, I've updated the title of this issue to remove that part. |
Thanks for filing this. It's been something of a subtle long standing issue I'd love to see resolved. I think your approach off adding a callback here to deal with the layering issues is probably the only sane way to handle this. Whenever you're ready I'll be interested to see the patch. |
An update on this issue is in order. This issue could easily have been titled "Linux Posix layer requires additional support from the 'native' Posix layer". The problem at hand is that some operations in the Linux Posix layer perform multiple calls to the "Native" ZFS Posix layer which should occur in a single DMU transaction but which don't because the transactional encapsulation occurs within the Native Posix layer. My initial proposition of a simple vnop-to-zpl callback scheme doesn't help matters much because the support doesn't exist for a vnop function (i.e. zfs_create) to perform xattr creation within its transaction. I think this issue exposes one of many fundamental differences between Solaris and Linux: In Solaris, xattrs are mainly a userland facility wheres Linux has several facilities (selinux and Posix ACLs) which ascribe kernel-level semantics to them (xattrs). Clearly we (Zfsonlinux) want to support both Selinux and Posix ACLs as well as possible; they're both quite important in the Linux ecosystem. We need ZFS to be able to add xattrs to a newly-created object as atomically as possible with respect to the creation of the object. To that end, my current WIP patch adds a new suite of xattr-creating functions which mostly parallel their corresponding functions in zpl_xattr.c. In addition, some of the existing "vnop helper" functions such as My hope was that I could come up with something rather quickly and that it would expose a race condition or something similar which is causing the various cases of corrupted dnodes which seems to be occurring with increasing regularity when SA xattrs are being used to support Posix ACLs. Unfortunately, fixing this the right was will be neither quick nor easy. EDIT: See my next comment. |
I had an idea which came to me after thinking about what a mess it will be to implement the scheme I outlined in my previous comment: Rather than storing these "special" xattrs in the traditional manner, how about leaning on the SA facility and creating new system attributes for them. I'd propose creating a new SA for each of "system.posix_acl_access", "system.posix_acl_default", "security.selinux" and "security.capability". @behlendorf What are your thoughts on this idea? I'm asking because I don't want to plow ahead and implement something along these lines if it's a show-stopper with no chance of being adopted. I'd suggest this feature be controlled by a whole new filesystem property or, maybe a feature flag. If it were to be implemented, we'd likely want to suggest that Some of the advantages to the implementation are:
The disadvantages seem to be similar to those related to
These new system attributes would be manipulated in the vnop functions with the existing SA interface and would be tied into the xattr mechanism via the xattr "handler" mechanism. An extension to this concept would be to store all "security." xattrs either as a single SA or as a series of SAs with names based on the attribute's suffix. This would allow to support SMACK and other security features but (at least in the case of packing all in a single SA) would require additional code to manage the packed value (likely stored as an nvlist). |
@dweeezil I've had similar thoughts about this. What you're proposing is actually more consistent with how SAs were originally designed to be used. It's just a little less flexible since you're tied in to those specific SA names at build time and then we're committed to them forever. It also requires that we add custom code to bypass the normal xattr call paths for these special xattr keys. That said, I think this is still a reasonable approach to pursue for exactly the reasons you mentioned. Most importantly it keeps the interface as clean and allows us to ensure everything ends up in the same transaction. As for the specific downsides:
This is a valid concern but I don't think it should prevent us from doing this. The worst case here is that we have a few SAs which are unknown on the other ports. This is no worse that the current
Very little. I believe that's only around 100 bytes left in the dnode. Perhaps not even that much. If at all possible we should ensure the common case for SELinux contexts do not force us to kick in a spill block. That would be very bad for performance. That said, we should be adding support for dnodes larger than 512 in the next 6-12 months. Perhaps sooner. When this happens it would be reasonable to update the default dnode size to 1k which would be large enough for everything we need and a few user xattrs.
This could be done cleanly if we add a feature flag for this. We should be able to convert either directory or SA xattrs with these keys to the new format as they are encountered. This sort of thing has been done before in the code. For example, old style znodes (v4) are converted to the newer SA format (v5).
All the more reasons to force the conversion above. Frankly this would also provide an opportunity to simplify the
This is unfortunate, but I think it's just a cost we're going to need to pay.
How would this be so different that what we're doing today? We have an exist SA which stores all of the key/value pairs in an nvlist. We just happen to access them through the xattr interfaces. There's nothing technically preventing us from bypassing those interfaces, directly manipulating the nvlist, and packing it in to the TX. The only wrinkle is getting the locking for this correct. I could see how adding a new "security" and probably "system" nvlist namespace would simplify that. And this is all a long way of saying I think this is a pretty reasonable plan to explore. There might be gotchas discovered along the way but in principle this is a reasonable approach. |
@behlendorf It sounds like we're on the same page with this. I did actually go ahead and start implementing it this afternoon and am close to having a working patch for review to deal with the xattr-as-SA part of it. Once that's ironed out, however, fixing the problem for which this issue was created should be fairly trivial. |
For this issue and another similar issue I recall seeing, the ideal
|
I'm very close to submitting a pull request for the "native SA xattr" feature which is the foundation upon which a fix for this issue will be based. Here's a preview of a new file creation on a ZFS file system with selinux enabled (ubuntu trusty with the "ubuntu" selinux policy modified to support zfs):
I've modified And the SA attribution registrations and layouts"
I've run the Posix ACL test suite on this file system and done a bunch of other tests so it has built up a bunch of interesting SA layouts. The main issue I'm wrestling with is how to deal with existing SPL_DXATTR-style xattrs. I'd really like to wipe them but it will add a lot of code. Ideally, I'd like to purge the entire SPL_DXATTR SA if it ever becomes empty (which is something the current code doesn't do). I'm not thrilled with the potential overhead of this type of cleanup. The new code does cause the new native SA xattrs to have precedence over the DXATTR versions similarly to the way the existing xattr=sa code gives them precedence over the directory-style xattrs. |
This patch adds a new class of system atributes which are referred to as "Native SA xattrs". The facilities described here are only enabled when "xattr=sa" is set. If xattr=sa is set, the following "security." and "system." xattrs are stored as native SA xattrs rather than as elements of the ZPL_DXATTR SA: xattr System atrribute -------------------------------------------------------- security.selinux ZPL_SECURITY_SELINUX security.capability ZPL_SECURITY_CAPABILITY system.posix_acl_access ZPL_SYSTEM_POSIX_ACL_ACCESS system.posix_acl_default ZPL_SYSTEM_POSIX_ACL_DEFAULT Storing these xattrs as native system attributes allows for the ZPL to more easily and naturally operate on them as an atomic part of other operations and will be used as the foundation for fixing issue openzfs#2718. Zdb will display these under the new "Native SA xattrs" section. Lookups of these xattrs will use the following priority: 1. Native SA xattr (as shown in the able above) 2. Linux ZPL_DXATTR nvlist 3. Traditional ZFS directory-style xattr Modifications of these xattrs will erase an existing ZPL_DXATTR instance of an identically-named xattrs but will not change an existing instance of an identically-named directory-style xattr. If a modification of the ZPL_DXATTR SA causes it to become empty, the ZPL_DXATTR SA, itself is deleted to maximize available space in the bonus buffer. The effect is that existing ZPL_DXATTR SAs are automatically upgraded as they are changed.
This patch adds a new class of system atributes which are referred to as "Native SA xattrs". The facilities described here are only enabled when "xattr=sa" is set. If xattr=sa is set, the following "security." and "system." xattrs are stored as native SA xattrs rather than as elements of the ZPL_DXATTR SA: xattr System atrribute -------------------------------------------------------- security.selinux ZPL_SECURITY_SELINUX security.capability ZPL_SECURITY_CAPABILITY system.posix_acl_access ZPL_SYSTEM_POSIX_ACL_ACCESS system.posix_acl_default ZPL_SYSTEM_POSIX_ACL_DEFAULT Storing these xattrs as native system attributes allows for the ZPL to more easily and naturally operate on them as an atomic part of other operations and will be used as the foundation for fixing issue openzfs#2718. Zdb will display these under the new "Native SA xattrs" section. Lookups of these xattrs will use the following priority: 1. Native SA xattr (as shown in the able above) 2. Linux ZPL_DXATTR nvlist 3. Traditional ZFS directory-style xattr Modifications of these xattrs will erase an existing ZPL_DXATTR instance of an identically-named xattrs but will not change an existing instance of an identically-named directory-style xattr. If a modification of the ZPL_DXATTR SA causes it to become empty, the ZPL_DXATTR SA, itself is deleted to maximize available space in the bonus buffer. The effect is that existing ZPL_DXATTR SAs are automatically upgraded as they are changed.
This patch adds a new class of system atributes which are referred to as "Native SA xattrs". The facilities described here are only enabled when "xattr=sa" is set. If xattr=sa is set, the following "security." and "system." xattrs are stored as native SA xattrs rather than as elements of the ZPL_DXATTR SA: xattr System atrribute -------------------------------------------------------- security.selinux ZPL_SECURITY_SELINUX security.capability ZPL_SECURITY_CAPABILITY system.posix_acl_access ZPL_SYSTEM_POSIX_ACL_ACCESS system.posix_acl_default ZPL_SYSTEM_POSIX_ACL_DEFAULT Storing these xattrs as native system attributes allows for the ZPL to more easily and naturally operate on them as an atomic part of other operations and will be used as the foundation for fixing issue openzfs#2718. Zdb will display these under the new "Native SA xattrs" section. Lookups of these xattrs will use the following priority: 1. Native SA xattr (as shown in the able above) 2. Linux ZPL_DXATTR nvlist 3. Traditional ZFS directory-style xattr Modifications of these xattrs will erase an existing ZPL_DXATTR instance of an identically-named xattrs but will not change an existing instance of an identically-named directory-style xattr. If a modification of the ZPL_DXATTR SA causes it to become empty, the ZPL_DXATTR SA, itself is deleted to maximize available space in the bonus buffer. The effect is that existing ZPL_DXATTR SAs are automatically upgraded as they are changed.
This patch adds a new class of system atributes which are referred to as "Native SA xattrs". The facilities described here are only enabled when "xattr=sa" is set. If xattr=sa is set, the following "security." and "system." xattrs are stored as native SA xattrs rather than as elements of the ZPL_DXATTR SA: xattr System atrribute -------------------------------------------------------- security.selinux ZPL_SECURITY_SELINUX security.capability ZPL_SECURITY_CAPABILITY system.posix_acl_access ZPL_SYSTEM_POSIX_ACL_ACCESS system.posix_acl_default ZPL_SYSTEM_POSIX_ACL_DEFAULT Storing these xattrs as native system attributes allows for the ZPL to more easily and naturally operate on them as an atomic part of other operations and will be used as the foundation for fixing issue openzfs#2718. Zdb will display these under the new "Native SA xattrs" section. Lookups of these xattrs will use the following priority: 1. Native SA xattr (as shown in the able above) 2. Linux ZPL_DXATTR nvlist 3. Traditional ZFS directory-style xattr Modifications of these xattrs will erase an existing ZPL_DXATTR instance of an identically-named xattrs but will not change an existing instance of an identically-named directory-style xattr. If a modification of the ZPL_DXATTR SA causes it to become empty, the ZPL_DXATTR SA, itself is deleted to maximize available space in the bonus buffer. The effect is that existing ZPL_DXATTR SAs are automatically upgraded as they are changed.
Refreshed as dweeezil/zfs@533cf28. The I have not yet added a/any feature flag(s) to accommodate this facility or the existing |
As part of my work on this this in #2809, I realized the title of this issue could really use the word "managed" rather than "created". The same problem also exists in the |
[dweeezil] There are 2 commits here. The first, dweeezil/zfs@c53847e, fixes deletion of system attributes. The second is a intended to be the foundation for fixing openzfs#2718 but it should otherwise be able to stand on its own. I've given this patch stack fairly light manual testing as well as running the "pjd" posixacl test suite and some short ztest runs. I'm submitting this now as a functional WIP to get feedback from other developers regarding the overall concept and its implementation. There are a handful of issues I'll point out about the code: The list of xattrs to store as native SAs is stored statically in zpl_xattr.c which made it difficult to share with zdb. The same list (in a different format) exists within zdb.c as part of the new dump_znode_native_sa_xattr() function. The management of existing ZPL_DXATTR SAs required a helper function to translate from a zpl_attr_t to its name. As far as I could tell, there were no existing "upcalls" from the ZoL ZPL to the native Solaris ZPL. Rather than trying to #include the ZPL definitions in zfs_sa.c, I simply added an appropriate extern declaration. This goes against the current coding style. The ZPL_DXATTR SA cleanup performed by zfs_sa_remove_xattr adds overhead which could be avoided if it was known that no ZPL_DXATTRs exist. To that end, it's only invoked after determining whether a ZPL_DXATTR exists (which should be a low-overhead operation). I'd like to add a feature flag which is activated when a Native SA xattr is set for the first time; the feature flag would not be able to be deactivated. We might also want a feature flag to do the same with the existing ZPL_DXATTR SAs. I'd also like feedback on the first commit in this stack. I discovered the problem during testing. My concern is that all other OpenZFS implementations likely share the identical code. I think the problem isn't seen often because SAs are rarely deleted. My test system created sort of a "perfect storm" by "stacking" all the variably-sized SAs at the ends of the registration.
This patch adds a new class of system atributes which are referred to as "Native SA xattrs". The facilities described here are only enabled when "xattr=sa" is set. If xattr=sa is set, the following "security." and "system." xattrs are stored as native SA xattrs rather than as elements of the ZPL_DXATTR SA: xattr System atrribute -------------------------------------------------------- security.selinux ZPL_SECURITY_SELINUX security.capability ZPL_SECURITY_CAPABILITY system.posix_acl_access ZPL_SYSTEM_POSIX_ACL_ACCESS system.posix_acl_default ZPL_SYSTEM_POSIX_ACL_DEFAULT Storing these xattrs as native system attributes allows for the ZPL to more easily and naturally operate on them as an atomic part of other operations and will be used as the foundation for fixing issue openzfs#2718. Zdb will display these under the new "Native SA xattrs" section. Lookups of these xattrs will use the following priority: 1. Native SA xattr (as shown in the able above) 2. Linux ZPL_DXATTR nvlist 3. Traditional ZFS directory-style xattr Modifications of these xattrs will erase an existing ZPL_DXATTR instance of an identically-named xattrs but will not change an existing instance of an identically-named directory-style xattr. If a modification of the ZPL_DXATTR SA causes it to become empty, the ZPL_DXATTR SA, itself is deleted to maximize available space in the bonus buffer. The effect is that existing ZPL_DXATTR SAs are automatically upgraded as they are changed.
@kpande The closest thing we have to a fix for this problem is 214806c which should avoid the problem, however, I'm not sure it's a complete fix. I think something like 533cf28 is a better long-term solution, but it's a major on-disk format change and the only way something like it would likely ever be adopted would be to use a feature flag to indicate its use and then to have a background task which would "upgrade" existing "xattr=sa" style xattrs to the new format (similar to enabling userobj_accounting). This is an area that will likely become of more interest if the ZoL code base actually becomes canonical (insofar as OpenZFS is concerned). |
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions. |
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions. |
As alluded to in #2445, neither the "security" xattr nor the xattr related to a posix ACL are managed atomically in the ZPL with respect to the object to which they are attached. This can lead to various weird situations such as files or directories being created without their related xattr.
A related, and as far as I'm aware currently unreported problem is that these xattrs are not managed at all during ZIL replay (mkdir, for example). [EDIT: see next comment]
Both these issues are specific to zfsonlinux; other OpenZFS implementations' host operating systems don't have an analogous feature.
I have a WIP solution which is to push these operations down into the ZFS/DMU layer so they can happen as part of the same transaction as the related ZPL operation. The complication, unfortunately, is that the the ZFS layer has no business touching ZPL-related structures nor does it even have access to same. My proposed solution is to add callbacks to some of the "zfs_..." functions to allow for in-transaction post-processing. Of course, this causes us to diverge from the upstream code.
I'm posting this issue now in light of the possibly tangentially-related issues #2717, #2663, #2700 and others which I can't remember at this time. Those issues all are tantalizingly related to the management of a dnode's associated xattrs. There's a good chance those issues aren't related at all to the problem I've outlined above but I wanted to mention them in any case.
The text was updated successfully, but these errors were encountered: