PAX: size overflow detected in function zil_itx_create #2505
Comments
@mrobbetts PAX may be reporting a legitimate issue here. However, the code it's complaining about looks reasonable to me and I don't immediately see an overflow. Perhaps @ryao can better interpret the PAX warning. |
This looks like a reasonable explanation of the PAX stuff: https://forums.grsecurity.net/viewtopic.php?f=7&t=3043 I also don't see how it could overflow, everything is a constant (
Perhaps the P2ROUNDUP_TYPED macro is confusing the overflow detector due to sign extensions or something?
|
@behlendorf I would need to reproduce it locally. I will not have time to do that until the weekend at the earliest. |
Looks like a PAX and/or gcc (4.7.2) optimisation bug. A test program exhibits the PAX error when compiled w/ -O, and has no error without -O:
The test program, based on the examples in https://forums.grsecurity.net/viewtopic.php?f=7&t=3043 :
Briefly, to build the
|
FYI, reported to the PAX/grsecurity development forum: User, code, gcc or grsecurity error? ...no response as yet. |
The response from "PaX Team":
I.e. it's a limitation (or maybe a deliberate design choice) of the PAX overflow detector rather than a problem in the ZoL code (unless you consider poor code clarity a bug). |
@chrisrd Thanks for looking into this for me. Would you recommend simply disabling PaX for use with ZoL, then? I've done that in the mean time, and everything appears to be working correctly. |
@mrobbetts I guess it depends on how strongly you feel about using a PaX hardened kernel. Disabling PaX would be easiest. Selectively turning off PaX for the ZoL compile, if possible, could be an option. @ryao likely has a better idea than myself. Overall, I'm not sure what the best long term solution would be. As pointed out by "PaX Team" in the grsecurity link above, there's an argument the P2ROUNDUP_TYPED and related macros should be modified to not do "tricky stuff" (or at least not do tricky stuff that trips up grsecurity). On the other hand, the "tricky stuff" is using standards-defined C behaviour so there's an argument this is a bug in grsecurity's overflow detector and should be fixed there. On the third hand, "Pax Team" has provided an alternative implementation of P2ROUNDUP_TYPED (similar work would probably need to be done for the related macros) which likely won't affect performance to any measurable degree so the issue can be addressed in ZoL. In the end it's going to boil down to who cares the most: personally I don't use a PaX hardened kernel so it's not an issue for me. Someone on the PaX side using ZoL might be encouraged to look into fixing it in PaX, or someone on the ZoL side who's interested in using PaX might fix it here (@ryao, hint, hint :-) ). |
What is the best way to solve this in the meantime? Fallback to 0.6.2? |
A build with 0.6.2 and Linux 3.16.5 and PaX/Grsec fails with
|
Information for anyone stumbling over this. 0.6.2 does not work on 3.16.5 or later. For that you have to use 0.6.3 or later. |
As a sysadmin, I'm of the opinion that we should be able to use PaX with ZFS. I administer machines using ZFS with machines using PaX, and I think it's unfortunate that we'd have to choose between the added reliability of ZFS and the added security of PaX. How can I help? |
@gcs-github If you have C experience, I'd recommend looking at implementing the alternative P2ROUNDUP_TYPED and associated macros as recommended by PaX Team upthread. Or, possibly even better, see if the linux source has any "round up" macros/functions that can be used instead - I'd be pretty surprised if it doesn't! Of some related interest, the issue of detecting integer overflows is being discussed on LKML: |
@chrisrd Sure, I'll look into it! :) That'll be the first time for me getting inside the code of ZFS (or even kernel-related software in general). Do you have any pointers to set up a proper dev environment and the proper test suite for it? |
@gcs-github Sorry, not really. Prior to actually testing inside the kernel (which would be better done in a virtual machine) you should use |
Just as a pointer: there is the |
I get the same issue. zfs is 0.6.3 with patches (Gentoo version by ryao).
uname -a
Linux mdstor 3.17.6-hardened #2 SMP Thu Dec 11 14:15:29 MSK 2014 x86_64 Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz GenuineIntel GNU/Linux
LANG=C gcc -v
Using built-in specs.
[ 59.975090] PAX: size overflow detected in function zil_itx_create /var/tmp/portage/sys-fs/zfs-kmod-0.6.3-r1/work/zfs-zfs-0.6.3/module/zfs/../../module/zfs/zil.c:1187 cicus.150_20 min, count: 2 |
In case someone would like to test, I have an (untested!) potential fix in https://github.com/Shizmob/spl/commit/32f4ceca6f977121b639fb2f89111792177746a3. Any input would be appreciated. |
HI @shizmob , your |
I believe it's safe because, looking at the Linux implementation of |
However your |
I checked all usages of As far as unusual compilers go, I'm not quite sure what you mean seeing as Linux only compiles with Also of note is that the change I made to |
I went through the source and found all the places that use the P2ROUNDUP macro to see what kind of issues it might run into. Original macros:
All usages are completely unsigned, so the -x doesnt entirely make sense
Lots of places use P2ROUNDUP() but a lot of these are using other #define Proposed new macro: Linux has the following which takes care of the typing already:
Is there ever a time when we want to rely on the type of the mask? if not then
I am going to look through the other macros too before I prepare a proper pull request, since a couple of the others look like they have similar issues. Thoughts so far? Also, this isn't a build issue: the gcc plugin is used at build time, but these issues trigger at runtime when built with a PaX kernel |
@perfinion keep in mind my upthread comment about my standing in the ZoL community, but to my mind changing these macros to make them PaX compatible requires a very good understanding of the sign extension and type promotion rules according to the C standards. I don't have that understanding myself, but your comment "All usages are completely unsigned, so the -x doesnt entirely make sense" makes me think you should be looking into this much further. Likewise, regarding "Is there ever a time when we want to rely on the type of the mask?", my preference would be to be referencing C standards about why the type of the mask doesn't matter. |
@chrisrd so for P2ROUNDUP_TYPED, there are no uses of it that are signed, so sign extension does not matter. For the others it might matter; I'll double check. And by "All usages are completely unsigned, so the -x doesnt entirely make sense" I meant that taking the negative of an unsigned number doesn't entirely make logical sense. In C it ends up equivalent to (~x+1), but if that's what is wanted, then it is better to just write that, since it "makes sense" for both signed and unsigned. And about "Is there ever a time when we want to rely on the type of the mask?": I did not mean that in regard to the macros themselves. I meant, are there places in the code that use the macro while relying on the type of the mask? From what I could see, there were several places that used the untyped macro but I think they should instead be using the typed one. An integer constant without any suffix is signed in C, and it is used in many places where the other operand is unsigned, which seems like an error. Casting to the type of x seems more correct. If a call specifically needs the type of the mask, then it is much clearer to use the _TYPED version. The issue is that currently there are many instances, e.g. P2ROUNDUP(size, 8), which is technically P2ROUNDUP(size, (signed int)8), but having a negative size is most likely a bug. |
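The two C rules mentioned in this comment can be checked with a minimal sketch. The function names below are made up for illustration and are not SPL code. The sketch relies only on unsigned arithmetic wrapping modulo 2^N, and on a negative `int` being converted to a wider unsigned type by adding 2^N.

```c
#include <stdint.h>

/* Unary minus on an unsigned value is defined modulo 2^N,
 * so it always equals ~x + 1. */
uint64_t negate_unsigned(uint64_t x)
{
	return -x;
}

uint64_t complement_plus_one(uint64_t x)
{
	return ~x + 1;
}

/* Mirrors a call such as P2ROUNDUP(size, 8) with the original macro form:
 * the alignment is a plain (signed) int. -align is the int -8, which is
 * converted to uint64_t before the '&', giving 0xFFFFFFFFFFFFFFF8. */
uint64_t roundup_int_align(uint64_t size, int align)
{
	return -(-size & -align);
}
```

So a signed literal alignment happens to produce the right mask through the signed-to-unsigned conversion, even though nothing in the call site says so.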
@perfinion It seems my usage of "sign extension" was an error, it should have been something like "negation of unsigned quantities" - thanks for pointing that out. My point was that the negation of an unsigned quantity is a well specified operation in C and I was concerned you hadn't reached that level of understanding. I now understand your comment was more that, in the real world, negating an unsigned quantity doesn't make a lot of sense - which I agree with. (On the other hand, we're not the real world here, we're in "C" world: here be monsters aplenty!) Overall, my concern is that any alternate form we come up with for these macros should be guaranteed, by the C specification, to behave identically to the original macros (apart from not triggering the PaX overflow detection of course!), otherwise we risk having subtly different behaviour to the other ZFS implementations which could be exceedingly difficult to track down. I'm more than willing to accept that the current macros, and/or their usages, may actually be "wrong" in some way (e.g. your example of |
Okay @chrisrd, I finally figured out how to prove they are equivalent. Dump this in test.c; there are no includes or anything.
// Set the type everywhere, try signed/unsigned and int/long/long long
#define thetype unsigned int
thetype old(thetype x, thetype align)
{
// The original SPL macro
return -(-(x) & -(align));
}
thetype negate(thetype x, thetype align)
{
// -x == twos complement negation which is equivalent to ~x+1
return ~(-(x) & -(align)) + 1;
}
thetype demorgan(thetype x, thetype align)
{
// Demorgans law: ~(A & B) ==> ~A | ~B
return ((~(-x)) | (~(-align))) + 1;
}
thetype new(thetype x, thetype align)
{
// The new macro that does not trigger the overflow
return ((((x) - 1) | ((align) - 1)) + 1);
}
int main(void)
{
old((thetype)1234, (thetype)8);
negate((thetype)1234, (thetype)8);
demorgan((thetype)1234, (thetype)8);
new((thetype)1234, (thetype)8);
return 0;
}
Now compile and dump the assembly. The stack options are there because my hardened compiler defaults to enabling them, and the assembly is more annoying to read with them enabled. Also play around with -O[0-3] and -Os. They all work the same but give slightly different outputs.
So from old->negate, the two's complement rewrite makes absolutely no change to the generated asm. negate->demorgan is just De Morgan's law, which is fine too. Then from demorgan->new there is no difference in the generated asm, so they are also the same. I think this should suffice to show that the new and old are equivalent. Do try it on other compilers and see if we get the same. I tried clang too and it is exactly the same. |
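Comparing assembly shows equivalence for one compiler and one set of flags. A complementary check is to compare results directly. The following is a rough sketch, assuming `unsigned int` operands; `count_mismatches` is a hypothetical helper, not part of SPL. It tries every power-of-two alignment against a range of small values and a range of values just below `UINT_MAX`.

```c
#include <limits.h>

/* The original SPL form, as quoted in the thread. */
static unsigned int roundup_old(unsigned int x, unsigned int align)
{
	return -(-x & -align);
}

/* The proposed replacement. */
static unsigned int roundup_new(unsigned int x, unsigned int align)
{
	return ((x - 1) | (align - 1)) + 1;
}

/* Hypothetical helper: count the inputs on which the two forms disagree. */
unsigned long count_mismatches(void)
{
	unsigned long mismatches = 0;
	unsigned int align, i;

	/* align walks 1, 2, 4, ... and becomes 0 after the top bit shifts out. */
	for (align = 1; align != 0; align <<= 1) {
		for (i = 0; i < 70000; i++) {
			unsigned int hi = UINT_MAX - i;

			if (roundup_old(i, align) != roundup_new(i, align))
				mismatches++;
			if (roundup_old(hi, align) != roundup_new(hi, align))
				mismatches++;
		}
	}
	return mismatches;
}
```

This is a spot check rather than a proof: it only covers the case where both operands have the same unsigned type, and only the sampled values.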
@perfinion Looks good to me! I still have one question though: with the new implementation, for |
Both the new and old are the same for x = 0. Old: 0 & anything == 0. New: ((0 - 1) | anything) + 1 == -1 + 1 == 0. So they both end up just returning zero no matter what the align arg is. This will still trigger a PaX size overflow for the new one, but I think that is good because zero is almost definitely wrong. |
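The x = 0 case can be sketched as follows, assuming `unsigned int` operands (the function names are illustrative only). The intermediate helper exposes the wrapped value that the new form still passes through when x is zero.

```c
#include <limits.h>

/* Original form. */
unsigned int roundup_old(unsigned int x, unsigned int align)
{
	return -(-x & -align);
}

/* Proposed replacement form. */
unsigned int roundup_new(unsigned int x, unsigned int align)
{
	return ((x - 1) | (align - 1)) + 1;
}

/* Intermediate value of the new form before the final + 1.
 * For x == 0 the subtraction wraps to UINT_MAX, so the OR is all ones
 * whatever the alignment is, and adding 1 wraps back to 0. */
unsigned int roundup_new_intermediate(unsigned int x, unsigned int align)
{
	return (x - 1) | (align - 1);
}
```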
@chrisrd The two are equivalent for all inputs. I sketched out a mathematical proof of equivalence after hearing @perfinion's explanation in IRC earlier today:
|
@ryao Yes, I saw your #3949 (comment) and appreciate the formalisation of @perfinion's logic at #2505 (comment). However the mathematical proof doesn't take into account C's type promotion rules: i.e. for the different sequence of operations (at the C compiler level), do the C type promotion rules guarantee the new macro ends up with the same result as the original? On the other hand I've basically come around to the view that, whilst insisting on guarantees of equivalence per the C spec might be the technically correct thing to do, we're just as likely (which is not very likely at all) to see differences in behaviour in the original macro between platforms due to differences between compilers and/or optimisation settings etc. I.e. it's not worth worrying too much about the exquisite details of the type promotion rules as long as we're comfortably sure the behaviour of the two macros is the same for all cases we care about. I also have a slight concern regarding triggering the PaX overflow for the
...but, like @perfinion, I'm reasonably sure the I wonder what the chances are of getting the updated macro into OpenZFS? |
@chrisrd Good point about the type coercion rules not being included in the proof. One way to formally decide whether the type coercion rules pose a problem would seem to be to look at each of the axioms in the proof and answer the question "does the axiom break under C's type coercion rules?". If the answer is no for all of them, then the proof holds for type coercion. If the answer is yes for any of them, then the proof breaks. The only axiom where I suspect anything might be different is the one from two's complement, where the answer depends on how the compiler does coercion with the constant. If the constant is automatically coerced to the type of the variable, we are fine because whatever coercion that follows would be the same as it would have been between It should be said that I am assuming that type coercion is the same regardless of how Those caveats aside, I believe that the correct answer on equivalence when type coercion is taken into consideration should depend on how coercion of the width of the constant is handled in a compiler. I am inclined to think that a compiler would adopt the width of the first coercion that occurs, since the 1 can safely fit in any integral type, because that is how I would see myself handling this if I wrote a compiler front end (I actually have, but it was not a C front end and it was just a college project). This needs some investigation, but even if the C specification provides favorable behavior, it is still possible for there to be compiler bugs in implementing it. As for getting the updated macro into OpenZFS, I think the chances are good following a discussion about upstreaming this that I had yesterday in #illumos on the freenode IRC network. |
Given that the only risk of non-equivalence should be a compiler warning, we could just assume it is equivalent unless the compiler warns otherwise. I could try formalizing my informal reasoning on C type coercion. It would be an interesting exercise for learning how to apply things that I learned in school, but blocking the merge of the patch on complete formal verification is overkill, so it should be fine to merge without that. @behlendorf Do you see anything else that we need to check? |
@ryao In short, I agree this is fine to go in. However I'm not sure what you're thinking about there: the "non-equivalence" of what objects, and what would the compiler be warning about? I don't think looking at the type promotion (whether coerced or automatic) involved in each step of the mathematical proof is going to be much use to us: the C compiler won't be following the proof to transform from the first macro to the second as the proof does, the compiler will be following whatever wild and woolly path the compiler writer came up with, almost certainly whilst 3 days sleepless and kept going by sheer willpower and ingestion of copious mind altering substances (the end results are generally indistinguishable from those produced by a wizard of optimisation), to get to the end point of rounding up the value. The best we can hope for, for either macro, is that the end result is in accordance with how a sane mind would interpret the rules of C. That's basically why I'm now more relaxed about demanding guarantees of equivalence between the macros (i.e. macro to end result, not macro to macro): it would be nice if someone were able to rigorously show what should happen per the C type promotion rules for all possible types (we already have the mathematical side covered), but that would still not guarantee the compiler will actually follow the rules itself. (Sure, if we found the compiler wasn't following the rules we could try to get it fixed, but that doesn't help in the short to medium term.) I think the best we can do is ensure the new macro is sane and actually works as desired in the cases we care about now. The esoteric and currently non-existing cases I was previously expressing concern about (odd new types and/or strange new compilers being used) can be dealt with if and when they come about. |
@chrisrd Is your concern the situation where GCC simultaneously miscompiles the new code and fails to issue a warning? I am not concerned about that possibility with the new macro any more than I am concerned about that possibility with the old macro. While miscompilation is always a matter of concern, that deserves its own issue and a separate resolution in the form of a switch to a formally verified compiler such as the CompCert C compiler: That said, I consider the biggest risk to be that GCC will evaluate the expression formed by the macro to have a different integral type than the original in its Abstract Syntax Tree. If that occurs, the compiler will overzealously increase the bit width of the variable rather than coercing the bit width of the constant and/or will treat the constant as a signed type. Provided that GCC follows the rules of C, the latter is not a problem, while the former would only cause a problem should the width increase too much upon assignment. That should cause GCC to generate a compiler warning about an integral type being truncated upon assignment to an lval. This should occur in "semantic analysis" as shown in this diagram: http://jcsites.juniata.edu/faculty/rhodes/lt/images/ccover4.gif |
@ryao (apologies for the downtime...) my concern was that the new macro should produce the same end results as the old macro in all possible cases, according to the rules of C (including type promotion (TP) and expression evaluation ordering (EEO) etc.). What I was looking for was a proof along the lines of "ok, the original macro works like this, according to the rules of C (including TP and EEO), and the new macro works like this other, again according to the rules of C, and hence we see that the two produce the same results for all possible cases (assuming the compiler is strictly conforming)". I'm somewhat more relaxed about wanting that rigorous approach now that I'm more familiar with the C type promotion rules (thanks to this issue) and the ways the macro is used in current code. I.e. I'm comfortable the new macro will work for current usage and reasonable future usage, and I think any future breakage (i.e. differences in output of the old and new macros) due to my hypothetical "odd new types and/or strange new compilers being used" are just as (un!)likely as any future breakage due to compiler bugs etc. |
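One concrete case where type promotion does change the outcome is worth sketching: a 64-bit value combined with a narrower unsigned alignment. The function names below are hypothetical, and the sketch assumes `int` is 32 bits wide, so that a `uint32_t` operand is not promoted to a signed type. Under that assumption the original form computes `-align` in 32 bits and zero-extends it, which drops the upper half of the mask. The replacement form only ever widens `align - 1`, where zero-extension is harmless.

```c
#include <stdint.h>

/* Original form with mixed widths. For align == 8, -align is the
 * 32-bit value 0xFFFFFFF8; widening it to 64 bits zero-extends, so the
 * mask becomes 0x00000000FFFFFFF8 and the result is wrong. */
uint64_t old_mixed(uint64_t x, uint32_t align)
{
	return -(-x & -align);
}

/* Original form with both operands cast to the wide type first,
 * in the style of the _TYPED variant. */
uint64_t old_typed(uint64_t x, uint32_t align)
{
	return -(-(uint64_t)x & -(uint64_t)align);
}

/* Replacement form with mixed widths: align - 1 is 7, and widening 7
 * to 64 bits gives the intended low-bit mask. */
uint64_t new_mixed(uint64_t x, uint32_t align)
{
	return ((x - 1) | (align - 1)) + 1;
}
```

With x = 1234 and align = 8, `old_typed` and `new_mixed` both give 1240 while `old_mixed` does not. If this reading is right, the two forms differ only where the untyped original was already giving a wrong answer, which is consistent with the earlier observation that such call sites should be using the _TYPED version.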
@chrisrd @perfinion @ryao thank you all for working on this. The patch looks good to me, and since you all agree it's correct I'll get it merged. |
OK, after giving this a careful looking over I'm convinced it's correct too. I've merged it. Thank you everyone! Merged as: openzfs/spl@8fc851b sysmacros: Make P2ROUNDUP not trigger int overflow |
@behlendorf Thanks for pulling this in! you need to pull in the second pull request too. The header file from spl is in the zfs repo too: https://github.com/zfsonlinux/zfs/blob/master/lib/libspl/include/sys/sysmacros.h#L53-L81 This pull request does the exact same fix for that file too: #3949 |
sysmacros: Make P2ROUNDUP not trigger int overflow

The original P2ROUNDUP and P2ROUNDUP_TYPED macros contain -x which triggers PaX's integer overflow detection for unsigned integers. Replace the macros with an equivalent version that does not trigger the overflow.

Axioms:
A. (-(x)) === (~((x) - 1)) === (~(x) + 1) under two's complement.
B. ~(x & y) === ((~(x)) | (~(y))) under De Morgan's law.
C. ~(~x) === x under the law of excluded middle.

Proof:
0. (-(-(x) & -(align))) original
1. (~(-(x) & -(align)) + 1) by A
2. (((~(-(x))) | (~(-(align)))) + 1) by B
3. (((~(~((x) - 1))) | (~(~((align) - 1)))) + 1) by A
4. (((((x) - 1)) | (((align) - 1))) + 1) by C
Q.E.D.

Signed-off-by: Jason Zaman <[email protected]>
Reviewed-by: Chris Dunlop <[email protected]>
Reviewed-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs/zfs#2505
Closes #488
@perfinion thanks for the reminder. I've just pulled it in. |
The patches to fix this have been merged into master now. openzfs/zfs#2505 Package-Manager: portage-2.2.20.1
Since updating to 0.6.3 (also kernel 3.14.5-hardened on Gentoo) I am seeing some of these in my dmesg:
This is obviously a 'hardened' (64-bit) kernel with PaX enabled. I saw it most recently when I tried to eix-sync my Gentoo installation (with /usr/portage/ stored on a zfs file system). That errored out with:
Is this a problem with ZoL, or my configuration?