[analyzer][Solver] Early return if sym is concrete on assuming #115579

Merged

Conversation

danix800
Member

@danix800 danix800 commented Nov 9, 2024

This lets the solver deduce the values of some complex symbols derived from simpler ones whose values can be constrained to concrete values during execution, which reduces the number of overconstrained states.

This commit also fixes a unix.StdCLibraryFunctions crash caused by these overconstrained states being added to the graph as nodes that are marked as sinks (PosteriorlyOverconstrained). The 'assume' API is used in non-dual style, so the checker should defensively test whether these newly added nodes are actually impossible.

  1. The crash: https://godbolt.org/z/8KKWeKb86
  2. An equivalent case that the solver needs to solve: https://godbolt.org/z/ed8WqsbTh
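
The early return can be illustrated with a minimal standalone sketch. This is not the analyzer's code: `SimplifiedValue`, `Outcome`, and `assumeSimplified` are hypothetical names, and `std::optional<long long>` stands in for a symbol that either folded to a concrete integer or is still symbolic.

```cpp
#include <optional>

// Hypothetical stand-in for the result of simplifying a symbol: either it
// folded to a known integer, or it is still symbolic (std::nullopt).
using SimplifiedValue = std::optional<long long>;

enum class Outcome { Feasible, Infeasible, NeedsSolver };

// Toy model of the early return: when the simplified value is a constant,
// the assumption can be decided without consulting any range constraints.
Outcome assumeSimplified(SimplifiedValue Val, bool Assumption) {
  if (!Val)
    return Outcome::NeedsSolver; // still symbolic, fall through to the solver

  bool IsZero = (*Val == 0);
  // A zero value is only compatible with assuming "false"; any non-zero
  // value is only compatible with assuming "true".
  bool Feasible = IsZero ? !Assumption : Assumption;
  return Feasible ? Outcome::Feasible : Outcome::Infeasible;
}
```

In this model, assuming a value that folded to 0 to be true is reported as infeasible right away, which corresponds to returning a null state instead of recording a constraint that can never hold.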

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:static analyzer labels Nov 9, 2024
@llvmbot
Member

llvmbot commented Nov 9, 2024

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-static-analyzer-1

Author: Ding Fei (danix800)

Changes

This lets the solver deduce the values of some complex symbols derived from simpler ones whose values can be constrained to concrete values during execution, which reduces the number of overconstrained states.

This commit also fixes a unix.StdCLibraryFunctions crash caused by these overconstrained states being added to the graph as nodes that are marked as sinks (PosteriorlyOverconstrained). The 'assume' API is used in non-dual style, so the checker should defensively test whether these newly added nodes are actually impossible.

  1. The crash: https://godbolt.org/z/8KKWeKb86
  2. An equivalent case that the solver needs to solve: https://godbolt.org/z/ed8WqsbTh

Full diff: https://github.com/llvm/llvm-project/pull/115579.diff

6 Files Affected:

  • (modified) clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp (+2)
  • (modified) clang/lib/StaticAnalyzer/Core/ConstraintManager.cpp (+1-1)
  • (modified) clang/lib/StaticAnalyzer/Core/RangedConstraintManager.cpp (+8-1)
  • (added) clang/test/Analysis/solver-sym-simplification-on-assumption.c (+31)
  • (added) clang/test/Analysis/std-c-library-functions-bufsize-nocrash-with-correct-solver.c (+33)
  • (modified) clang/test/Analysis/symbol-simplification-fixpoint-two-iterations.cpp (+3-3)
diff --git a/clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
index 4f30b2a0e7e7da..5faaf9cf274531 100644
--- a/clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
@@ -1354,6 +1354,8 @@ void StdLibraryFunctionsChecker::checkPreCall(const CallEvent &Call,
             if (BR.isInteresting(ArgSVal))
               OS << Msg;
           }));
+      if (NewNode->isSink())
+        break;
     }
   }
 }
diff --git a/clang/lib/StaticAnalyzer/Core/ConstraintManager.cpp b/clang/lib/StaticAnalyzer/Core/ConstraintManager.cpp
index c0b3f346b654df..2b77167fab86f2 100644
--- a/clang/lib/StaticAnalyzer/Core/ConstraintManager.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ConstraintManager.cpp
@@ -74,7 +74,7 @@ ConstraintManager::assumeDualImpl(ProgramStateRef &State,
       // it might happen that a Checker uncoditionally uses one of them if the
       // other is a nullptr. This may also happen with the non-dual and
       // adjacent `assume(true)` and `assume(false)` calls. By implementing
-      // assume in therms of assumeDual, we can keep our API contract there as
+      // assume in terms of assumeDual, we can keep our API contract there as
       // well.
       return ProgramStatePair(StInfeasible, StInfeasible);
     }
diff --git a/clang/lib/StaticAnalyzer/Core/RangedConstraintManager.cpp b/clang/lib/StaticAnalyzer/Core/RangedConstraintManager.cpp
index 4bbe933be2129e..cc2280faa6f730 100644
--- a/clang/lib/StaticAnalyzer/Core/RangedConstraintManager.cpp
+++ b/clang/lib/StaticAnalyzer/Core/RangedConstraintManager.cpp
@@ -23,7 +23,14 @@ RangedConstraintManager::~RangedConstraintManager() {}
 ProgramStateRef RangedConstraintManager::assumeSym(ProgramStateRef State,
                                                    SymbolRef Sym,
                                                    bool Assumption) {
-  Sym = simplify(State, Sym);
+  SVal SimplifiedVal = simplifyToSVal(State, Sym);
+  if (SimplifiedVal.isConstant()) {
+    bool Feasible = SimplifiedVal.isZeroConstant() ? !Assumption : Assumption;
+    return Feasible ? State : nullptr;
+  }
+
+  if (SymbolRef SimplifiedSym = SimplifiedVal.getAsSymbol())
+    Sym = SimplifiedSym;
 
   // Handle SymbolData.
   if (isa<SymbolData>(Sym))
diff --git a/clang/test/Analysis/solver-sym-simplification-on-assumption.c b/clang/test/Analysis/solver-sym-simplification-on-assumption.c
new file mode 100644
index 00000000000000..941c584c598c52
--- /dev/null
+++ b/clang/test/Analysis/solver-sym-simplification-on-assumption.c
@@ -0,0 +1,31 @@
+// RUN: %clang_analyze_cc1 %s \
+// RUN:   -analyzer-checker=debug.ExprInspection \
+// RUN:   -verify
+
+void clang_analyzer_eval(int);
+
+void test_derived_sym_simplification_on_assume(int s0, int s1) {
+  int elem = s0 + s1 + 1;
+  if (elem-- == 0) // elem = s0 + s1
+    return;
+
+  if (elem-- == 0) // elem = s0 + s1 - 1
+    return;
+
+  if (s0 < 1) // s0: [1, 2147483647]
+    return;
+  if (s1 < 1) // s1: [1, 2147483647]
+    return;
+
+  if (elem-- == 0) // elem = s0 + s1 - 2
+    return;
+
+  if (s0 > 1) // s0: [1, 2147483647] && [-2147483648, 1] => s0 = 1
+    return;
+
+  if (s1 > 1) // s1: [1, 2147483647] && [-2147483648, 1] => s1 = 1
+    return;
+
+  // elem = s0 + s1 - 2 should be 0
+  clang_analyzer_eval(elem); // expected-warning{{FALSE}}
+}
diff --git a/clang/test/Analysis/std-c-library-functions-bufsize-nocrash-with-correct-solver.c b/clang/test/Analysis/std-c-library-functions-bufsize-nocrash-with-correct-solver.c
new file mode 100644
index 00000000000000..3b39bbe32dfc21
--- /dev/null
+++ b/clang/test/Analysis/std-c-library-functions-bufsize-nocrash-with-correct-solver.c
@@ -0,0 +1,33 @@
+// RUN: %clang_analyze_cc1 %s \
+// RUN:   -analyzer-checker=unix.StdCLibraryFunctions \
+// RUN:   -analyzer-config unix.StdCLibraryFunctions:ModelPOSIX=true \
+// RUN:   -analyzer-checker=debug.ExprInspection \
+// RUN:   -triple x86_64-unknown-linux-gnu \
+// RUN:   -verify
+
+// expected-no-diagnostics
+
+#include "Inputs/std-c-library-functions-POSIX.h"
+
+void _add_one_to_index_C(int *indices, int *shape) {
+  int k = 1;
+  for (; k >= 0; k--)
+    if (indices[k] < shape[k])
+      indices[k]++;
+    else
+      indices[k] = 0;
+}
+
+void PyObject_CopyData_sptr(char *i, char *j, int *indices, int itemsize,
+    int *shape, struct sockaddr *restrict sa) {
+  int elements = 1;
+  for (int k = 0; k < 2; k++)
+    elements += shape[k];
+
+  // no contradiction after 3 iterations when 'elements' could be
+  // simplified to 0
+  while (elements--) {
+    _add_one_to_index_C(indices, shape);
+    getnameinfo(sa, 10, i, itemsize, i, itemsize, 0);
+  }
+}
diff --git a/clang/test/Analysis/symbol-simplification-fixpoint-two-iterations.cpp b/clang/test/Analysis/symbol-simplification-fixpoint-two-iterations.cpp
index 679ed3fda7a7a7..3f34d9982e7c8a 100644
--- a/clang/test/Analysis/symbol-simplification-fixpoint-two-iterations.cpp
+++ b/clang/test/Analysis/symbol-simplification-fixpoint-two-iterations.cpp
@@ -28,13 +28,13 @@ void test(int a, int b, int c, int d) {
     return;
   clang_analyzer_printState();
   // CHECK:       "constraints": [
-  // CHECK-NEXT:    { "symbol": "(reg_$0<int a>) != (reg_$3<int d>)", "range": "{ [0, 0] }" },
+  // CHECK-NEXT:    { "symbol": "((reg_$0<int a>) + (reg_$2<int c>)) != (reg_$3<int d>)", "range": "{ [0, 0] }" },
   // CHECK-NEXT:    { "symbol": "reg_$1<int b>", "range": "{ [0, 0] }" },
   // CHECK-NEXT:    { "symbol": "reg_$2<int c>", "range": "{ [0, 0] }" }
   // CHECK-NEXT:  ],
   // CHECK-NEXT:  "equivalence_classes": [
-  // CHECK-NEXT:    [ "(reg_$0<int a>) != (reg_$3<int d>)" ],
-  // CHECK-NEXT:    [ "reg_$0<int a>", "reg_$3<int d>" ],
+  // CHECK-NEXT:    [ "((reg_$0<int a>) + (reg_$2<int c>)) != (reg_$3<int d>)" ],
+  // CHECK-NEXT:    [ "(reg_$0<int a>) + (reg_$2<int c>)", "reg_$3<int d>" ],
   // CHECK-NEXT:    [ "reg_$2<int c>" ]
   // CHECK-NEXT:  ],
   // CHECK-NEXT:  "disequality_info": null,

@danix800 danix800 assigned martong and unassigned martong Nov 9, 2024
@danix800 danix800 requested review from martong and balazske November 9, 2024 04:41
@steakhal
Contributor

steakhal commented Nov 9, 2024

Hi, thanks for the PR!

I'm slightly confused that the compiler crash you refer to comes from the stdlibrary fn checker.
This suggests a checker problem to me, likely related to the stdlibraryfn checker early return.

However, this is also coupled with a solver change. Is this an improvement to the solver that also makes the checker avoid the crash? If so, then we should have separate PRs, because we should harden the checker on one side and improve the solver on the other.

In any case, I'll have a look at this next week.

@steakhal steakhal self-requested a review November 9, 2024 13:20
@danix800
Member Author

danix800 commented Nov 10, 2024

Yes, these two are related.

The solver is the root cause; the checker crash is the effect. Either fix alone would cover this crash,
but neither is sufficient without the other.

The solver is inherently incapable of solving some constraints, which brings in
some overconstrained states. We could try to improve the solver's precision as much
as possible. In this case, these impossible states would crash the checker.

The checker should defensively test for impossible states, since they are inevitable
anyway.

Contributor

@NagyDonat NagyDonat left a comment

Thanks for the commit, I'm satisfied with it :)

I actually like that these two related changes (the checker change and the constraint manager improvement) are handled together in a single commit -- this way somebody who browses the commit log can directly see the "other half" of the change (without following cumbersome links through "this commit is mentioned in the commit message of that one" or opening this github review). Also, the checker change is so trivial (+2 lines) that the full combined commit is still small enough to be easily understood.

Contributor

@steakhal steakhal left a comment

Thanks for the commit, I'm satisfied with it :)

I actually like that these two related changes (the checker change and the constraint manager improvement) are handled together in a single commit -- this way somebody who browses the commit log can directly see the "other half" of the change (without following cumbersome links through "this commit is mentioned in the commit message of that one" or opening this github review). Also, the checker change is so trivial (+2 lines) that the full combined commit is still small enough to be easily understood.

I agree.

The patch looks good to me. I made some recommendations for improving the tests.

I have one question. In the StdLibraryFn checker you added a test for isSink().
I thought that the crash was that some assume(..., true|false) returned a null State, that we dereferenced. Where was that assume call and how did we unconditionally dereference it?

@danix800
Member Author

Thanks for the commit, I'm satisfied with it :)
I actually like that these two related changes (the checker change and the constraint manager improvement) are handled together in a single commit -- this way somebody who browses the commit log can directly see the "other half" of the change (without following cumbersome links through "this commit is mentioned in the commit message of that one" or opening this github review). Also, the checker change is so trivial (+2 lines) that the full combined commit is still small enough to be easily understood.

I agree.

The patch looks good to me. I made some recommendations for improving the tests.

I have one question. In the StdLibraryFn checker you added a test for isSink(). I thought that the crash was that some assume(..., true|false) returned a null State, that we dereferenced. Where was that assume call and how did we unconditionally dereference it?

It's not a null state being dereferenced, but an assertion failure. The checker operates on some impossible states
and tries to add one of them back to the graph, which is rejected (the assertion fails).
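
A minimal standalone sketch of the defensive pattern, assuming a toy graph model rather than the analyzer's real classes. `Node`, `addTransition`, and `chainTransitions` here are hypothetical stand-ins; the only point is that a checker chaining nodes stops as soon as the node it got back is a sink.

```cpp
#include <vector>

// Hypothetical miniature of an exploded-graph node (not the analyzer's type).
struct Node {
  bool Sink = false; // true once the path is known to be impossible
};

// Hypothetical transition: the engine may hand back a sink node when the new
// state turns out to be impossible, even if the checker did not ask for one.
Node addTransition(const Node &Pred, bool StateIsImpossible) {
  Node Next = Pred;
  Next.Sink = Pred.Sink || StateIsImpossible;
  return Next;
}

// Applies one constraint per entry, chaining each node onto the previous one.
// Returns how many transitions were added before stopping.
int chainTransitions(const std::vector<bool> &ImpossibleAfterStep) {
  Node Current;
  int Added = 0;
  for (bool Impossible : ImpossibleAfterStep) {
    Current = addTransition(Current, Impossible);
    ++Added;
    // Defensive check: never chain another node onto a sink.
    if (Current.Sink)
      break;
  }
  return Added;
}
```

With the inputs `{false, true, false}`, the loop adds two transitions and then stops, so nothing is ever appended to the sink node.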

@steakhal
Contributor

First, let me thank you for posting a high quality patch like this.

ProgramState::isPosteriorlyOverconstrained() was introduced to carry the fact that even the parent state was infeasible.
Checkers were written on the assumption that ProgramState::assume() will return at least one valid State. This was true for a really long time, until we improved the Solver with some simplification triggered by perfectly constraining symbols.
This meant that from that point on we could have situations where the Solver disproves both the true and the false states, forcing us to violate the contract of ProgramState::assume by returning two null states. This caused crashes.
To illustrate the issue consider this:

// Let's say x could be either 0 or 1 here.
if (/*some fancy infeasible constraint dependent on x */) {
  if (x == 0) { } else { }
}

Let's pretend that at the first if we just barely couldn't prove that no value of x satisfies the constraint.
At the if (x == 0) we would split the path two ways: on one path x is 0, and on the other x is 1. The Solver would try to simplify the constraints and realize that the State where x is 1 has contradictions, and then later realize that the other State is infeasible as well. That is only possible if the parent State was already infeasible.

So, we have to return a "valid" state just to discard everything the checker would do. This wasted work is the lesser evil.
Basically, this is the story of isPosteriorlyOverconstrained(), and the why behind it.

So, the fact that assume() never returns two null states is a blissful lie to ease checker writing.
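
The contract can be sketched in a small standalone model. This is an illustration under assumptions, not the real ConstraintManager: `State`, `StatePair`, and this `assumeDual` signature are hypothetical, and the two booleans stand in for whatever the solver concluded about each branch.

```cpp
#include <optional>
#include <utility>

// Hypothetical miniature program state (not the analyzer's ProgramState).
struct State {
  bool PosteriorlyOverconstrained = false;
};

using StatePair = std::pair<std::optional<State>, std::optional<State>>;

// Toy model of the dual assumption: callers always get at least one state.
// When the solver disproves both branches, the parent must already have been
// infeasible, so a flagged copy is returned for both instead of two nulls.
StatePair assumeDual(const State &Parent, bool TrueFeasible,
                     bool FalseFeasible) {
  if (!TrueFeasible && !FalseFeasible) {
    State Infeasible = Parent;
    Infeasible.PosteriorlyOverconstrained = true;
    return StatePair(Infeasible, Infeasible);
  }
  return StatePair(
      TrueFeasible ? std::optional<State>(Parent) : std::nullopt,
      FalseFeasible ? std::optional<State>(Parent) : std::nullopt);
}
```

A checker written against this model can dereference whichever state is non-null without a crash; the flag is what later lets the engine discard the path.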

Now, why is this issue interesting? It's because the checkers can still observe the side effects of this machinery I shared when they chain ExplodedNodes returned by addTransition(). This breaks the illusion, leading to crashes as you shared.

Let's say we have a checker that adds some assumptions, and the assume call returned a single valid state. It would be surprising if adding a transition to it suddenly made it sink. Even though that's technically the right thing to do, the checker did nothing that would imply sinking the path. This doesn't mean that the defensive isSink call in the StdLibFnChecker is unjustified. I'm okay with that. However, this would imply that the rest of the API uses of addTransition where we chain the ExplodedNodes are also left vulnerable to this issue. There are probably far fewer checkers doing this sort of chaining, so checking each callsite may be a valid option. However, this would raise the bar for newcomers using this API, which I don't really like.

Consequently, I think this patch would mask the issue. We should rather prevent breaking the illusion by somehow allowing subsequent addTransitions even if isPosteriorlyOverconstrained() is true.

@steakhal steakhal changed the title [StaticAnalyzer] early return if sym is concrete on assuming [analyzer][Solver] Early return if sym is concrete on assuming Nov 12, 2024
@danix800
Member Author

First, let me thank you for posting a high quality patch like this.

ProgramState::isPosteriorlyOverconstrained() was introduced to carry the fact that even the parent state was infeasible. Checkers were written on the assumption that ProgramState::assume() will return at least one valid State. This was true for a really long time, until we improved the Solver with some simplification triggered by perfectly constraining symbols. This meant that from that point on we could have situations where the Solver disproves both the true and the false states, forcing us to violate the contract of ProgramState::assume by returning two null states. This caused crashes. To illustrate the issue consider this:

// Let's say x could be either 0 or 1 here.
if (/*some fancy infeasible constraint dependent on x */) {
  if (x == 0) { } else { }
}

Let's pretend that at the first if we just barely couldn't prove that no value of x satisfies the constraint. At the if (x == 0) we would split the path two ways: on one path x is 0, and on the other x is 1. The Solver would try to simplify the constraints and realize that the State where x is 1 has contradictions, and then later realize that the other State is infeasible as well. That is only possible if the parent State was already infeasible.

So, we have to return a "valid" state just to discard everything the checker would do. This wasted work is the lesser evil. Basically, this is the story of isPosteriorlyOverconstrained(), and the why behind it.

Thanks for these helpful details!

So, the fact that assume() never returns two null states is a blissful lie to ease checker writing.

Now, why is this issue interesting? It's because the checkers can still observe the side effects of this machinery I shared when they chain ExplodedNodes returned by addTransition(). This breaks the illusion, leading to crashes as you shared.

Let's say we have a checker that adds some assumptions, and the assume call returned a single valid state. It would be surprising if adding a transition to it suddenly made it sink. Even though that's technically the right thing to do, the checker did nothing that would imply sinking the path. This doesn't mean that the defensive isSink call in the StdLibFnChecker is unjustified. I'm okay with that. However, this would imply that the rest of the API uses of addTransition where we chain the ExplodedNodes are also left vulnerable to this issue. There are probably far fewer checkers doing this sort of chaining, so checking each callsite may be a valid option. However, this would raise the bar for newcomers using this API, which I don't really like.

I think there are no clear boundaries between checkers and the engine. Checkers can decide whether the engine should
stop, whether a state should be split or transitioned to a new one, or whether to do nothing at all.

So contracts are important. I think we should clarify these contracts and add more if needed, rather than compromise for
the sake of checkers that are not well written. This also means that checkers should be aware of the existence of posteriorly-overconstrained states, especially when they try to chain those states.

Consequently, I think this patch would mask the issue. We should rather prevent breaking the illusion by somehow allowing subsequent addTransitions even if isPosteriorlyOverconstrained() is true.

How about this: whenever a checker chains states by itself, it should be more careful about node validation. The engine
responds with a failed assertion in this case, which drives us to spot the issue and fix it. If isPosteriorlyOverconstrained()
nodes are allowed, then we might lose this chance to fix issues in both the checker and the solver.

Testing isSink() suffices for this case, but might not be clear enough in general. I was surprised by this node already being a sink, too.

As an analogy, when a checker tries to transition to an unchanged state, the engine returns the predecessor
as a safeguard. If the checker needs to chain states, it is a waste for it to keep working on this state without
testing whether the returned node is the predecessor (that is, nothing has changed).

We might change the engine to intentionally return a null node if it's marked as a sink by isPosteriorlyOverconstrained().
That would be clearer, but the checker would still need to test for it. Whenever we generate an error node for bug reporting, we test whether the
new node was actually generated:

      if (ExplodedNode *N = C.generateErrorNode(NewState, NewNode))
        reportBug(Call, N, Constraint.get(), Summary, C);

This is another similar contract between checkers and the engine. In this sense, how about testing like this in this case:

      if (!NewNode || NewNode->isSink())
        break;

However, this would imply that the rest of the API uses of addTransition where we chain the ExplodedNodes are also left vulnerable to this issue. There are probably far fewer checkers doing this sort of chaining, so checking each callsite may be a valid option.

Maybe case-by-case fixing is enough.

@steakhal
Contributor

I see your point, but I'm still not convinced.
Anyways, that's partially beyond this PR. What we have here I can completely agree with. I just have the feeling it solved one particular case, and not a class of bugs - which is fine.

Contributor

@steakhal steakhal left a comment

LGTM now. Thank you for this high quality patch. This isn't the first time, I remember. Excellent track record.

@danix800 danix800 merged commit 4163136 into llvm:main Nov 15, 2024
6 of 8 checks passed
@danix800 danix800 deleted the fix/clang-analyzer-range-simplify-before-assume branch November 15, 2024 08:54
@danix800
Member Author

LGTM now. Thank you for this high quality patch. This isn't the first time, I remember. Excellent track record.

Thanks for your reviews and all of your kindness! @steakhal @NagyDonat

@llvm-ci
Collaborator

llvm-ci commented Nov 15, 2024

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-sie-win running on sie-win-worker while building clang at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/46/builds/7928

Here is the relevant piece of the build log for the reference
Step 7 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'Clang :: Analysis/symbol-simplification-fixpoint-two-iterations.cpp' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
z:\b\llvm-clang-x86_64-sie-win\build\bin\clang.exe -cc1 -internal-isystem Z:\b\llvm-clang-x86_64-sie-win\build\lib\clang\20\include -nostdsysteminc -analyze -analyzer-constraints=range -setup-static-analyzer Z:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\test\Analysis\symbol-simplification-fixpoint-two-iterations.cpp    -analyzer-checker=core    -analyzer-checker=debug.ExprInspection    2>&1 | z:\b\llvm-clang-x86_64-sie-win\build\bin\filecheck.exe Z:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\test\Analysis\symbol-simplification-fixpoint-two-iterations.cpp
# executed command: 'z:\b\llvm-clang-x86_64-sie-win\build\bin\clang.exe' -cc1 -internal-isystem 'Z:\b\llvm-clang-x86_64-sie-win\build\lib\clang\20\include' -nostdsysteminc -analyze -analyzer-constraints=range -setup-static-analyzer 'Z:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\test\Analysis\symbol-simplification-fixpoint-two-iterations.cpp' -analyzer-checker=core -analyzer-checker=debug.ExprInspection
# executed command: 'z:\b\llvm-clang-x86_64-sie-win\build\bin\filecheck.exe' 'Z:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\test\Analysis\symbol-simplification-fixpoint-two-iterations.cpp'
# .---command stderr------------
# | Z:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\test\Analysis\symbol-simplification-fixpoint-two-iterations.cpp:31:17: error: CHECK-NEXT: expected string not found in input
# |  // CHECK-NEXT: { "symbol": "((reg_$0<int a>) + (reg_$2<int c>)) != (reg_$3<int d>)", "range": "{ [0, 0] }" },
# |                 ^
# | <stdin>:26:18: note: scanning from here
# |  "constraints": [
# |                  ^
# | <stdin>:27:2: note: possible intended match here
# |  { "symbol": "(reg_$0<int a>) != (reg_$3<int d>)", "range": "{ [0, 0] }" },
# |  ^
# |
# | Input file: <stdin>
# | Check file: Z:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\test\Analysis\symbol-simplification-fixpoint-two-iterations.cpp
# |
# | -dump-input=help explains the following input dump.
# |
# | Input was:
# | <<<<<<
# |            1: "program_state": {
# |            2:  "store": null,
# |            3:  "environment": { "pointer": "0x18a8bcdd400", "items": [
# |            4:  { "lctx_id": 1, "location_context": "#0 Call", "calling": "test", "location": null, "items": [
# |            5:  { "stmt_id": 809, "kind": "ImplicitCastExpr", "pretty": "clang_analyzer_printState", "value": "&code{clang_analyzer_printState}" }
# |            6:  ]}
# |            7:  ]},
# |            8:  "constraints": [
# | check:17       ^~~~~~~~~~~~~~~~
# |            9:  { "symbol": "(((reg_$0<int a>) + (reg_$1<int b>)) + (reg_$2<int c>)) != (reg_$3<int d>)", "range": "{ [0, 0] }" },
# | next:18        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           10:  { "symbol": "(reg_$2<int c>) + (reg_$1<int b>)", "range": "{ [0, 0] }" }
# | next:19        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           11:  ],
# | next:20        ^~
# |           12:  "equivalence_classes": [
# | next:21        ^~~~~~~~~~~~~~~~~~~~~~~~
# |           13:  [ "((reg_$0<int a>) + (reg_$1<int b>)) + (reg_$2<int c>)", "reg_$3<int d>" ]
# | next:22        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           14:  ],
# | next:23        ^~
# |           15:  "disequality_info": null,
# | next:24        ^~~~~~~~~~~~~~~~~~~~~~~~~
...

@steakhal
Contributor

@danix800 Could you please have a look at the failed test, such that we could reapply this PR?
I reverted this soon after I realized the broken test is from this PR.

@danix800
Member Author

@danix800 Could you please have a look at the failed test, such that we could reapply this PR? I reverted this soon after I realized the broken test is from this PR.

Working on it!

@danix800
Member Author

@danix800 Could you please have a look at the failed test, such that we could reapply this PR? I reverted this soon after I realized the broken test is from this PR.

The test fails randomly for an unknown reason when built with VS2019 through VS2022, after 1c154bd (which seems totally unrelated to this randomness). I'm not sure whether it's a compiler bug.

@steakhal
Contributor

@danix800 Could you please have a look at the failed test, such that we could reapply this PR? I reverted this soon after I realized the broken test is from this PR.

The test fails randomly for an unknown reason when built with VS2019 through VS2022, after 1c154bd (which seems totally unrelated to this randomness). I'm not sure whether it's a compiler bug.

I'm not sure I see why. The logs I linked in the revert only showed a single test failure. The new test we added in this PR. This suggests close correlations. Maybe the content of the "constraints" map in the State is dumped in a non-deterministic order? And we just happened to step on it now.

@danix800
Member Author

I'm not sure I see why. The logs I linked in the revert only showed a single test failure. The new test we added in this PR. This suggests close correlations. Maybe the content of the "constraints" map in the State is dumped in a non-deterministic order? And we just happened to step on it now.

Yes it's the testcase touched by this PR that failed (not other random testcase inside the repo).

I mean that clang behaves randomly.

The randomness might not be introduced by this PR but by some earlier commit point.

I'm working on it.

@steakhal
Contributor

I'm not sure I see why. The logs I linked in the revert only showed a single test failure. The new test we added in this PR. This suggests close correlations. Maybe the content of the "constraints" map in the State is dumped in a non-deterministic order? And we just happened to step on it now.

Yes it's the testcase touched by this PR that failed (not other random testcase inside the repo).

I mean that clang behaves randomly.

The randomness might not be introduced by this PR but by some earlier commit point.

I'm working on it.

Be aware that LLVM's DenseMap and sets are basically open-addressing hash tables, and the hash function is seeded by the address of some known function (IDK which). This means that these tables depend on where clang is loaded in memory. Consequently, address-space layout randomization could affect the reproducibility of some issues and lead to flaky cases.

The constraints are backed by llvm::ImmutableMap and llvm::ImmutableSet, which are AVL trees if I recall correctly. I'm not sure how the "ordering" is implemented there, but if it is in terms of comparing pointer addresses then we might have a problem.

When dumping the constraints, I think we just iterate the AVL tree from begin to end, which visits the "less-than" elements first, but if the ordering depends on where the process is loaded then the whole iteration order is runtime-dependent.
Could you please check if we sort before dumping the constraints?
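
To make the concern concrete, here is a minimal standalone sketch of dumping entries in an address-independent order. It is an assumption-laden illustration, not what the analyzer does: `Symbol`, its `Id` field, and `dumpSorted` are hypothetical, and whether the real dump code sorts is exactly the open question.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical symbol with a stable, allocation-independent ID.
struct Symbol {
  unsigned Id;
  std::string Name;
};

// Dumps the names in an order that depends only on the stable IDs, so the
// output is the same no matter where the objects live in memory or in which
// order the container happened to yield them.
std::string dumpSorted(std::vector<const Symbol *> Syms) {
  std::sort(Syms.begin(), Syms.end(),
            [](const Symbol *L, const Symbol *R) { return L->Id < R->Id; });
  std::string Out;
  for (const Symbol *S : Syms) {
    if (!Out.empty())
      Out += ", ";
    Out += S->Name;
  }
  return Out;
}
```

Sorting by a key such as the pointer value itself would reintroduce the dependence on the memory layout, which is why the comparison here uses only the ID.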

@danix800
Member Author

danix800 commented Nov 19, 2024

It is the unstable ordering of elements in DenseMap/Set.

This PR actually breaks the recursive fixpoint simplification algorithm of the equivalence class.

One thing that still confuses me is why no randomness is observed when clang is compiled
by gcc on Linux. Is that purely a coincidence?

=====

EDIT: It's ImmutableMap/Set, not DenseMap/Set.

@danix800
Member Author

For this testcase, two constraints are collected on the path:

(1) a + b + c == d
(2) b + c == 0

When assuming the third constraint, b == 0, either (1) or (2) is selected first, in random order, for simplifying
the equivalence class.

The fixpoint simplification algorithm recurses by re-evaluating an SVal with the top-level State->assume logic.
The early return in this PR would break this simplification.
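
Why a fixpoint makes the visiting order irrelevant can be shown with a toy model. This sketch is not the analyzer's algorithm: it only tracks constraints of the form "this sum of symbols equals 0" over hypothetical symbol names, and `ZeroSum` and `propagateZeros` are made up for the illustration.

```cpp
#include <set>
#include <string>
#include <vector>

// A toy constraint of the form "the sum of these symbols == 0".
using ZeroSum = std::set<std::string>;

// Repeatedly drops symbols known to be zero from every sum. A sum that
// shrinks to a single symbol proves that symbol zero as well. The loop runs
// until a full pass learns nothing new, i.e. until a fixpoint is reached.
std::set<std::string> propagateZeros(std::vector<ZeroSum> Sums,
                                     std::set<std::string> KnownZero) {
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (ZeroSum &Sum : Sums) {
      for (const std::string &Zero : KnownZero)
        Sum.erase(Zero);
      if (Sum.size() == 1 && KnownZero.insert(*Sum.begin()).second)
        Changed = true;
    }
  }
  return KnownZero;
}
```

Starting from b == 0, the sum {b, c} from (2) shrinks to {c}, so c is learned to be zero too; with both b and c zero, (1) reduces to a == d. Because the loop repeats until nothing changes, listing the sums in a different order yields the same final set, whereas stopping after a single pass would make the result depend on the order.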

@steakhal
Contributor

Thank you for your dedication. What are your plans?
Do you plan to continue pushing this?

Btw why did this test only fail on Windows?

@danix800
Member Author

Thank you for your dedication. What are your plans? Do you plan to continue pushing this?

Btw why did this test only fail on Windows?

I'll investigate further whether it's possible to make a similar improvement to the solver in some other way.
