-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[analyzer] Fix performance of getTaintedSymbolsImpl() #89606
[analyzer] Fix performance of getTaintedSymbolsImpl() #89606
Conversation
Previously the function ``` std::vector<SymbolRef> taint::getTaintedSymbolsImpl(ProgramStateRef State, const MemRegion *Reg, TaintTagType K, bool returnFirstOnly) ``` (one of the 8 overloaded variants under this name) was handling element regions in a highly inefficent manner: it performed the "also examine the super-region" step twice. (Once in the branch for element regions, and once in the more general branch for all `SubRegion`s -- note that `ElementRegion` is a subclass of `SubRegion`.) As pointer arithmetic produces `ElementRegion`s, it's not too difficult to get a chain of N nested element regions where this inefficient recursion would proudce 2^N calls. I suspect that this issue might be behind llvm#89045 (note that `sheervideo.c` does very complex pointer arithmetic). This commit is essentially NFC, apart from the performance improvements and the removal of (probably irrelevant) duplicate entries from the return value of `getTaintedSymbols()` calls.
@llvm/pr-subscribers-clang-static-analyzer-1 @llvm/pr-subscribers-clang Author: None (NagyDonat) ChangesPreviously the function
(one of the 8 overloaded variants under this name) was handling element regions in a highly inefficent manner: it performed the "also examine the super-region" step twice. (Once in the branch for element regions, and once in the more general branch for all As pointer arithmetic produces This commit is essentially NFC, apart from the performance improvements and the removal of (probably irrelevant) duplicate entries from the return value of Full diff: https://github.com/llvm/llvm-project/pull/89606.diff 1 Files Affected:
diff --git a/clang/lib/StaticAnalyzer/Checkers/Taint.cpp b/clang/lib/StaticAnalyzer/Checkers/Taint.cpp
index 4edb671753bf45..6362c82b009d72 100644
--- a/clang/lib/StaticAnalyzer/Checkers/Taint.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/Taint.cpp
@@ -216,21 +216,17 @@ std::vector<SymbolRef> taint::getTaintedSymbolsImpl(ProgramStateRef State,
std::vector<SymbolRef> TaintedSymbols;
if (!Reg)
return TaintedSymbols;
- // Element region (array element) is tainted if either the base or the offset
- // are tainted.
+
+ // Element region (array element) is tainted if the offset is tainted.
if (const ElementRegion *ER = dyn_cast<ElementRegion>(Reg)) {
std::vector<SymbolRef> TaintedIndex =
getTaintedSymbolsImpl(State, ER->getIndex(), K, returnFirstOnly);
llvm::append_range(TaintedSymbols, TaintedIndex);
if (returnFirstOnly && !TaintedSymbols.empty())
return TaintedSymbols; // return early if needed
- std::vector<SymbolRef> TaintedSuperRegion =
- getTaintedSymbolsImpl(State, ER->getSuperRegion(), K, returnFirstOnly);
- llvm::append_range(TaintedSymbols, TaintedSuperRegion);
- if (returnFirstOnly && !TaintedSymbols.empty())
- return TaintedSymbols; // return early if needed
}
+ // Symbolic region is tainted if the corresponding symbol is tainted.
if (const SymbolicRegion *SR = dyn_cast<SymbolicRegion>(Reg)) {
std::vector<SymbolRef> TaintedRegions =
getTaintedSymbolsImpl(State, SR->getSymbol(), K, returnFirstOnly);
@@ -239,6 +235,8 @@ std::vector<SymbolRef> taint::getTaintedSymbolsImpl(ProgramStateRef State,
return TaintedSymbols; // return early if needed
}
+ // Any subregion (including Element and Symbolic regions) is tainted if its
+ // super-region is tainted.
if (const SubRegion *ER = dyn_cast<SubRegion>(Reg)) {
std::vector<SymbolRef> TaintedSubRegions =
getTaintedSymbolsImpl(State, ER->getSuperRegion(), K, returnFirstOnly);
@@ -318,4 +316,4 @@ std::vector<SymbolRef> taint::getTaintedSymbolsImpl(ProgramStateRef State,
}
}
return TaintedSymbols;
-}
\ No newline at end of file
+}
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The suggested change make a lot of sense. Thanks.
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
The patch makes sense. Thank you for the promptly fix.
I've checked and this resolves the FFmpeg
issue.
i wonder if this is serious enough to consider hangs as a crash and nominate this PR for backport to clang-18
.
Leave me no doubt, we will have it downstream for sure, but upstream users might wanna also have this.
@Xazax-hun WDYT?
Ah, I think rushed ahead of myself. I perfed again, and it shows this: This is much better, and showcases the next slowdown bug I'm hunting for the days since I reported this one :D So, yea, this PR fixes the bug I reported in #89045, so we can close that once this is merged. |
If it really takes over an hour to analyze a file in clang-18, I'd agree that this has similar severity to a crash, and I support backporting the fix. |
Previously the function ``` std::vector<SymbolRef> taint::getTaintedSymbolsImpl(ProgramStateRef State, const MemRegion *Reg, TaintTagType K, bool returnFirstOnly) ``` (one of the 4 overloaded variants under this name) was handling element regions in a highly inefficient manner: it performed the "also examine the super-region" step twice. (Once in the branch for element regions, and once in the more general branch for all `SubRegion`s -- note that `ElementRegion` is a subclass of `SubRegion`.) As pointer arithmetic produces `ElementRegion`s, it's not too difficult to get a chain of N nested element regions where this inefficient recursion would produce 2^N calls. This commit is essentially NFC, apart from the performance improvements and the removal of (probably irrelevant) duplicate entries from the return value of `getTaintedSymbols()` calls. Fixes llvm#89045 (cherry picked from commit ce763bf)
Previously the function ``` std::vector<SymbolRef> taint::getTaintedSymbolsImpl(ProgramStateRef State, const MemRegion *Reg, TaintTagType K, bool returnFirstOnly) ``` (one of the 4 overloaded variants under this name) was handling element regions in a highly inefficient manner: it performed the "also examine the super-region" step twice. (Once in the branch for element regions, and once in the more general branch for all `SubRegion`s -- note that `ElementRegion` is a subclass of `SubRegion`.) As pointer arithmetic produces `ElementRegion`s, it's not too difficult to get a chain of N nested element regions where this inefficient recursion would produce 2^N calls. This commit is essentially NFC, apart from the performance improvements and the removal of (probably irrelevant) duplicate entries from the return value of `getTaintedSymbols()` calls. Fixes llvm#89045 (cherry picked from commit ce763bf)
Previously the function
(one of the 8 overloaded variants under this name) was handling element regions in a highly inefficent manner: it performed the "also examine the super-region" step twice. (Once in the branch for element regions, and once in the more general branch for all
SubRegion
s -- note thatElementRegion
is a subclass ofSubRegion
.)As pointer arithmetic produces
ElementRegion
s, it's not too difficult to get a chain of N nested element regions where this inefficient recursion would proudce 2^N calls. I suspect that this issue might be behind #89045 (note thatsheervideo.c
does very complex pointer arithmetic).This commit is essentially NFC, apart from the performance improvements and the removal of (probably irrelevant) duplicate entries from the return value of
getTaintedSymbols()
calls.Fixes #89045