Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[clang][Sema] Make format size estimator aware of %p's existence in format string #65969

Merged
merged 4 commits into from
Sep 25, 2023

Conversation

hazohelet
Copy link
Member

GCC stops counting format string's size when it sees %p format in order to avoid Wformat-truncation false positive against Linux kernel's format extension (%pOF, for example).
This change makes clang's behavior align with GCC's.
As requested in #64871

@hazohelet hazohelet added clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:diagnostics New/improved warning or error message in Clang, but not in clang-tidy or static analyzer labels Sep 11, 2023
@hazohelet hazohelet requested a review from a team as a code owner September 11, 2023 15:54
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Sep 11, 2023
@llvmbot
Copy link
Member

llvmbot commented Sep 11, 2023

@llvm/pr-subscribers-clang

Changes

GCC stops counting format string's size when it sees %p format in order to avoid Wformat-truncation false positive against Linux kernel's format extension (%pOF, for example).
This change makes clang's behavior align with GCC's.
As requested in #64871

--
Full diff: https://github.com/llvm/llvm-project/pull/65969.diff

4 Files Affected:

  • (modified) clang/include/clang/AST/FormatString.h (+6)
  • (modified) clang/lib/AST/PrintfFormatString.cpp (+7-1)
  • (modified) clang/lib/Sema/SemaChecking.cpp (+10)
  • (modified) clang/test/Sema/warn-fortify-source.c (+13-3)
diff --git a/clang/include/clang/AST/FormatString.h b/clang/include/clang/AST/FormatString.h
index 5c4ad9baaef608c..97ce3cfe0b25c68 100644
--- a/clang/include/clang/AST/FormatString.h
+++ b/clang/include/clang/AST/FormatString.h
@@ -742,6 +742,12 @@ class FormatStringHandler {
     return true;
   }
 
+  virtual bool
+  ShouldStopOnFormatSpecifier(const analyze_printf::PrintfSpecifier &FS,
+                              unsigned RemainLen) {
+    return false;
+  }
+
   /// Handle mask types whose sizes are not between one and eight bytes.
   virtual void handleInvalidMaskType(StringRef MaskType) {}
 
diff --git a/clang/lib/AST/PrintfFormatString.cpp b/clang/lib/AST/PrintfFormatString.cpp
index f0b9d0ecaf23461..750f584d8d37a76 100644
--- a/clang/lib/AST/PrintfFormatString.cpp
+++ b/clang/lib/AST/PrintfFormatString.cpp
@@ -429,7 +429,13 @@ bool clang::analyze_format_string::ParsePrintfString(FormatStringHandler &H,
     // we can recover from?
     if (!FSR.hasValue())
       continue;
-    // We have a format specifier.  Pass it to the callback.
+    // We have a format specifier.
+    // In case the handler needs to abort parsing upon specific specifiers
+    if (H.ShouldStopOnFormatSpecifier(
+            FSR.getValue(), static_cast(E - FSR.getStart()))) {
+      return false;
+    }
+    // Pass the format specifier to the callback.
     if (!H.HandlePrintfSpecifier(FSR.getValue(), FSR.getStart(),
                                  I - FSR.getStart(), Target))
       return true;
diff --git a/clang/lib/Sema/SemaChecking.cpp b/clang/lib/Sema/SemaChecking.cpp
index 3b4ac613da76aa8..1a6ff6cfa19b661 100644
--- a/clang/lib/Sema/SemaChecking.cpp
+++ b/clang/lib/Sema/SemaChecking.cpp
@@ -860,6 +860,16 @@ class EstimateSizeFormatHandler
       : Size(std::min(Format.find(0), Format.size()) +
              1 /* null byte always written by sprintf */) {}
 
+  bool ShouldStopOnFormatSpecifier(const analyze_printf::PrintfSpecifier &FS,
+                                   unsigned RemainLen) override {
+    if (FS.getConversionSpecifier().getKind() ==
+        analyze_printf::PrintfConversionSpecifier::pArg) {
+      assert(Size >= RemainLen && "no underflow");
+      Size -= RemainLen;
+      return true;
+    }
+    return false;
+  }
   bool HandlePrintfSpecifier(const analyze_printf::PrintfSpecifier &FS,
                              const char *, unsigned SpecifierLen,
                              const TargetInfo &) override {
diff --git a/clang/test/Sema/warn-fortify-source.c b/clang/test/Sema/warn-fortify-source.c
index 5ed9782b26fb788..fba6c718be4a33a 100644
--- a/clang/test/Sema/warn-fortify-source.c
+++ b/clang/test/Sema/warn-fortify-source.c
@@ -85,7 +85,7 @@ void call_memset(void) {
   __builtin_memset(buf, 0xff, 11); // expected-warning {{'memset' will always overflow; destination buffer has size 10, but size argument is 11}}
 }
 
-void call_snprintf(double d) {
+void call_snprintf(double d, int *ptr) {
   char buf[10];
   __builtin_snprintf(buf, 10, "merp");
   __builtin_snprintf(buf, 11, "merp"); // expected-warning {{'snprintf' size argument is too large; destination buffer has size 10, but size argument is 11}}
@@ -96,6 +96,11 @@ void call_snprintf(double d) {
   __builtin_snprintf(buf, 1, "%.1000g", d); // expected-warning {{'snprintf' will always be truncated; specified size is 1, but format string expands to at least 2}}
   __builtin_snprintf(buf, 5, "%.1000g", d);
   __builtin_snprintf(buf, 5, "%.1000G", d);
+  char node_name[6];
+  __builtin_snprintf(node_name, sizeof(node_name), "%pOFn", ptr);
+  __builtin_snprintf(node_name, sizeof(node_name), "12345%pOFn", ptr);
+  __builtin_snprintf(node_name, sizeof(node_name), "123456%pOFn", ptr); // expected-warning {{'snprintf' will always be truncated; specified size is 6, but format string expands to at least 7}}
+  __builtin_snprintf(node_name, sizeof(node_name), "1234567%pOFn", ptr); // expected-warning {{'snprintf' will always be truncated; specified size is 6, but format string expands to at least 8}}
 }
 
 void call_vsnprintf(void) {
@@ -110,6 +115,11 @@ void call_vsnprintf(void) {
   __builtin_vsnprintf(buf, 1, "%.1000g", list); // expected-warning {{'vsnprintf' will always be truncated; specified size is 1, but format string expands to at least 2}}
   __builtin_vsnprintf(buf, 5, "%.1000g", list);
   __builtin_vsnprintf(buf, 5, "%.1000G", list);
+  char node_name[6];
+  __builtin_snprintf(node_name, sizeof(node_name), "%pOFn", list);
+  __builtin_snprintf(node_name, sizeof(node_name), "12345%pOFn", list);
+  __builtin_snprintf(node_name, sizeof(node_name), "123456%pOFn", list); // expected-warning {{'snprintf' will always be truncated; specified size is 6, but format string expands to at least 7}}
+  __builtin_snprintf(node_name, sizeof(node_name), "1234567%pOFn", list); // expected-warning {{'snprintf' will always be truncated; specified size is 6, but format string expands to at least 8}}
 }
 
 void call_sprintf_chk(char *buf) {
@@ -149,7 +159,7 @@ void call_sprintf_chk(char *buf) {
   __builtin___sprintf_chk(buf, 1, 4, "%#x", 9);
   __builtin___sprintf_chk(buf, 1, 3, "%#x", 9); // expected-warning {{'sprintf' will always overflow; destination buffer has size 3, but format string expands to at least 4}}
   __builtin___sprintf_chk(buf, 1, 4, "%p", (void *)9);
-  __builtin___sprintf_chk(buf, 1, 3, "%p", (void *)9); // expected-warning {{'sprintf' will always overflow; destination buffer has size 3, but format string expands to at least 4}}
+  __builtin___sprintf_chk(buf, 1, 3, "%p", (void *)9);
   __builtin___sprintf_chk(buf, 1, 3, "%+d", 9);
   __builtin___sprintf_chk(buf, 1, 2, "%+d", 9); // expected-warning {{'sprintf' will always overflow; destination buffer has size 2, but format string expands to at least 3}}
   __builtin___sprintf_chk(buf, 1, 3, "% i", 9);
@@ -186,7 +196,7 @@ void call_sprintf(void) {
   sprintf(buf, "12%#x", 9);
   sprintf(buf, "123%#x", 9); // expected-warning {{'sprintf' will always overflow; destination buffer has size 6, but format string expands to at least 7}}
   sprintf(buf, "12%p", (void *)9);
-  sprintf(buf, "123%p", (void *)9); // expected-warning {{'sprintf' will always overflow; destination buffer has size 6, but format string expands to at least 7}}
+  sprintf(buf, "123%p", (void *)9);
   sprintf(buf, "123%+d", 9);
   sprintf(buf, "1234%+d", 9); // expected-warning {{'sprintf' will always overflow; destination buffer has size 6, but format string expands to at least 7}}
   sprintf(buf, "123% i", 9);

Copy link
Member

@nickdesaulniers nickdesaulniers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch!

Can we document this behavior somewhere?

clang/lib/Sema/SemaChecking.cpp Outdated Show resolved Hide resolved
clang/lib/AST/PrintfFormatString.cpp Outdated Show resolved Hide resolved
@zygoloid
Copy link
Collaborator

Is there some way we can narrow the scope of this patch so we don't lose warnings for normal snprintfs, only for the kernel one? If we really can't tell the difference from the source code, we could move the affected warnings to a different warning group instead, so the kernel can turn them off.

@hazohelet
Copy link
Member Author

Is there some way we can narrow the scope of this patch so we don't lose warnings for normal snprintfs, only for the kernel one? If we really can't tell the difference from the source code, we could move the affected warnings to a different warning group instead, so the kernel can turn them off.

Thanks for the feedback. I narrowed the scope by checking whether the next character of %p can be part of the kernel extended format specifier.
If people write %pM or a similar outside kernel, it still aborts the format size estimator and could lead to false negative. But I think it would hurt people much less because people usually put non-alphanumeric character after %p.

@nickdesaulniers
Copy link
Member

Is there some way we can narrow the scope of this patch so we don't lose warnings for normal snprintfs, only for the kernel one? If we really can't tell the difference from the source code, we could move the affected warnings to a different warning group instead, so the kernel can turn them off.

@zygoloid is this approach acceptable?

One thing I do worry about not having an explicit flag; we support clang-11+ for building the Linux kernel.

Let's say the kernel adds a new extension format specifiers %pn. We need to fix up clang to recognize that now, too. Then we fix clang in say clang-19. clang-18 still warns and we can't turn it off.

Perhaps it would be better to have the flag to disable %p without disabling the rest of -Wfortify-source, than try to keep these two in sync.

Comment on lines 167 to 169
if the succeeding character can be a part of the format specifier in linux's
format extension in order to avoid false positive warnings against the linux
code base.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps worth referring to "the Linux kernel's ..." rather than just "linux".

clang/lib/Sema/SemaChecking.cpp Outdated Show resolved Hide resolved
clang/lib/Sema/SemaChecking.cpp Outdated Show resolved Hide resolved
Copy link
Collaborator

@zygoloid zygoloid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like the wrong approach to me. I really think you should be splitting the warning into two parts, so that only the kernel developers see a change in diagnostic output. That would seem to make this patch simpler as well as avoiding a (probably unimportant) regression for all other Clang users.

clang/lib/Sema/SemaChecking.cpp Show resolved Hide resolved
@hazohelet
Copy link
Member Author

Thanks for the detailed suggestion! I agree it's the appropriate approach.

I was planning to separate Wformat-truncation from Wfortify-source sometime after this PR (as was requested in the same issue), but now that we are going to introduce a new warning group, we should finish the separation before proceeding further on this.
Sadly we don't have any way to make stacked patches as of now, so I'll merge the separation change to this PR. If reviewers think it's unreasonable, let's make another PR for the separation then.

… from Wfortify-source

Newly introduces `Wformat-overflow` and `Wformat-truncation` and imports the existing warning from `Wfortify-source` that correspond to GCC's counterpart so that they can be disabled separately from `Wfortify-source`.
…n-kprintf` for format string that contains `%p`
Copy link
Collaborator

@zygoloid zygoloid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with the overall approach here; please don't block on further review from me.

Comment on lines +965 to +968
def FormatOverflowNonKprintf: DiagGroup<"format-overflow-non-kprintf">;
def FormatOverflow: DiagGroup<"format-overflow", [FormatOverflowNonKprintf]>;
def FormatTruncationNonKprintf: DiagGroup<"format-truncation-non-kprintf">;
def FormatTruncation: DiagGroup<"format-truncation", [FormatTruncationNonKprintf]>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be added to the FortifySource group in addition to adding them to Format?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. It makes sense to add this to FortifySource for backward compatibility.

clang/lib/Sema/SemaChecking.cpp Show resolved Hide resolved
@zygoloid zygoloid requested review from zygoloid and removed request for zygoloid September 13, 2023 18:59
@zygoloid zygoloid dismissed their stale review September 13, 2023 19:00

Requested changes have been made.

@nickdesaulniers
Copy link
Member

@nathanchance and I have done a bunch of testing of this revision (see comments ClangBuiltLinux/linux#1923 (comment) and down); this looks like something that will work nicely for the Linux kernel's use case! Thanks for all of the work here @hazohelet !

Copy link
Member

@nickdesaulniers nickdesaulniers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix!

Copy link
Collaborator

@zygoloid zygoloid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@nickdesaulniers
Copy link
Member

ping @hazohelet . This should be good to land and should help clear some red in our CI for Linux kernel builds.

@hazohelet
Copy link
Member Author

ping @hazohelet . This should be good to land and should help clear some red in our CI for Linux kernel builds.

Thanks for the ping. I was away for a while. I'm merging this after updating the PR descriptions.

@hazohelet hazohelet changed the title [clang][Sema] Stop format size estimator upon %p to adapt to linux kernel's extension [clang][Sema] Make format size estimator aware of %p's existence in format string Sep 25, 2023
@hazohelet hazohelet merged commit 56c3b8e into llvm:main Sep 25, 2023
@hazohelet hazohelet deleted the kernel-format-p branch September 25, 2023 06:33
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Oct 2, 2023
Recently, clang added support for -Wformat-overflow and
-Wformat-truncation. When building the kernel, it was discovered that
clang's implementation of these warnings handles the '%p' specifier,
which differs from GCC's implementation. This results in false positive
warnings due to the kernel's various '%p' extensions. Fortunately, the
clang developers placed this warning difference into a separate flag,
allowing the kernel to turn off the warning for '%p' unconditionally.

This is not currently an issue for a normal build, as -Wformat-overflow
and -Wformat-truncation are unconditionally disabled, which includes
this sub-warning. However, ever since commit 6d4ab2e ("extrawarn:
enable format and stringop overflow warnings in W=1"), these warnings
are in W=1 and the goal is to enable them in the normal build once they
are all eliminated. Disable the warnings for W=1 to avoid false
positives. This block should move with -Wformat-overflow and
-Wformat-truncation when they are enabled for a normal build.

Link: ClangBuiltLinux#1923
Link: llvm/llvm-project#64871
Link: llvm/llvm-project#65969
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111219
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78512
Signed-off-by: Nathan Chancellor <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:diagnostics New/improved warning or error message in Clang, but not in clang-tidy or static analyzer clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants