[clang-format] Don't split "DPI"/"DPI-C" in Verilog imports #66951

Merged
merged 4 commits into from
Sep 21, 2023

aeubanks
Contributor

The spec doesn't allow splitting these strings, and we're seeing compile issues when they are split.

String splitting was enabled for Verilog in https://reviews.llvm.org/D154093.

@llvmbot
Member

llvmbot commented Sep 20, 2023

@llvm/pr-subscribers-clang-format

Changes

Full diff: https://github.com/llvm/llvm-project/pull/66951.diff

2 Files Affected:

  • (modified) clang/lib/Format/ContinuationIndenter.cpp (+8)
  • (modified) clang/unittests/Format/FormatTestVerilog.cpp (+6)
diff --git a/clang/lib/Format/ContinuationIndenter.cpp b/clang/lib/Format/ContinuationIndenter.cpp
index deb3e554fdc124b..0bdf339d8df5827 100644
--- a/clang/lib/Format/ContinuationIndenter.cpp
+++ b/clang/lib/Format/ContinuationIndenter.cpp
@@ -2270,7 +2270,15 @@ ContinuationIndenter::createBreakableToken(const FormatToken &Current,
     if (State.Stack.back().IsInsideObjCArrayLiteral)
       return nullptr;
 
+    // The "DPI" (or "DPI-C") in SystemVerilog direct programming interface
+    // imports cannot be split, e.g.
+    // `import "DPI" function foo();`
+    // FIXME: We should see if this is an import statement instead of hardcoding
+    // "DPI"/"DPI-C".
     StringRef Text = Current.TokenText;
+    if (Style.isVerilog() && (Text == "\"DPI\"" || Text == "\"DPI-C\""))
+      return nullptr;
+
     // We need this to address the case where there is an unbreakable tail only
     // if certain other formatting decisions have been taken. The
     // UnbreakableTailLength of Current is an overapproximation in that case and
diff --git a/clang/unittests/Format/FormatTestVerilog.cpp b/clang/unittests/Format/FormatTestVerilog.cpp
index 945e06143ccc3f1..56a8d19a31e919c 100644
--- a/clang/unittests/Format/FormatTestVerilog.cpp
+++ b/clang/unittests/Format/FormatTestVerilog.cpp
@@ -1253,6 +1253,12 @@ TEST_F(FormatTestVerilog, StringLiteral) {
    "xxxx"});)",
                R"(x({"xxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxx ", "xxxx"});)",
                getStyleWithColumns(getDefaultStyle(), 23));
+  // "DPI"/"DPI-C" in imports cannot be split.
+  verifyFormat(R"(import
+    "DPI-C" function t foo
+    ();)",
+               R"(import "DPI-C" function t foo();)",
+               getStyleWithColumns(getDefaultStyle(), 23));
   // These kinds of strings don't exist in Verilog.
   verifyNoCrash(R"(x(@"xxxxxxxxxxxxxxxx xxxx");)",
                 getStyleWithColumns(getDefaultStyle(), 23));
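
As a rough illustration of the check added to ContinuationIndenter.cpp above, the sketch below models it outside of clang-format. The function name `isUnbreakableDpiLiteral` and its parameters are hypothetical: `IsVerilog` stands in for `Style.isVerilog()` and `TokenText` for `Current.TokenText`. The point it tries to show is that the token text includes the surrounding double quotes, which is why the comparison strings in the patch are escaped.

```cpp
#include <string_view>

// Hypothetical stand-in for the check added to createBreakableToken().
// IsVerilog models Style.isVerilog(); TokenText models Current.TokenText,
// which includes the surrounding double quotes of the literal.
constexpr bool isUnbreakableDpiLiteral(bool IsVerilog,
                                       std::string_view TokenText) {
  return IsVerilog && (TokenText == "\"DPI\"" || TokenText == "\"DPI-C\"");
}
```

Under this model, any other Verilog string literal (for example `"xxxx"`) still goes through the normal string-breaking path, and non-Verilog styles are unaffected.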

@aeubanks aeubanks changed the title [clang-format] Don't split "DPI"/"DPI-C" in imports [clang-format] Don't split "DPI"/"DPI-C" in Verilog imports Sep 20, 2023
@aeubanks aeubanks requested review from sstwcw and owenca September 20, 2023 20:49
Contributor

@alexfh alexfh left a comment

Looks good once the comment is addressed.

StringRef Text = Current.TokenText;
if (Style.isVerilog() && (Text == "\"DPI\"" || Text == "\"DPI-C\""))
Contributor

I'd address the FIXME right away. Something like this:

      if (Style.isVerilog()) {
        const FormatToken *Prev = Current.getPreviousNonComment();
        if (Prev && Prev == State.Line->getFirstNonComment() &&
            Prev->TokenText == "import") {
          return nullptr;
        }
      }

@owenca
Contributor

owenca commented Sep 21, 2023

Please wait for @sstwcw. IMO it would be better to disable splitting string literals after import.

Comment on lines 2273 to 2284
    // The "DPI"/"DPI-C" in SystemVerilog direct programming interface imports
    // cannot be split, e.g.
    // `import "DPI" function foo();`
    StringRef Text = Current.TokenText;
    if (Style.isVerilog()) {
      const FormatToken *Prev = Current.getPreviousNonComment();
      if (Prev && Prev == State.Line->getFirstNonComment() &&
          Prev->TokenText == "import") {
        return nullptr;
      }
    }

Contributor

@owenca owenca Sep 21, 2023

Suggested change
    // The "DPI"/"DPI-C" in SystemVerilog direct programming interface imports
    // cannot be split, e.g.
    // `import "DPI" function foo();`
    StringRef Text = Current.TokenText;
    if (Style.isVerilog()) {
      const FormatToken *Prev = Current.getPreviousNonComment();
      if (Prev && Prev == State.Line->getFirstNonComment() &&
          Prev->TokenText == "import") {
        return nullptr;
      }
    }

    if (Style.isVerilog() && Current.Previous &&
        Current.Previous->isOneOf(tok::kw_export, Keywords.kw_import)) {
      return nullptr;
    }
    StringRef Text = Current.TokenText;

Shouldn't we handle export as well? Also, I don't think this is Verilog specific.

Edit: let’s just fix Verilog for now.

Contributor

isOneOf won't work here, since the token has the type of identifier rather than a keyword:

Unwrapped lines:
Line(0, FSC=0): identifier[T=125, OC=0, "import"] string_literal[T=125, OC=7, ""DPI-C""] identifier[T=125, OC=15, "function"] identifier[T=125, OC=24, "t"] identifier[T=125, OC=26, "foo"]
Line(0, FSC=0): l_paren[T=120, OC=29, "("] r_paren[T=125, OC=30, ")"] semi[T=125, OC=31, ";"]
Line(1, FSC=0): eof[T=125, OC=32, ""]
Run 0...
AnnotatedTokens(L=0, P=0, T=5, C=0):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=identifier L=6 PPK=2 FakeLParens= FakeRParens=0 II=0x56180dc60ce8 Text='import'
 M=0 C=1 T=Unknown S=1 F=0 B=0 BK=0 P=23 Name=string_literal L=14 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='"DPI-C"'
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=1 P=23 Name=identifier L=23 PPK=2 FakeLParens= FakeRParens=0 II=0x56180dc611f8 Text='function'
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=23 Name=identifier L=25 PPK=2 FakeLParens= FakeRParens=0 II=0x56180dc9b370 Text='t'
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=23 Name=identifier L=29 PPK=2 FakeLParens= FakeRParens=0 II=0x56180dc9b3a0 Text='foo'
----
AnnotatedTokens(L=0, P=0, T=5, C=1):
 M=0 C=0 T=VerilogMultiLineListLParen S=1 F=0 B=0 BK=0 P=0 Name=l_paren L=1 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='('
 M=0 C=0 T=Unknown S=0 F=0 B=0 BK=0 P=140 Name=r_paren L=2 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=')'
 M=0 C=0 T=Unknown S=0 F=0 B=0 BK=0 P=23 Name=semi L=3 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=';'
----
AnnotatedTokens(L=1, P=0, T=5, C=0):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=eof L=0 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=''
----

We should leave Current.getPreviousNonComment() to handle comments in the middle of the statement (something like import /*"DPI"*/ "DPI-C" ... comes to mind).

> Shouldn't we handle export as well?

Indeed, export "DPI-C" is also a valid construct, and thus, string literals after export should be exempt from breaking too.

> I don't think this is Verilog specific.

I'd suggest to address known cases with targeted exemptions to avoid surprises in random places.

      if (Style.isVerilog()) {
        const FormatToken *Prev = Current.getPreviousNonComment();
        if (Prev && Prev == State.Line->getFirstNonComment() &&
            (Prev->TokenText == "import" || Prev->TokenText == "export")) {
          return nullptr;
        }
      }
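
To make the behaviour of that check easier to see, here is a minimal sketch on a toy token list rather than on clang-format's `FormatToken`. Everything in it (`Token`, `previousNonComment`, `firstNonComment`, `followsLeadingImportOrExport`, and the sample lines) is hypothetical and only mirrors the shape of the snippet above: skip comments when looking backwards, require the previous token to be the first real token on the line, then compare its text.

```cpp
#include <string_view>

// Toy token: just the text and whether it is a comment.
struct Token {
  std::string_view Text;
  bool IsComment;
};

// Index of the nearest non-comment token before Index, or -1 if none.
constexpr int previousNonComment(const Token *Tokens, int Index) {
  for (int I = Index - 1; I >= 0; --I)
    if (!Tokens[I].IsComment)
      return I;
  return -1;
}

// Index of the first non-comment token on the line, or -1 if none.
constexpr int firstNonComment(const Token *Tokens, int Count) {
  for (int I = 0; I < Count; ++I)
    if (!Tokens[I].IsComment)
      return I;
  return -1;
}

// True if the token at Index directly follows (ignoring comments) an
// `import` or `export` that starts the line.
constexpr bool followsLeadingImportOrExport(const Token *Tokens, int Count,
                                            int Index) {
  const int Prev = previousNonComment(Tokens, Index);
  if (Prev < 0 || Prev != firstNonComment(Tokens, Count))
    return false;
  return Tokens[Prev].Text == "import" || Tokens[Prev].Text == "export";
}

// Sample line: import /*"DPI"*/ "DPI-C" function
inline constexpr Token ImportLine[] = {{"import", false},
                                       {"/*\"DPI\"*/", true},
                                       {"\"DPI-C\"", false},
                                       {"function", false}};

// Sample line: x ( "xxxx" )
inline constexpr Token CallLine[] = {
    {"x", false}, {"(", false}, {"\"xxxx\"", false}, {")", false}};
```

In this model the string literal at index 2 of `ImportLine` is exempt even though a comment sits between it and `import`, while the literal in `CallLine` is not exempt because its previous token is `(` rather than a leading `import` or `export`.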

Contributor

> isOneOf won't work here, since the token has the type of identifier rather than a keyword:

Have you tried it? It does work because not only is import a tok::identifier, it's also a Keywords.kw_import.
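
One way to picture the distinction is the simplified model below. It assumes that matching against `Keywords.kw_import` compares interned identifier pointers while matching against a `tok::` value compares the lexer's token kind; the `II=0x...` fields in the dump earlier in this thread are consistent with that, but the types here (`TokenKind`, `IdentifierInfo`, `Token`) are hypothetical stand-ins, not clang-format's classes.

```cpp
#include <string_view>

// Simplified token kinds; the real lexer has many more.
enum class TokenKind { identifier, string_literal, kw_import };

// Models an interned identifier: one object per distinct spelling.
struct IdentifierInfo {
  std::string_view Name;
};

inline constexpr IdentifierInfo ImportInfo{"import"};
inline constexpr IdentifierInfo FunctionInfo{"function"};

struct Token {
  TokenKind Kind;
  const IdentifierInfo *II; // null for non-identifier tokens

  // Match on the lexer's token kind.
  constexpr bool is(TokenKind K) const { return Kind == K; }
  // Match on the interned identifier (pointer comparison, not text).
  constexpr bool is(const IdentifierInfo *Other) const {
    return Other != nullptr && II == Other;
  }
  // Accepts a mix of kinds and identifier pointers.
  template <typename A, typename B>
  constexpr bool isOneOf(A First, B Second) const {
    return is(First) || is(Second);
  }
};

// In Verilog mode the word `import` is lexed as a plain identifier.
inline constexpr Token VerilogImport{TokenKind::identifier, &ImportInfo};
```

In this model a check against `TokenKind::kw_import` fails for `VerilogImport`, while a check against `&ImportInfo` succeeds, so an `isOneOf` call that includes the identifier pointer matches.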

> We should leave Current.getPreviousNonComment() to handle comments in the middle of the statement (something like import /*"DPI"*/ "DPI-C" ... comes to mind).

I was aware of that, but we usually don't call getPreviousNonComment() unless a comment before a token makes sense in practice. Otherwise, we would have to write ugly and inefficient code to handle things like the following:

/* outer l_square */ [ /* inner l_square */ [ /* attribute */ unlikely /* inner r_square */ ] /* outer r_square */ ] // comment

> I'd suggest to address known cases with targeted exemptions to avoid surprises in random places.

I think this is also relevant to (at least) C++ import statements, e.g.:

`import "clang/include/clang/Format/Format.h";`

I still prefer that we make a general fix here but will leave it to @sstwcw.

Contributor

Can we please have any of the resolutions sooner?

This blocks quite a bit of testing.

For example, can we have this as a workaround, then let @sstwcw fix it cleanly?

Contributor

Please see my update in #66951 (comment).

Contributor

@sstwcw sstwcw Sep 21, 2023

> Also, I don't think this is Verilog specific.

For C++, it is already handled on lines 261 and 2166. I prefer fixing the Verilog problem by annotating the import lines instead of implementing said lines again. But if you think it is too much work for you, then I am also fine with your current fix.
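
The annotation approach preferred here can be sketched roughly as follows. This is a toy model under stated assumptions, not the code on the lines referenced above: `LineKind`, `classifyLine`, and `mayBreakStringLiteral` are hypothetical names, and the classification rule (look only at the first token) is deliberately simplistic. The idea it illustrates is that a line is classified once, and later stages consult that classification instead of re-inspecting token text.

```cpp
#include <string_view>

// Hypothetical line categories; clang-format's real enum differs.
enum class LineKind { Other, ImportStatement };

// Classify a line once from its first token (toy rule for illustration).
constexpr LineKind classifyLine(std::string_view FirstToken) {
  return (FirstToken == "import" || FirstToken == "export")
             ? LineKind::ImportStatement
             : LineKind::Other;
}

// Later stages only consult the annotation, never the token text.
constexpr bool mayBreakStringLiteral(LineKind Kind) {
  return Kind != LineKind::ImportStatement;
}
```

With this split, adding another statement form that must keep its string literal intact would only touch the classification step.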

Contributor Author

I've taken the suggestion and added a FIXME to use the C++ import infrastructure.

Contributor

> > isOneOf won't work here, since the token has the type of identifier rather than a keyword:
>
> Have you tried it? It does work because not only is import a tok::identifier, it's also a Keywords.kw_import.

Ah, right, I had tried it with tok::kw_import (and I missed the difference between this and your suggestion). Thanks for the clarification!

> > We should leave Current.getPreviousNonComment() to handle comments in the middle of the statement (something like import /*"DPI"*/ "DPI-C" ... comes to mind).
>
> I was aware of that, but we usually don't call getPreviousNonComment() unless a comment before a token makes sense in practice. Otherwise, we would have to write ugly and inefficient code to handle things like the following:
>
> /* outer l_square */ [ /* inner l_square */ [ /* attribute */ unlikely /* inner r_square */ ] /* outer r_square */ ] // comment

I don't think it would add a lot of overhead (one branch on a happy path) or hinder readability of the code a lot (getPreviousNonComment() vs Prev), but I also don't think it's super important here.

@aeubanks aeubanks merged commit e0388e0 into llvm:main Sep 21, 2023
@aeubanks aeubanks deleted the formatverilog branch September 21, 2023 18:21