[clang-format] Don't split "DPI"/"DPI-C" in Verilog imports #66951
@llvm/pr-subscribers-clang-format

Changes

The spec doesn't allow splitting these strings and we're seeing compile issues with splitting it. String splitting was enabled for Verilog in https://reviews.llvm.org/D154093.

Full diff: https://github.com/llvm/llvm-project/pull/66951.diff

2 Files Affected:
diff --git a/clang/lib/Format/ContinuationIndenter.cpp b/clang/lib/Format/ContinuationIndenter.cpp
index deb3e554fdc124b..0bdf339d8df5827 100644
--- a/clang/lib/Format/ContinuationIndenter.cpp
+++ b/clang/lib/Format/ContinuationIndenter.cpp
@@ -2270,7 +2270,15 @@ ContinuationIndenter::createBreakableToken(const FormatToken &Current,
   if (State.Stack.back().IsInsideObjCArrayLiteral)
     return nullptr;
 
+  // The "DPI" (or "DPI-C") in SystemVerilog direct programming interface
+  // imports cannot be split, e.g.
+  // `import "DPI" function foo();`
+  // FIXME: We should see if this is an import statement instead of hardcoding
+  // "DPI"/"DPI-C".
   StringRef Text = Current.TokenText;
+  if (Style.isVerilog() && (Text == "\"DPI\"" || Text == "\"DPI-C\""))
+    return nullptr;
+
   // We need this to address the case where there is an unbreakable tail only
   // if certain other formatting decisions have been taken. The
   // UnbreakableTailLength of Current is an overapproximation in that case and
diff --git a/clang/unittests/Format/FormatTestVerilog.cpp b/clang/unittests/Format/FormatTestVerilog.cpp
index 945e06143ccc3f1..56a8d19a31e919c 100644
--- a/clang/unittests/Format/FormatTestVerilog.cpp
+++ b/clang/unittests/Format/FormatTestVerilog.cpp
@@ -1253,6 +1253,12 @@ TEST_F(FormatTestVerilog, StringLiteral) {
"xxxx"});)",
                R"(x({"xxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxx ", "xxxx"});)",
                getStyleWithColumns(getDefaultStyle(), 23));
+  // "DPI"/"DPI-C" in imports cannot be split.
+  verifyFormat(R"(import
+    "DPI-C" function t foo
+    ();)",
+               R"(import "DPI-C" function t foo();)",
+               getStyleWithColumns(getDefaultStyle(), 23));
   // These kinds of strings don't exist in Verilog.
   verifyNoCrash(R"(x(@"xxxxxxxxxxxxxxxx xxxx");)",
                 getStyleWithColumns(getDefaultStyle(), 23));
Looks good once the comment is addressed.
  StringRef Text = Current.TokenText;
  if (Style.isVerilog() && (Text == "\"DPI\"" || Text == "\"DPI-C\""))
I'd address the FIXME right away. Something like this:
  if (Style.isVerilog()) {
    const FormatToken *Prev = Current.getPreviousNonComment();
    if (Prev && Prev == State.Line->getFirstNonComment() &&
        Prev->TokenText == "import") {
      return nullptr;
    }
  }
Please wait for @sstwcw. IMO it would be better to disable splitting string literals after |
The spec doesn't allow splitting these strings and we're seeing compile issues with splitting it. String splitting was enabled for Verilog in https://reviews.llvm.org/D154093.
b526053 to 23d1b3c
  // The "DPI"/"DPI-C" in SystemVerilog direct programming interface imports
  // cannot be split, e.g.
  // `import "DPI" function foo();`
  StringRef Text = Current.TokenText;
  if (Style.isVerilog()) {
    const FormatToken *Prev = Current.getPreviousNonComment();
    if (Prev && Prev == State.Line->getFirstNonComment() &&
        Prev->TokenText == "import") {
      return nullptr;
    }
  }
Suggested change:
-  // The "DPI"/"DPI-C" in SystemVerilog direct programming interface imports
-  // cannot be split, e.g.
-  // `import "DPI" function foo();`
-  StringRef Text = Current.TokenText;
-  if (Style.isVerilog()) {
-    const FormatToken *Prev = Current.getPreviousNonComment();
-    if (Prev && Prev == State.Line->getFirstNonComment() &&
-        Prev->TokenText == "import") {
-      return nullptr;
-    }
-  }
+  if (Style.isVerilog() && Current.Previous &&
+      Current.Previous->isOneOf(tok::kw_export, Keywords.kw_import)) {
+    return nullptr;
+  }
+  StringRef Text = Current.TokenText;
Shouldn't we handle `export` as well? Also, I don't think this is Verilog specific.
Edit: let’s just fix Verilog for now.
`isOneOf` won't work here, since the token has the type of `identifier` rather than a keyword:
Unwrapped lines:
Line(0, FSC=0): identifier[T=125, OC=0, "import"] string_literal[T=125, OC=7, ""DPI-C""] identifier[T=125, OC=15, "function"] identifier[T=125, OC=24, "t"] identifier[T=125, OC=26, "foo"]
Line(0, FSC=0): l_paren[T=120, OC=29, "("] r_paren[T=125, OC=30, ")"] semi[T=125, OC=31, ";"]
Line(1, FSC=0): eof[T=125, OC=32, ""]
Run 0...
AnnotatedTokens(L=0, P=0, T=5, C=0):
M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=identifier L=6 PPK=2 FakeLParens= FakeRParens=0 II=0x56180dc60ce8 Text='import'
M=0 C=1 T=Unknown S=1 F=0 B=0 BK=0 P=23 Name=string_literal L=14 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='"DPI-C"'
M=0 C=0 T=Unknown S=1 F=0 B=0 BK=1 P=23 Name=identifier L=23 PPK=2 FakeLParens= FakeRParens=0 II=0x56180dc611f8 Text='function'
M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=23 Name=identifier L=25 PPK=2 FakeLParens= FakeRParens=0 II=0x56180dc9b370 Text='t'
M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=23 Name=identifier L=29 PPK=2 FakeLParens= FakeRParens=0 II=0x56180dc9b3a0 Text='foo'
----
AnnotatedTokens(L=0, P=0, T=5, C=1):
M=0 C=0 T=VerilogMultiLineListLParen S=1 F=0 B=0 BK=0 P=0 Name=l_paren L=1 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='('
M=0 C=0 T=Unknown S=0 F=0 B=0 BK=0 P=140 Name=r_paren L=2 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=')'
M=0 C=0 T=Unknown S=0 F=0 B=0 BK=0 P=23 Name=semi L=3 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=';'
----
AnnotatedTokens(L=1, P=0, T=5, C=0):
M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=eof L=0 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=''
----
We should leave `Current.getPreviousNonComment()` to handle comments in the middle of the statement (something like `import /*"DPI"*/ "DPI-C" ...` comes to mind).

> Shouldn't we handle export as well?

Indeed, `export "DPI-C"` is also a valid construct, and thus string literals after `export` should be exempt from breaking too.

> I don't think this is Verilog specific.

I'd suggest addressing known cases with targeted exemptions to avoid surprises in random places.
  if (Style.isVerilog()) {
    const FormatToken *Prev = Current.getPreviousNonComment();
    if (Prev && Prev == State.Line->getFirstNonComment() &&
        (Prev->TokenText == "import" || Prev->TokenText == "export")) {
      return nullptr;
    }
  }
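For readers who want to try the logic of this check outside of clang-format, here is a minimal stand-alone sketch. It is only a model: `Tok`, `previousNonComment`, `firstNonComment`, and `followsLeadingImportOrExport` are hypothetical names invented for illustration, not clang-format's `FormatToken` API, and the sketch assumes the tokens of one unwrapped line are held in a plain vector.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a formatter token (not FormatToken).
struct Tok {
  std::string Text;
  bool IsComment;
};

// Index of the nearest non-comment token before position I, or -1 if none.
// Plays the role that getPreviousNonComment() plays in the snippet above.
inline int previousNonComment(const std::vector<Tok> &Line, std::size_t I) {
  assert(I < Line.size() && "token index out of range");
  for (int J = static_cast<int>(I) - 1; J >= 0; --J)
    if (!Line[J].IsComment)
      return J;
  return -1;
}

// Index of the first non-comment token on the line, or -1 if none.
inline int firstNonComment(const std::vector<Tok> &Line) {
  for (std::size_t J = 0; J < Line.size(); ++J)
    if (!Line[J].IsComment)
      return static_cast<int>(J);
  return -1;
}

// True if the token at I directly follows a line-leading `import` or
// `export` (comments ignored), i.e. the case the exemption targets.
inline bool followsLeadingImportOrExport(const std::vector<Tok> &Line,
                                         std::size_t I) {
  const int Prev = previousNonComment(Line, I);
  if (Prev < 0 || Prev != firstNonComment(Line))
    return false;
  return Line[Prev].Text == "import" || Line[Prev].Text == "export";
}
```

In this model, the string literal in `import /*"DPI"*/ "DPI-C" ...` is still recognized because the comment is skipped, while a string literal elsewhere on a line, such as the argument in `x("import")`, is not exempted.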
> `isOneOf` won't work here, since the token has the type of `identifier` rather than a keyword:

Have you tried it? It does work because not only is `import` a `tok::identifier`, it's also a `Keywords.kw_import`.
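The distinction can be illustrated with a small stand-alone model. This is a sketch under assumptions, not clang-format's actual classes: `Kind`, `IdentInfo`, `Tok`, and `precededByImportKeyword` are hypothetical names. The idea it shows is that a token can be matched either by its lexer kind or by the identity of its identifier entry, so a Verilog `import` lexed as a plain identifier fails a kind-based keyword test but passes an identifier-based one.

```cpp
#include <cassert>
#include <string>

// Hypothetical model; the names only loosely mirror clang-format.
enum class Kind { identifier, string_literal, kw_export, kw_import };

// Stand-in for an identifier table entry; tokens are compared by pointer.
struct IdentInfo {
  std::string Name;
};

struct Tok {
  Kind K;
  const IdentInfo *II; // null for non-identifiers such as string literals

  // Match on the lexer's token kind.
  bool is(Kind Wanted) const { return K == Wanted; }
  // Match on the identity of the identifier entry.
  bool is(const IdentInfo *Wanted) const { return Wanted && Wanted == II; }

  // Accepts a mix of kinds and identifier entries, like the suggestion.
  template <typename A, typename B> bool isOneOf(A First, B Second) const {
    return is(First) || is(Second);
  }
};

// Models the suggested condition: a kind-based `export` keyword or an
// identifier-based `import` keyword directly before the string literal.
inline bool precededByImportKeyword(const Tok &Prev, const IdentInfo *KwImport) {
  assert(KwImport && "keyword table entry must exist");
  return Prev.isOneOf(Kind::kw_export, KwImport);
}
```

With an `import` token whose kind is `Kind::identifier`, a test against `Kind::kw_import` is false, while the test against the identifier entry is true, which matches the behaviour described in the reply above.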
> We should leave `Current.getPreviousNonComment()` to handle comments in the middle of the statement (something like `import /*"DPI"*/ "DPI-C" ...` comes to mind).

I was aware of that, but we usually don't call `getPreviousNonComment()` unless a comment before a token makes sense in practice. Otherwise, we would have to write ugly and inefficient code to handle things like the following:
/* outer l_square */ [ /* inner l_square */ [ /* attribute */ unlikely /* inner r_square */ ] /* outer r_square */ ] // comment

> I'd suggest to address known cases with targeted exemptions to avoid surprises in random places.

I think this is also relevant to (at least) C++ import statements, e.g.:
`import "clang/include/clang/Format/Format.h";`
I still prefer that we make a general fix here but will leave it to @sstwcw.
Can we please have any of the resolutions sooner?
This blocks quite a bit of testing.
For example, can we have this as a workaround, then let @sstwcw fix it cleanly?
Please see my update in #66951 (comment).
> Also, I don't think this is Verilog specific.

For C++, it is already handled on lines 261 and 2166. I prefer fixing the Verilog problem by annotating the import lines instead of implementing said lines again. But if you think it is too much work for you, then I am also fine with your current fix.
I've taken the suggestion and added a FIXME to use the C++ import infra
> `isOneOf` won't work here, since the token has the type of `identifier` rather than a keyword:
>
> Have you tried it? It does work because not only is `import` a `tok::identifier`, it's also a `Keywords.kw_import`.

Ah, right, I had tried it with `tok::kw_import` (and I missed the difference between this and your suggestion). Thanks for the clarification!

> We should leave `Current.getPreviousNonComment()` to handle comments in the middle of the statement (something like `import /*"DPI"*/ "DPI-C" ...` comes to mind).
>
> I was aware of that, but we usually don't call `getPreviousNonComment()` unless a comment before a token makes sense in practice. Otherwise, we would have to write ugly and inefficient code to handle things like the following:
> /* outer l_square */ [ /* inner l_square */ [ /* attribute */ unlikely /* inner r_square */ ] /* outer r_square */ ] // comment

I don't think it would add a lot of overhead (one branch on a happy path) or hinder readability of the code a lot (`getPreviousNonComment()` vs `Prev`), but I also don't think it's super important here.