-
Notifications
You must be signed in to change notification settings - Fork 629
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(System)Verilog: escaped identifiers (LRM 5.6.1) #4129
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #4129 +/- ##
=======================================
Coverage 85.90% 85.90%
=======================================
Files 239 239
Lines 58727 58733 +6
=======================================
+ Hits 50447 50453 +6
Misses 8280 8280 ☔ View full report in Codecov by Sentry. |
By the way, I also added a unit test for the feature. I don't know what's the protocol here, if I add the unit test myself to demonstrate the feature or if someone else should add it to verify that what I did works. |
The error looks strange. I will look into this. |
Which error do you mean? If you mean the CI error, I checked and it seems to be a false positive on an unrelated test I didn't touch. (CI does that sometimes.) |
38eda10
to
5dc2b1f
Compare
The CI failure has been fixed in master branch. please rebase master. |
5dc2b1f
to
987587e
Compare
Rebased and all CI checks passed 😃 Thanks! |
parsers/verilog.c
Outdated
c = vGetc (); // skip leading '\' | ||
// A single `\` would result in an empty identifier, which is unsupported. | ||
// Add the initial `\` in that case, and also in case it starts with `\\`. | ||
if (!isgraph (c) || c == '\\') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if C is EOF (== -1).
isgraph(c) may return false. Then \
will be appended to token->name
. It's not so harmful that I am afraid of. But putting something like goto end;
will be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the behavior is correct.
We treat the sequence \<EOF>
the same as \<whitespace>
, i.e. the token finishes right there so it's just a single "\"
. This would generate an empty identifier which ctags doesn't like, so as suggested by @roccomao we add the leading \
to that corner case, and also to all identifier names starting with \
. Note that this is only done for the first character.
If I'm interpreting the C standard, §7.4 correctly, passing EOF to isgraph()
et al is valid and would return false, since EOF is not a graph character. So EOF will be treated the same as space, whitespace, control characters, etc., which is what we want.
(PS: Note that I'm using while
and not do while
to handle this case, since the sequence of characters after the \
may be empty, whereas in the case of regular identifiers there's one or more characters, hence do while
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see.
First, thank you for your great contribution.
I doubt that this specification is correct.
LRM just says:
It is not clear an empty identifier is allowed.
This format can support an empty identifier "theoretically". Even if it were not currently supported, it would not be worth the bother to support it, I think. |
Well, it's what the standard says; the I can remove that part from the commit message if you prefer.
Not in that paragraph, but from the LRM,1 section A.9.3:
with { } meaning zero or more. If they meant one or more, they would've put an extra character explicitly (they do so e.g. in the definition of
But
Yeah, definitely not. As I said, I don't think ctags must encode exactly what the standard says the identifier is, and even if it could, I imagine it would cause all sorts of trouble. So adding the Footnotes
|
OK. It is clear an empty identifier is allowed.
I don't understand what you mean, here...
I meant; localparam \\ = \ ; this line should emit;
instead of (the line 6 of
How can we explain that the latter follows the LRM? |
You can make a tag with an empty string when you set |
The issue solved in this pull request is more complex than I assumed.
This is indeed not in accordance with LRM. But let me explain the previous suggestion in detail and sort out the different opinions at first. Then determine how should we make a trade-off? Because of the ambiguity of LRM, the code in my previous comment was actually verified in the QuestaSim and it works fine. I haven't tested other compilers, but this should give us a reference. Because we don't care what its official name is, we just give each identifier a tag name, too. Firstly, This is an example given by the LRM 5.6.1 chapter. The document only describes that escaped keyword is an exception.
The compiler does the opposite. It does not consider keywords. It only looks at whether there are special characters in the identifier to determine whether there should have a leading
Next, let's look at the empty identifier (a single
So, finally, how do we create tag names?
|
My bad; I misunderstood what you meant, but I get you now. As @roccomao already mentioned, the addition of the extra The options I see to handle escaped identifiers are:
Option 1 is the most strictly compliant with the LRM "semantic" definition of an identifier, and option 5 is technically also what the LRM defines as an If, as @masatake said, setting allowNullTag is an option (and doesn't involve changing anything outside of the Verilog parser and can be enabled for Verilog only), then I'd say that (1.) is the best approach. |
@roccomao and @cousteaulecommandant, Thank you for your detailed explanation.
Could you try "1." with the following patch? diff --git a/parsers/verilog.c b/parsers/verilog.c
index 7e69640ef..754c16026 100644
--- a/parsers/verilog.c
+++ b/parsers/verilog.c
@@ -2192,6 +2192,7 @@ extern parserDefinition* VerilogParser (void)
def->parser = findVerilogTags;
def->initialize = initializeVerilog;
def->selectLanguage = selectors;
+ def->allowNullTag = true;
return def;
}
@@ -2208,5 +2209,6 @@ extern parserDefinition* SystemVerilogParser (void)
def->extensions = extensions;
def->parser = findVerilogTags;
def->initialize = initializeSystemVerilog;
+ def->allowNullTag = true;
return def;
} The current |
If you want, I will add this change.
|
@masatake san, Thank you. But I don't think the verilog parser does not need this. I far as I think of, the escaped identifier which we are discussing in this thread is the only tag which can be Null in Verilog and SystemVerilog. We don't have to specify it for each tag. |
I tested it with a null tag and it didn't create an entry for it. Is that expected? (It did suppress the warning message though; without @hirooih's patch it displays the warning.) @@ -2,11 +2,10 @@ themodule#(x=42) input.v /^module \\themodule#(x=42) ($/;" m
clk,rst, input.v /^ input wire \\clk,rst, , \\d ,$/;" p module:themodule#(x=42)
d input.v /^ input wire \\clk,rst, , \\d ,$/;" p module:themodule#(x=42)
1+1=2 input.v /^ output reg \\1+1=2$/;" p module:themodule#(x=42)
-\\ input.v /^localparam \\ = 1;$/;" c module:themodule#(x=42)
-\\\\ input.v /^localparam \\\\ = \\ ;$/;" c module:themodule#(x=42)
-\\\\\\ input.v /^localparam \\\\\\ = \\\\ ;$/;" c module:themodule#(x=42)
+\\ input.v /^localparam \\\\ = \\ ;$/;" c module:themodule#(x=42)
+\\\\ input.v /^localparam \\\\\\ = \\\\ ;$/;" c module:themodule#(x=42)
r\\n input.v /^localparam \\r\\n = \\\\\\ ;$/;" c module:themodule#(x=42)
-\\\\r\\n input.v /^localparam \\\\r\\n = \\r\\n ;$/;" c module:themodule#(x=42)
-\\\\\\r\\n input.v /^localparam \\\\\\r\\n = \\\\r\\n ;$/;" c module:themodule#(x=42)
+\\r\\n input.v /^localparam \\\\r\\n = \\r\\n ;$/;" c module:themodule#(x=42)
+\\\\r\\n input.v /^localparam \\\\\\r\\n = \\\\r\\n ;$/;" c module:themodule#(x=42)
end input.v /^always @(posedge \\clk,rst, ) begin : \\end$/;" b module:themodule#(x=42)
-\\\\end input.v /^ if (\\\\\\r\\n ) begin : \\\\end$/;" b block:themodule#(x=42).end
+\\end input.v /^ if (\\\\\\r\\n ) begin : \\\\end$/;" b block:themodule#(x=42).end If this is the expected behavior I'll push it. (PS: please don't pay attention to how I accidentally closed and reopened this PR.) |
In fact, is {
"": "empty",
"x": "nonempty"
} only yields:
|
I misunderstood. Ctags cannot make an empty tag. |
Don't worry :) So what do we do then? Option 1 is not available. Shall we go for option 2? That means not creating tags for the null identifier, i.e., there is one identifier ctags will ignore. This can be done via the allowNullTag option (the name is misleading; should be "tagMayBeNull"), explicitly in the parsing code (if identifier is empty, return without creating a tag) or simply trigger a warning because we want to inform the user that a tag was dropped. If we do want to include a tag for |
I think this is the way to go.
I think
I think this is better, but just ignoring is enough. And I hope someone will implement the true |
Perhaps, yeah.
I mentioned this because triggering a warning was the behavior you got in my original commit, since the default behavior for ctags is to display the message "Notice: ignoring null tag". If I remove the
This could be useful for a variety of languages. JSON is currently the only one that enables this, but e.g. Tcl allows empty variable/array/function names. (No idea how this would be handled internally, but I'll let the gurus of ctags figure out.) :) |
I see. Let's remove it! |
Do I remove the |
3fd3c3c
to
fb0084d
Compare
Add support for escaped identifiers in Verilog and SystemVerilog: `\` + zero or more non-whitespace characters + whitespace. Note that the `\` itself isn't part of the identifier, and that `\foo` is the same as just `foo` (unlike in VHDL), but that identifiers identical to keywords such as `\begin` are also allowed. This definition also theoretically allows an empty identifier, `\`, which ctags currently ignores, issuing a notice. Escaped identifiers are defined in the standard in section 3.7.1 (Verilog) / 5.6.1 (SystemVerilog).
fb0084d
to
040f96c
Compare
I have two concerns:
libreadtags skips such lines at
The line comparing with
u-ctags supports etags and xref. Do they consider null tags? Could you open a separate issue to support a null tag? |
This is regarding the idea of handling null tags as a future addition and not something blocking this PR I need to handle right now, right?
I suppose this was addressed at @hirooih, right? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks a lot!
parsers/verilog.c
Outdated
c = vGetc (); // skip leading '\' | ||
// A single `\` would result in an empty identifier, which is unsupported. | ||
// Add the initial `\` in that case, and also in case it starts with `\\`. | ||
if (!isgraph (c) || c == '\\') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5.6.1 Escaped identifiers
any of the printable ASCII characters in an identifier (the decimal values 33 through 126, or 21 through 7E in hexadecimal).
3 The term printing character refers to a member of a locale-specific set of characters, each of which occupies one printing position on a display device;
...
7.4.1.6 [The isgraph function]
,,,
2 The isgraph function tests for any printing character except space (' ').
Considering locales excepting C, I think isgraph()
can not be used here.
I've merged this PR. Again thank you very much! |
Nice! And thank you for all the feedback :) |
Add support for escaped identifiers in Verilog and SystemVerilog:
\
+ zero or more non-whitespace characters + whitespace.Note that the
\
itself isn't part of the identifier, and that\foo
is the same as justfoo
(unlike in VHDL), but that identifiers identical to keywords such as\begin
are also allowed.This definition also theoretically allows an empty identifier,
\
, which ctags doesn't support.To avoid issues, empty identifiers (\
) or identifiers that start with (a second)\
keep the leading\
even if it's technically not part of the name.Escaped identifiers are defined in the standard in section 3.7.1 (Verilog) / 5.6.1 (SystemVerilog).
Support for escaped identifiers was listed in the Verilog TODO list in #2674.