[BUG] the grammar comments are not always correct #71
Comments
Excellent experiment. It brings even more insights.
Isn't it desirable to leave the builtin types outside of the reserved words of the grammar? They are identifiers after all, just ones that the user didn't define somewhere else.
Officially they are keywords, so we should not treat them as regular identifiers. Otherwise you could define a function called "int". They also behave a bit strangely anyway: for example, "unsigned long int" and "long unsigned int" are both valid (and in fact identical). I think that has to be handled explicitly; we cannot pretend that these behave like regular identifiers.
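For reference, the equivalence mentioned above can be checked in today's C++. This is only a minimal sketch; the helper name `specifier_order_is_irrelevant` is made up for illustration and is not part of cppfront.

```cpp
#include <type_traits>

// The specifiers of a fundamental type may appear in any order,
// and a trailing "int" is optional after long/short/unsigned.
static_assert(std::is_same_v<unsigned long int, long unsigned int>);
static_assert(std::is_same_v<unsigned long int, unsigned long>);
static_assert(std::is_same_v<unsigned long int, int long unsigned>);

// Hypothetical helper so the same fact can also be queried at run time.
constexpr bool specifier_order_is_irrelevant() {
    return std::is_same_v<unsigned long int, long unsigned int>;
}
```

So a grammar that treats these words as ordinary identifiers would have to accept several token sequences as one and the same type name.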
Ok yeah this sold it to me. Even if only one of those two combinations were allowed, Unless one were to remove all those keywords anyway and offer only one-word builtins, Giving a function the same name as a type is equally undesirable for builtin or user-defined types. It's a problem of name collision / shadowing, which in my mind is a whole different category from
I would vote for that. The same rule for all types. No special cases. Or we treat
But that breaks compatibility with existing C++ code. Note that, e.g.,
It wouldn't be hard to convince me that leaving this craziness behind is a worthwhile goal, even without the issue of multiword types we were discussing. I've been bitten by Can still provide them for compatibility without making them the path of least resistance. E.g. by offering
In what context is a multi-identifier type a problem? I think they might be OK in declarations, thanks to the
From a parsing perspective it is not a problem, it is just that the rules for fundamental types are a bit funny. For example
Normalizing this is not a problem for
But this could be a nice feature for the only allowed cpp2 code. Without the flag, backwards compatibility is not broken. With the flag, welcome to the real modern C++!
but I think in the long run that is not sufficient.
I understand that. And you mentioned that parsing a multi-identifier type is not a problem. So I take it that this was a digression from the main issue.
This was all about one specific point of the many @neumannt discovered with his reimplementation of the parser: "We want to accept builtin types like int as type ids. Currently this works by accident because the parser does not even recognize these as keywords." I suggested that what can be considered an accident of the current implementation (builtin types aren't reserved keywords) feels to me like it should be a feature.
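To make the two positions in this exchange concrete, here is a small sketch of a token classifier that reserves the builtin type words. The word lists are deliberately incomplete and every name in it (`TokenKind`, `classify`, `may_name_a_function`) is an assumption made for illustration, not cppfront's actual lexer.

```cpp
#include <algorithm>
#include <array>
#include <string_view>

enum class TokenKind { Keyword, FundamentalType, Identifier };

// Illustrative, incomplete word lists (assumptions, not cppfront's tables).
constexpr std::array<std::string_view, 4> control_keywords{
    "if", "else", "while", "return"};
constexpr std::array<std::string_view, 6> fundamental_type_words{
    "int", "char", "long", "short", "unsigned", "double"};

// Classify a single word the way a keyword-aware grammar would.
TokenKind classify(std::string_view word) {
    auto contains = [&](const auto& list) {
        return std::find(list.begin(), list.end(), word) != list.end();
    };
    if (contains(control_keywords)) return TokenKind::Keyword;
    if (contains(fundamental_type_words)) return TokenKind::FundamentalType;
    return TokenKind::Identifier;
}

// Only plain identifiers may be declared as function names.
bool may_name_a_function(std::string_view word) {
    return classify(word) == TokenKind::Identifier;
}
```

With such a classifier, `if` and `int` are both rejected as function names. Treating builtin types as ordinary identifiers would amount to deleting the second list, which is the design being debated here.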
Thanks! I've adopted some of the changes, particularly production ordering. I haven't changed the grammar to disallow relational operators outside
Re fundamental types, I don't intend to support multi-token names like
Again, thanks!
Multi-token fundamental types are now supported: 966856f.
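As a rough idea of what handling multi-token names can involve, the sketch below reorders the words of a fundamental type name into one canonical spelling. It is an assumption-laden illustration only and is not what the referenced commit does; `specifier_rank` and `canonical_spelling` are hypothetical names, and the sketch does not add an implicit `int` or validate the combination.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical ordering: signedness first, then length modifiers, then the base type.
int specifier_rank(const std::string& word) {
    if (word == "signed" || word == "unsigned") return 0;
    if (word == "short" || word == "long") return 1;
    return 2;  // base types such as int, char, double
}

// Reorder the whitespace-separated words of a type name into canonical order.
std::string canonical_spelling(const std::string& type_name) {
    std::istringstream in(type_name);
    std::vector<std::string> words;
    for (std::string w; in >> w;) words.push_back(w);

    // stable_sort keeps repeated words such as "long long" adjacent and in order.
    std::stable_sort(words.begin(), words.end(),
                     [](const std::string& a, const std::string& b) {
                         return specifier_rank(a) < specifier_rank(b);
                     });

    std::string out;
    for (const auto& w : words) {
        if (!out.empty()) out += ' ';
        out += w;
    }
    return out;
}
```

With this helper, "long unsigned int" and "unsigned long int" map to the same string, so later stages only need to know one spelling per type.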
@JohelEGP Thanks for mentioning that on this thread too. I wasn't going to support the multi-token names, but once I thought of a simple solution, I did support them because it's important to avoid any compatibility friction with today's syntax where it doesn't compromise Cpp2's design (and this doesn't, we don't have to encourage using them in Cpp2 even though they happen to work). Speaking of which, if we don't encourage those, what should we encourage? I wrote above...
... and I just pushed that support in commit 4d9d52d. Quoting the commit:
On second thought, the important thing is to support a way of uttering the types that are spelled multi-word in C/C++ today, but I can do that just by providing type aliases. Then Cpp2 code visibly uses normal single-token names throughout, while still having full interop compatibility with all the non-fixed-width C/C++ types. So I've tweaked the approach: I'm leaving in the code that handles the multi-word names in case a programmer does try to write them (which many will because of familiarity), but instead of merging them and making them work as I did last week, I'll reject them with a hopefully-nice diagnostic directing the programmer to use the nicer names instead (and giving extra discouragement for the really pernicious ones like
Checked in yesterday in b972313, improved diagnostic today with 98ae2b9. Here's an example diagnostic:
I decided to make the names for explicitly-signed/unsigned
Out of curiosity I have implemented an alternative parser for cppfront / cpp2, which uses a PEG grammar as input for a parser generator. During that experiment, I noticed that the grammar rules embedded as `//G` comments are not always correct. I will list the errors that I noticed below.
One preliminary note: The cppfront compiler has a rather relaxed concept of keywords. In most cases it will accept a keyword where an identifier is expected; for example, it will happily compile `if: () -> void = { }`. I don't think that is a good idea, so my grammar explicitly distinguishes between keywords and identifiers (modulo the few context-specific soft keywords like `in`/`out` etc.). For some grammar rules that requires changes where the parser previously worked by accident (i.e., by not recognizing a certain keyword).
a) id_expression
here the order is wrong, it should be
b) primary_expression
this does not correspond to the source code order. Furthermore, the expression-list is optional. And if we distinguish keywords from literals we potentially need some extra rules to handle keywords that are currently silently eaten as identifiers. I would suggest
c) nested-name-specifier
this has to support nested scopes. I would suggest
d) template-argument
There should be a comment here that we disable '<'/'>'/'<<'/'>>' in the expressions until a new parenthesis is opened. In fact that causes some of the expression rules to be cloned until we reach the level below these operators. (In my implementation these are the rules with the suffix `_no_cmp`.)
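The idea in d) can be sketched with a toy recursive-descent parser. This is a simplified illustration under stated assumptions, not the PEG grammar or cppfront's parser: operands are single letters or digits, only `+`, `-`, `<`, `>` and parentheses exist, and a boolean parameter stands in for the cloned `_no_cmp` rules. `ToyParser` and `template_argument_length` are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Toy parser over single-character tokens (an assumption made for brevity).
struct ToyParser {
    std::string_view src;
    std::size_t pos = 0;

    bool at(char c) const { return pos < src.size() && src[pos] == c; }

    // primary: operand | '(' expression ')'
    bool primary() {
        if (at('(')) {
            ++pos;
            // Opening a parenthesis re-enables the comparison operators.
            if (!expression(/*allow_cmp=*/true) || !at(')')) return false;
            ++pos;
            return true;
        }
        if (pos < src.size() &&
            std::isalnum(static_cast<unsigned char>(src[pos]))) {
            ++pos;
            return true;
        }
        return false;
    }

    // additive: primary (('+' | '-') primary)*
    bool additive() {
        if (!primary()) return false;
        while (at('+') || at('-')) {
            ++pos;
            if (!primary()) return false;
        }
        return true;
    }

    // expression: additive (('<' | '>') additive)*, comparisons only if allowed
    bool expression(bool allow_cmp) {
        if (!additive()) return false;
        while (allow_cmp && (at('<') || at('>'))) {
            ++pos;
            if (!additive()) return false;
        }
        return true;
    }
};

// Number of characters consumed as a template argument, or 0 on failure.
std::size_t template_argument_length(std::string_view text) {
    ToyParser p{text};
    return p.expression(/*allow_cmp=*/false) ? p.pos : 0;
}
```

For the input `a>b` the argument ends before the `>`, so that character can close the template argument list, while `(a>b)>c` consumes the whole parenthesized comparison first. A grammar formalism without rule parameters would need two copies of `expression`, which is the cloning described above.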
e) id-expression from fundamental types
We want to accept builtin types like `int` as type ids. Currently this works by accident because the parser does not even recognize these as keywords. When enforcing that keywords are not identifiers we need rules for these, too. I have added a `fundamental-type` alternative at the end of id-expression, and have defined that as follows: