Unterminated string literals should be rejected by the lexer #228

Open
jefftrull opened this issue Oct 2, 2024 · 6 comments

Comments

@jefftrull
Collaborator

At the moment Wave accepts text like this:

"A
B"

which a user might accidentally write, intending to create a multi-line string literal with an embedded newline. In fact these "classic" (pre-C++11) string literals disallow unescaped newlines.
Unfortunately, Wave has been turning such text into a series of tokens:

<UnknownToken>   (#34 ) at /tmp/blah.c (  1/ 1): >"<
IDENTIFIER       (#380) at /tmp/blah.c (  1/ 2): >A<
NEWLINE          (#394) at /tmp/blah.c (  1/ 3): >\n<
IDENTIFIER       (#380) at /tmp/blah.c (  2/ 1): >B<
<UnknownToken>   (#34 ) at /tmp/blah.c (  2/ 2): >"<

It can then go on to do work on these tokens, like directive evaluation, as described in bug #225.

Wave should produce a lexer exception in this case instead.
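
For reference, here is a small sketch of the well-formed ways to spell the same two-line text in C++. The function names are made up for illustration; only the literal syntax matters.

```cpp
#include <string>

// Three well-formed ways to spell the two-line text A, newline, B.
// None of them puts a bare newline inside a classic "..." literal.

// 1. Escape sequence inside an ordinary literal.
inline std::string with_escape() { return "A\nB"; }

// 2. Adjacent literals, which the compiler concatenates.
inline std::string with_concatenation() {
    return "A\n"
           "B";
}

// 3. C++11 raw string literal, where a physical newline is allowed.
inline std::string with_raw_literal() {
    return R"(A
B)";
}
```

All three functions return the same three-character string.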

@njnobles
Contributor

njnobles commented Oct 2, 2024

Would it be possible to have a mode that still accepts multiline strings like your AB example?
Our clients' code has pervasive usage of multi-line strings, unfortunately, and it'll be difficult to find and update all instances while ensuring no unexpected changes in behavior.

@jefftrull
Collaborator Author

Maybe? But think a bit about what this means... we would have a mode whose behavior was basically undefined, as we will be unwilling to fix any issues involving the multiline strings. Your bug #225, for example. We definitely would not sign up to maintain the current behavior in any particular way beyond not throwing an exception when the mode was turned on.

I think a script to convert the multiline strings to raw string literals might be doable. You could even write something using Wave!

@jefftrull
Collaborator Author

You might not even need raw string literals given string concatenation. Just turn:

"a
#ifdef B
b"

into

"a\n"
#ifdef B 
"b"
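
A rough sketch of what such a conversion could look like, in case it helps. `split_multiline_literals` is a hypothetical helper, not anything in Wave, and it assumes the input contains no comments, character literals, raw string literals, or backslash-newline splices inside strings. Directive lines that fall inside an open literal are passed through untouched, as in the example above.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper, not part of Wave. Rewrites classic string literals
// that span several physical lines into adjacent single-line literals.
// Assumed limitations: comments, character literals, raw string literals
// and backslash-newline splices are not recognized.
std::string split_multiline_literals(const std::string& source) {
    std::istringstream in(source);
    std::string out, line;
    bool in_string = false;  // true while a literal is still open at a line start

    while (std::getline(in, line)) {
        // A directive inside an open literal is copied as-is.
        std::size_t first = line.find_first_not_of(" \t");
        if (in_string && first != std::string::npos && line[first] == '#') {
            out += line + '\n';
            continue;
        }
        if (in_string) out += '"';  // reopen the literal on this line

        for (std::size_t i = 0; i < line.size(); ++i) {
            out += line[i];
            if (line[i] == '\\' && i + 1 < line.size()) {
                out += line[++i];  // copy an escaped character verbatim
            } else if (line[i] == '"') {
                in_string = !in_string;
            }
        }
        if (in_string) out += "\\n\"";  // close the literal, keep the newline
        out += '\n';
    }
    return out;
}
```

Fed the three-line input above, this produces the three-line output above. Anything the sketch does not recognize would need a real lexer.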

@njnobles
Contributor

njnobles commented Oct 2, 2024

Unfortunately, our custom DSLs don't support string concatenation (nor C++-style raw string literals) at the moment. I can't imagine it would be too difficult to add, but it certainly complicates my issue.
I understand not providing any official support for it, but having the option would at least give my team some flexibility to update our DSLs and client code incrementally. Otherwise, we'll just have to pin ourselves to Boost 1.86.0 until we can sort it out.

@jefftrull
Collaborator Author

I wouldn't reject the idea of maintaining a custom fork of Wave, actually. You wouldn't have to pin yourself to 1.86 overall, thanks to the Modular Boost project. And you could fix #225 your way, if you wanted to.

@hkaiser any thoughts about a super-secret disabling switch for the exception?

@hkaiser
Collaborator

hkaiser commented Oct 3, 2024

> I wouldn't reject the idea of maintaining a custom fork of Wave, actually. You wouldn't have to pin yourself to 1.86 overall, thanks to the Modular Boost project. And you could fix #225 your way, if you wanted to.
>
> @hkaiser any thoughts about a super-secret disabling switch for the exception?

#ifndef BOOST_WAVE_DISABLE_SUPER_SECRET_EXCEPTION
    throw ...
#endif

?
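
Purely as an illustration of how a compile-time switch like this could gate the check, here is a toy sketch. It is not Wave's lexer: `literals_terminated` and `check_line` are hypothetical names, and `std::runtime_error` stands in for whatever exception type the real lexer would raise.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Toy check, NOT Wave's lexer: returns true if every '"' opened on the line
// is closed before the line ends. Escapes are honoured; comments, character
// literals and raw strings are ignored for brevity.
inline bool literals_terminated(const std::string& line) {
    bool in_string = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        if (line[i] == '\\') { ++i; continue; }  // skip the escaped character
        if (line[i] == '"') in_string = !in_string;
    }
    return !in_string;
}

// Hypothetical entry point guarded by the macro from the snippet above.
inline void check_line(const std::string& line) {
#ifndef BOOST_WAVE_DISABLE_SUPER_SECRET_EXCEPTION
    if (!literals_terminated(line))
        throw std::runtime_error("unterminated string literal");
#else
    (void)line;  // switch defined: keep the old, lenient behaviour
#endif
}
```

With the macro undefined, `check_line` throws on a line like `"A`. Building with the macro defined compiles the check out entirely.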
