cmd/compile: honor the unicode classes for identifiers #12483
Comments
It does use them; see the code at label talph. There is one bug: leading non-ASCII Unicode digits are not rejected. That's a separate issue, and I have a CL forthcoming. |
Looks like CL 16919, but that didn't reference this issue. This issue was triggered by a public post (Stack Overflow?) that had an example I should have included. There should probably be tests that the compiler gets this right. It's clear it didn't before. |
For the purposes of lexing byte-at-a-time, all multibyte sequences are tentatively alpha. Then we filter once we've parsed the runes. We've always* done that. I know the comment makes it sound like what the Plan 9 C compiler does, but it's really not. This is from Go 1.1 (just to show that the behavior has been this way for a long time):
That's all the possible ways to start an identifier, leading to the talph label. Then at the label:
So any multibyte non-alphanumeric will end up at talph and then be rejected with a message about that being an invalid character for an identifier (probably the best possible message, although strictly speaking it's making an assumption; maybe the user didn't intend the non-alphanumeric as part of an identifier). The only bug in the code (that I found) was that leading non-ASCII digits were allowed (#11359). I closed this issue without a CL because I don't see any other problems. There is now a test for leading non-ASCII digits, as part of the CL for #11359. The current Go version of the talph block is:
There is also a test (test/fixedbugs/bug163.go):
I'd be happy to look again given a specific test case that is incorrectly accepted.
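For readers without the lexer source at hand, the two-phase approach described in this comment can be sketched roughly as follows. This is not the compiler's code: `scanIdent` is a hypothetical helper written for illustration, and it assumes the Go spec's definitions (a letter is Unicode category L or `_`, a digit is category Nd). It also treats a leading ASCII digit as an error, whereas a real lexer would route that to number scanning.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// scanIdent is a hypothetical helper, not the compiler's lexer. It reads an
// identifier candidate from the start of src in two phases.
func scanIdent(src string) (string, error) {
	// Phase 1: scan byte-at-a-time. ASCII letters, digits and '_' continue
	// the token; every byte >= utf8.RuneSelf is tentatively treated as alpha.
	n := 0
	for n < len(src) {
		c := src[n]
		if c >= utf8.RuneSelf || c == '_' ||
			'a' <= c && c <= 'z' || 'A' <= c && c <= 'Z' || '0' <= c && c <= '9' {
			n++
			continue
		}
		break
	}
	tok := src[:n]
	if tok == "" {
		return "", fmt.Errorf("no identifier at start of input")
	}

	// Phase 2: decode the runes and filter with the unicode tables.
	for i, r := range tok {
		switch {
		case r == utf8.RuneError:
			return "", fmt.Errorf("invalid UTF-8 or U+FFFD in identifier")
		case r == '_' || unicode.IsLetter(r):
			// Letters are fine in any position.
		case unicode.IsDigit(r):
			// Digits (ASCII or not) may not start an identifier.
			if i == 0 {
				return "", fmt.Errorf("identifier cannot begin with digit %#U", r)
			}
		default:
			return "", fmt.Errorf("invalid identifier character %#U", r)
		}
	}
	return tok, nil
}

func main() {
	for _, src := range []string{"လ := 3", "x١ = 1", "١x = 1", "a€b"} {
		id, err := scanIdent(src)
		fmt.Printf("%q -> %q, %v\n", src, id, err)
	}
}
```

The leading-digit check in phase 2 corresponds to the kind of bug mentioned for #11359: without it, an identifier starting with a non-ASCII digit such as U+0661 would slip through because phase 1 accepts every multibyte sequence.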
|
This program works and it should not.

package main

func main() {
	လ := 3
	_ = လ
} |
http://play.golang.org/p/kUuxyPC4qw says that 'လ' is a letter. 'လ' is 101C which is Myanmar Letter LA |
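The playground link is not reproduced here, but a check along these lines can be run locally. It is a minimal sketch using the standard `unicode` package; `isGoLetter` is a name made up for this example and follows the Go spec's rule that a letter is any code point in Unicode category L, or `_`.

```go
package main

import (
	"fmt"
	"unicode"
)

// isGoLetter is a hypothetical helper mirroring the Go spec's "letter"
// production: a Unicode category L code point or an underscore.
func isGoLetter(r rune) bool {
	return r == '_' || unicode.IsLetter(r)
}

func main() {
	r := 'လ' // U+101C, MYANMAR LETTER LA
	fmt.Printf("%U: letter=%t, category Lo=%t\n",
		r, isGoLetter(r), unicode.Is(unicode.Lo, r))
}
```

Since U+101C is in category Lo, it is a valid identifier character under the spec, so the compiler accepting the program above is consistent with the unicode tables.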
The code currently says (lex.go):
Now that the compiler is in Go, we have access to the unicode tables and should use them.