Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-ASCII identifiers should be restricted within extern blocks. #83923

Closed
crlf0710 opened this issue Apr 6, 2021 · 10 comments · Fixed by #83936
Closed

Non-ASCII identifiers should be restricted within extern blocks. #83923

crlf0710 opened this issue Apr 6, 2021 · 10 comments · Fixed by #83936
Labels
A-FFI Area: Foreign function interface (FFI) C-bug Category: This is a bug. F-non_ascii_idents `#![feature(non_ascii_idents)]` T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@crlf0710
Copy link
Member

crlf0710 commented Apr 6, 2021

It was mentioned in rust-lang/reference#999 that in the original RFC text, there's text that said the
non-ASCII identifiers should be disallowed within extern blocks.

This is a niche use-case and i believe this doesn't need to block stablization. Creating this issue here to track it.

@crlf0710 crlf0710 added C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Apr 6, 2021
@jonas-schievink jonas-schievink added A-FFI Area: Foreign function interface (FFI) F-non_ascii_idents `#![feature(non_ascii_idents)]` labels Apr 6, 2021
@jonas-schievink
Copy link
Contributor

What will something like

extern {
    fn ö();
}

actually link against? This seems like it might be a forward-compatiblity hazard when linking to external code, since the precise mangling/encoding isn't clear

@Manishearth
Copy link
Member

I think we should just disallow for now; and later check out what C does and match that.

@Manishearth
Copy link
Member

Manishearth commented Apr 6, 2021

@crlf0710 do you want to implement this? Should be a simple check in the parser.

@crlf0710
Copy link
Member Author

crlf0710 commented Apr 6, 2021

@Manishearth Sure, let me give it a try.

@crlf0710
Copy link
Member Author

crlf0710 commented Apr 6, 2021

Implemented in #83936 .

@tmiasko
Copy link
Contributor

tmiasko commented Apr 6, 2021

On the other hand the use of non-ASCII symbol names is already stable through link_name attribute.

@ogoffart
Copy link
Contributor

ogoffart commented Apr 6, 2021

Out of curiosity, I checked how Clang and GCC mangle the symbol: they simply use UTF-8.
MSVC doesn't seem to allow non-ascii in identifier.

extern "C" void héllo() {} // Mangle to "h\C3\A9llo"
void hélla() {} // mangle to "?h\C3\A9lla@@YAXXZ" or "_Z6h\C3\A9llav"

(Notice how the byte-length encoding in C++ is using the UTF-8 byte count, and not the code point count, as I would expect)
https://godbolt.org/z/bK8P71P46

@Manishearth
Copy link
Member

Yeah I feel like the functionality can be re-added with an FCP without much work, there just needs to be some research on how stable those mangling schemes are around this.

@Manishearth
Copy link
Member

As for link_name, my take on that is that once you're using that you're trying something weird and know what you're doing; there is no point restricting it further since it's very easy to accidently block out legit use cases.

@danielhenrymantilla
Copy link
Contributor

danielhenrymantilla commented Oct 1, 2023

Inversely, I've just noticed that despite usage of a properly-ASCII-ed link_name, if the "Rust name" is non-ASCII then it is still rejected:

extern {
    #[link_name = "cat"]
    fn ᓚᘏᗢ();
}

😿

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-FFI Area: Foreign function interface (FFI) C-bug Category: This is a bug. F-non_ascii_idents `#![feature(non_ascii_idents)]` T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants