Valid chars in names #6
Comments
I would allow only the characters that are reasonably needed. As stated by email, this "reserves" the other characters for future use in FtanML 2.0 or FtanML 3.0.
I think we should certainly allow Unicode letters/digits rather than restricting names to ASCII. That's been the XML way for a long while, and ASCII-only would be seen as a major step backwards. Different flavours of regular expressions seem to have different character classes; I'm most familiar with the XSD flavour, which is solidly Unicode-based, and where \p{L} means any Unicode letter, and \p{D} is any Unicode digit. (Incidentally, I'm not sure if Scala regular expressions are fully Unicode-aware, in the sense of matching non-BMP characters; but let's at least do Unicode BMP properly.)
Sorry, that should be \p{N} for any Unicode digit. In XSD, \d is a synonym.
Scala regex seems to be coupled to the underlying platform (so Java in our case). I don't know if Java does it right, but we should be able to find out. Also, while you're certainly right that not allowing non-ASCII characters is a step backwards, we should not forget that they can always be used inside a string here; it's only the special shorthand we are discussing. So this is less of an issue in FtanML than it would be in XML, where you have no other possibility than to comply with what is allowed for names.
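One way to find out is a small probe against java.util.regex, which is what Scala's regex support delegates to on the JVM. This is only a sketch: the class name, the helper names, and the sample characters (U+10400 as a non-BMP letter, U+0663 as a non-ASCII digit) are my own choices for illustration, not anything taken from the FtanML code.

```java
import java.util.regex.Pattern;

public class UnicodeRegexProbe {
    // U+10400 DESERET CAPITAL LETTER LONG I: a letter outside the BMP,
    // so a Java String stores it as a surrogate pair (two chars).
    static final String NON_BMP_LETTER = new String(Character.toChars(0x10400));

    // True if the whole input is exactly one match of \p{L}.
    static boolean isSingleLetter(String s) {
        return Pattern.matches("\\p{L}", s);
    }

    // True if the whole input is exactly one match of \p{N}.
    static boolean isSingleNumber(String s) {
        return Pattern.matches("\\p{N}", s);
    }

    public static void main(String[] args) {
        System.out.println("BMP letter (U+00E9):      " + isSingleLetter("\u00E9"));
        System.out.println("non-BMP letter (U+10400): " + isSingleLetter(NON_BMP_LETTER));
        System.out.println("non-ASCII digit (U+0663): " + isSingleNumber("\u0663"));
        System.out.println("ASCII digit as letter:    " + isSingleLetter("1"));
    }
}
```

If the non-BMP line reports a match, the engine is consuming whole code points rather than individual UTF-16 chars, which is the property in question.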
We should decide on what characters are valid in names. Currently there are 3 definitions:
The spec says [\p{L}\p{N}:$]+ (I guess \p{L} should be \p{Alpha} and \p{N} should be \p{Digit}, but I'm not absolutely sure on that)
The parser accepts [a-zA-Z][a-zA-Z0-9:]* (no letters except a-zA-Z, has to start with a letter, no $)
Some definitions use [\p{Alpha}][\p{Alpha}\p{Digit}_:]* (same as above, but all letters, and _ is allowed after the first character)
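To make the differences concrete, here is a rough comparison of the three definitions in Java. The class and method names are invented for this sketch. One assumption to flag: in java.util.regex, \p{Alpha} and \p{Digit} are ASCII-only POSIX classes by default, so the third pattern is compiled with Pattern.UNICODE_CHARACTER_CLASS to get the "all letters" reading; whether that matches the intent of those definitions is a guess.

```java
import java.util.regex.Pattern;

public class NamePatterns {
    // Definition 1: the spec.
    static final Pattern SPEC = Pattern.compile("[\\p{L}\\p{N}:$]+");

    // Definition 2: what the parser accepts.
    static final Pattern PARSER = Pattern.compile("[a-zA-Z][a-zA-Z0-9:]*");

    // Definition 3: assumed to mean Unicode letters/digits, hence the flag.
    // Without UNICODE_CHARACTER_CLASS, Java reads \p{Alpha} as [a-zA-Z].
    static final Pattern ALPHA = Pattern.compile(
            "[\\p{Alpha}][\\p{Alpha}\\p{Digit}_:]*",
            Pattern.UNICODE_CHARACTER_CLASS);

    // True if the pattern matches the entire candidate name.
    static boolean accepts(Pattern p, String name) {
        return p.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"abc", "ns:name", "42", "$x", "a_b", "\u00E9t\u00E9"};
        for (String s : samples) {
            System.out.printf("%-8s spec=%-5b parser=%-5b alpha=%-5b%n",
                    s, accepts(SPEC, s), accepts(PARSER, s), accepts(ALPHA, s));
        }
    }
}
```

Note that, as written, the spec pattern rejects a_b because _ is not in its character class, while only the spec pattern accepts 42 and $x.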
I'd like to stay with the specced version, as it gives a lot of freedom for element names (especially numeric-only names, and names starting with _ or $, which aren't uncommon in programming languages), though we should then be sure about allowing non-ASCII characters (which is probably good: it supports non-English languages; it might have some compatibility issues, but we usually ignore those).
Also the question has been raised whether more 'special characters' should be allowed. As stated on the mailing list, I think we could allow for any of the following:
!#%&*+,/;?@^~
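If these were added, the character class would need some care, since ^ and & have special meaning inside a Java character class. A minimal sketch, assuming the spec pattern is the base and that each extra character is simply backslash-escaped (Java accepts a backslash before any non-alphabetic character); the class and helper names are hypothetical:

```java
import java.util.regex.Pattern;

public class ExtendedNamePattern {
    // The extra characters proposed in the issue.
    static final String EXTRA = "!#%&*+,/;?@^~";

    // Backslash-escape every character so none is read as class syntax
    // (for example ^ as negation or && as intersection).
    static String escapeForClass(String chars) {
        StringBuilder sb = new StringBuilder();
        for (char c : chars.toCharArray()) {
            sb.append('\\').append(c);
        }
        return sb.toString();
    }

    // Spec definition [\p{L}\p{N}:$]+ extended with the escaped extras.
    static final Pattern NAME =
            Pattern.compile("[\\p{L}\\p{N}:$" + escapeForClass(EXTRA) + "]+");

    // True if the whole string is a valid name under the extended pattern.
    static boolean isName(String s) {
        return NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a+b", "x?", "~", "a-b", "a b"}) {
            System.out.println(s + " -> " + isName(s));
        }
    }
}
```

Characters that are not in the list, such as - and =, stay rejected, which keeps them "reserved" in the sense discussed above.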