Valid chars in names #6

Open
errt opened this issue Oct 1, 2012 · 4 comments
@errt
Contributor

errt commented Oct 1, 2012

We should decide which characters are valid in names. Currently there are three definitions:
The spec says [\p{L}\p{N}_:$]+ (I guess \p{L} should be \p{Alpha} and \p{N} should be \p{Digit}, but I'm not absolutely sure about that)
The parser accepts [a-zA-Z][a-zA-Z0-9:_]* (no letters except a-zA-Z, has to start with a letter, no $)
Some definitions use [\p{Alpha}][\p{Alpha}\p{Digit}_:]* (same as above, but all letters)

I'd like to stay with the specced version, as it gives a lot of freedom for element names (especially numeric-only names, and names starting with _ or $, which aren't uncommon in programming languages). We should be certain about allowing non-ASCII characters, though. That is probably good: it supports non-English languages. It might cause some compatibility issues, but we usually ignore those.

The question has also been raised whether more 'special characters' should be allowed. As stated on the mailing list, I think we could allow any of the following:
!#%&*+,/;?@^~
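The three definitions listed above can be compared directly on the Java regex engine. The following is a minimal sketch, not the project's parser: the class and field names are hypothetical, and the spec pattern is assumed to include `_` because the comment mentions names starting with `_`. One point worth checking is that in Java `\p{Alpha}` and `\p{Digit}` are ASCII-only POSIX classes unless `Pattern.UNICODE_CHARACTER_CLASS` is set, so they are not drop-in replacements for `\p{L}` and `\p{N}`.

```java
import java.util.regex.Pattern;

// Hypothetical helper for comparing the candidate name syntaxes from this issue.
class NamePatterns {
    // Spec version; '_' is assumed to belong to the class (see lead-in).
    static final Pattern SPEC = Pattern.compile("[\\p{L}\\p{N}_:$]+");
    // Parser version: ASCII only, must start with a letter, no '$'.
    static final Pattern PARSER = Pattern.compile("[a-zA-Z][a-zA-Z0-9:_]*");
    // Third variant with Java's default (ASCII-only) POSIX classes.
    static final Pattern ALPHA_ASCII =
        Pattern.compile("[\\p{Alpha}][\\p{Alpha}\\p{Digit}_:]*");
    // Same expression, but with the POSIX classes widened to Unicode.
    static final Pattern ALPHA_UNICODE =
        Pattern.compile("[\\p{Alpha}][\\p{Alpha}\\p{Digit}_:]*",
                        Pattern.UNICODE_CHARACTER_CLASS);

    // True if the whole string is a valid name under the given pattern.
    static boolean ok(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        // "gr\u00f6\u00dfe" is a word containing two non-ASCII letters.
        String[] samples = {"name", "x1", "123", "_tmp", "$ref", "ns:item", "gr\u00f6\u00dfe"};
        for (String s : samples) {
            System.out.printf("%-8s spec=%b parser=%b alphaAscii=%b alphaUnicode=%b%n",
                s, ok(SPEC, s), ok(PARSER, s), ok(ALPHA_ASCII, s), ok(ALPHA_UNICODE, s));
        }
    }
}
```

Under these assumptions, numeric-only names and names starting with `_` or `$` are accepted only by the spec pattern, and non-ASCII letters are rejected by the third variant unless the Unicode flag is set.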

@smessmer
Contributor

smessmer commented Oct 2, 2012

I would allow only the characters that are reasonably needed. As stated in a mail, this "reserves" the other characters for future use in FtanML 2.0 or FtanML 3.0.
So I'd allow [a-zA-Z0-9:_], which covers the special characters we need for our type system. That should be enough for any user's own need for special characters (for example, if they want to define a type system themselves).
I wouldn't allow [0-9] as the first character, because it could cause trouble in future when distinguishing names from number values.

@michaelhkay
Contributor

I think we should certainly allow Unicode letters/digits rather than restricting it to ASCII. That's been the XML way for a long while, and ASCII-only would be seen as a major step backwards. Different flavours of regular expressions seem to have different character classes; I'm most familiar with the XSD flavour which is solidly Unicode-based, and where \p{L} means any Unicode letter, and \p{D} is any Unicode digit. (Incidentally I'm not sure if Scala regular expressions are fully Unicode-aware, in the sense of matching non-BMP characters; but let's at least do Unicode BMP properly.)

@michaelhkay
Contributor

Sorry, that should be \p{N} for any Unicode digit. In XSD, \d is a synonym.

@errt
Contributor Author

errt commented Oct 2, 2012

Scala regex seems to be coupled to the underlying platform (so Java in our case). I don't know if Java does it right, but we should be able to find out. Also, while you're certainly right that disallowing non-ASCII characters is a step backwards, we should not forget that they can always be used inside a string here; it's only the special shorthand we are discussing. So it is less of an issue in FtanML than it would be in XML, where you have no choice but to comply with what is allowed for names.
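One way to find out how Java handles non-BMP characters is a small check like the one below. It is a sketch with hypothetical names, using U+10400 (DESERET CAPITAL LETTER LONG I) as an example of a letter outside the BMP. `java.util.regex` matches by code point rather than by UTF-16 code unit, so the surrogate pair should be treated as a single letter, while an unpaired surrogate should not count as a letter.

```java
import java.util.regex.Pattern;

// Hypothetical check for supplementary (non-BMP) characters in \p{L}.
class SupplementaryLetterCheck {
    // U+10400 written as a UTF-16 surrogate pair.
    static final String DESERET = "\uD801\uDC00";
    // An unpaired high surrogate, which is not a letter.
    static final String LONE_SURROGATE = "\uD801";

    // True if the whole string consists of Unicode letters only.
    static boolean isLetters(String s) {
        return Pattern.matches("\\p{L}+", s);
    }

    public static void main(String[] args) {
        // Two UTF-16 code units, but only one code point.
        System.out.println(DESERET.length());
        System.out.println(DESERET.codePointCount(0, DESERET.length()));
        System.out.println(isLetters(DESERET));
        System.out.println(isLetters(LONE_SURROGATE));
    }
}
```

Since Scala's regex support delegates to `java.util.regex`, the same result would be expected from Scala code on the JVM, though that is worth confirming against the Scala version in use.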
