Valid chars in names #6

errt · 2012-10-01T22:38:15Z

We should decide which characters are valid in names. Currently there are three definitions:
The spec says [\p{L}\p{N}:$]+ (I guess \p{L} should be \p{Alpha} and \p{N} should be \p{Digit}, but I'm not absolutely sure about that)
The parser accepts [a-zA-Z][a-zA-Z0-9:]* (no letters except a-zA-Z, has to start with a letter, no $)
Some definitions use [\p{Alpha}][\p{Alpha}\p{Digit}_:]* (same as the parser's, but with all letters and with _ allowed after the first character)
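To make the differences between the three definitions concrete, here is a minimal Java sketch that runs a few sample names through each pattern. The class name `NamePatterns`, the helper `accepts`, and the sample names are made up for illustration. One thing to keep in mind: in java.util.regex, \p{Alpha} and \p{Digit} are ASCII-only POSIX classes unless the pattern is compiled with `Pattern.UNICODE_CHARACTER_CLASS`, so the sketch sets that flag for the third pattern on the assumption that "all letters" is what is intended there.

```java
import java.util.regex.Pattern;

final class NamePatterns {
    // Pattern from the spec, as quoted in this issue.
    static final Pattern SPEC = Pattern.compile("[\\p{L}\\p{N}:$]+");

    // Pattern the parser currently accepts, as quoted in this issue.
    static final Pattern PARSER = Pattern.compile("[a-zA-Z][a-zA-Z0-9:]*");

    // Pattern used by some definitions. Assumption: Unicode letters/digits are
    // intended, so the flag makes \p{Alpha} and \p{Digit} Unicode-aware.
    static final Pattern DEFS = Pattern.compile(
            "[\\p{Alpha}][\\p{Alpha}\\p{Digit}_:]*",
            Pattern.UNICODE_CHARACTER_CLASS);

    // True if the whole name matches the given pattern.
    static boolean accepts(Pattern p, String name) {
        return p.matcher(name).matches();
    }

    public static void main(String[] args) {
        // "gr\u00f6\u00dfe" is "groesse" spelled with o-umlaut and sharp s.
        String[] samples = {"abc", "gr\u00f6\u00dfe", "123", "$x", "a_b", "a:b"};
        for (String name : samples) {
            System.out.println(name
                    + " spec=" + accepts(SPEC, name)
                    + " parser=" + accepts(PARSER, name)
                    + " defs=" + accepts(DEFS, name));
        }
    }
}
```

As quoted, the spec pattern accepts numeric-only names and names containing $, but it has no underscore in its character class, so a name like a_b is only accepted by the third pattern.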

I'd like to stay with the specced version, as it gives a lot of freedom for element names (especially numeric-only names, and also names starting with _ or $, which aren't uncommon in programming languages), though we should then be sure that we want to allow non-ASCII characters (which is probably good: it supports non-English languages. It might cause some compatibility issues, but we usually ignore those).

The question has also been raised whether more 'special characters' should be allowed. As stated on the mailing list, I think we could allow any of the following:
!#%&*+,/;?@^~

smessmer · 2012-10-02T22:03:42Z

I would allow only the characters that are reasonably needed. As stated in a mail, this "reserves" the other characters for future use in FtanML 2.0 or FtanML 3.0.
So I'd allow [a-zA-Z0-9:_], i.e. ASCII letters and digits plus the special characters we need for our type system. They should be enough for users to cover their own needs for special characters (for example, if they want to define a type system themselves).
I wouldn't allow [0-9] as the first character, because it could cause trouble in the future when distinguishing names from number values.
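A minimal Java sketch of this restricted proposal, for comparison. The class name `RestrictedNames` and the method `isName` are made up for illustration, and the exact pattern is an assumption: the comment above only rules out a leading digit, so the sketch lets a name start with any other allowed character, including : and _.

```java
import java.util.regex.Pattern;

final class RestrictedNames {
    // Assumption: any character from [a-zA-Z0-9:_] may appear in a name,
    // but the first character must not be a digit.
    static final Pattern NAME = Pattern.compile("[a-zA-Z:_][a-zA-Z0-9:_]*");

    // True if the whole string is a valid name under this proposal.
    static boolean isName(String s) {
        return NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"x1", "_id", "a:b", "1x", "42"}) {
            System.out.println(s + " -> " + isName(s));
        }
    }
}
```

With this pattern a token such as 42 can never be a name, so it cannot be confused with a number value.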

michaelhkay · 2012-10-02T22:31:16Z

I think we should certainly allow Unicode letters/digits rather than restricting names to ASCII. That's been the XML way for a long while, and ASCII-only would be seen as a major step backwards. Different flavours of regular expressions seem to have different character classes; I'm most familiar with the XSD flavour, which is solidly Unicode-based, and where \p{L} means any Unicode letter, and \p{D} is any Unicode digit. (Incidentally, I'm not sure if Scala regular expressions are fully Unicode-aware, in the sense of matching non-BMP characters; but let's at least do Unicode BMP properly.)

michaelhkay · 2012-10-02T22:32:25Z

Sorry, that should be \p{N} for any Unicode digit. In XSD, \d is a synonym.

errt · 2012-10-02T22:40:58Z

Scala regex seems to be coupled to the underlying platform (so Java in our case). I don't know if Java does it right, but we should be able to find out. Also, while you're certainly right that disallowing non-ASCII characters is a step backwards, we should not forget that they can always be used inside a string here; it's only the special shorthand we are discussing. So it's less of an issue in FtanML than it would be in XML, where you have no choice but to comply with what is allowed for names.
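One way to find out is a small check against java.util.regex directly. The sketch below is only an illustration: the class name `NonBmpCheck` and the method `isValidName` are made up, and the test characters are U+10400 (a Deseret capital letter) and U+20000 (a CJK Extension B ideograph), both of which lie outside the BMP and are stored as surrogate pairs in a Java string. It uses the spec pattern as quoted in this issue.

```java
import java.util.regex.Pattern;

final class NonBmpCheck {
    // Spec pattern as quoted in this issue.
    static final Pattern SPEC = Pattern.compile("[\\p{L}\\p{N}:$]+");

    // True if the whole string matches the spec pattern.
    static boolean isValidName(String s) {
        return SPEC.matcher(s).matches();
    }

    // Builds a string holding the single given code point.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String deseret = fromCodePoint(0x10400); // non-BMP letter
        String cjkExtB = fromCodePoint(0x20000); // non-BMP ideograph

        // Each is one code point but two UTF-16 code units.
        System.out.println(deseret.length() + " " + deseret.codePointCount(0, deseret.length()));

        System.out.println(isValidName(deseret));
        System.out.println(isValidName("x" + cjkExtB + "1"));
    }
}
```

java.util.regex matches by code point rather than by UTF-16 code unit, so both checks should print true; whether the Scala wrapper and the parser's own tokenizer preserve that behaviour would still need to be verified separately.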
