Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bugfix] Do not include non ident chars in idents #8

Merged
merged 1 commit into from
Sep 15, 2021
Merged

[bugfix] Do not include non ident chars in idents #8

merged 1 commit into from
Sep 15, 2021

Conversation

benjreinhart
Copy link
Contributor

According to the spec, the following chars are not allowed in identifiers: \/<>{};=,". While it looks like most of those are excluded here, I noticed that the double quote " and comma , characters are being included.

Steps to reproduce:

> kdl.parse("foo,")
{
  output: [ { name: 'foo,', properties: {}, values: [], children: [] } ],
  errors: []
}

> kdl.parse("foo\"")
{
  output: [ { name: 'foo"', properties: {}, values: [], children: [] } ],
  errors: []
}

Please note that the resulting behavior after this change seems questionable. An exception is raised when a double quote is included in the identifier (which I would expect). However, when commas are present, they seem to be silently ignored. I'm not intimately familiar with the spec or implementations, so I'm not sure what the correct behavior is (though I would expect exceptions in both cases). Looking for any feedback on approach here!

@benjreinhart
Copy link
Contributor Author

benjreinhart commented Sep 15, 2021

I also noticed that under "Non-identifier characters" the spec says:

Any codepoint with hexadecimal value higher than 0x10FFFF.

However, the range here goes up to \uFFFF. Am I reading that right? Should the range here go higher to 0x10FFFF?

@benjreinhart
Copy link
Contributor Author

benjreinhart commented Sep 15, 2021

Sorry to keep sending updates here, but this also appears to be excluding more than the spec communicates. Now I'm wondering if the spec needs to be updated? For example, parenthesis are excluded here, as are square brackets [], but the spec does not mention them as being invalid for bare identifiers. Though, it seems like it wouldn't make sense for those to be in an identifier?

Wondering if @zkat can help clarify? I'm happy to update the spec if it needs it!

@larsgw
Copy link
Collaborator

larsgw commented Sep 15, 2021

Thank you for this too. I don't know how I missed the quote and the comma. Good thing this kdljs version is only released as a release candidate so far :)

However, the range here goes up to \uFFFF. Am I reading that right? Should the range here go higher to 0x10FFFF?

I think this regex should allow for those, /^[\x21-\uFFFF]+$/.test('\u{10FFFF}') returns true. I did not increase the range directly because that requires the RegExp /u flag, though I see now that has been supported for longer than I remember.

For example, parenthesis are excluded here, as are square brackets [], but the spec does not mention them as being invalid for bare identifiers. Though, it seems like it wouldn't make sense for those to be in an identifier?

It's in the grammar but not in the spec text it seems.

@benjreinhart
Copy link
Contributor Author

It's in the grammar but not in the spec text it seems.

Ahh, got it. Looks like it was in the spec but not correctly displayed.

@larsgw larsgw merged commit e5e38ef into kdl-org:main Sep 15, 2021
@benjreinhart benjreinhart deleted the non-ident-chars branch September 16, 2021 02:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants