Accepts strings with control characters #348

Closed
jwhear opened this issue Apr 24, 2014 · 5 comments

jwhear commented Apr 24, 2014

jq (1.3) parses, without complaint, strings which contain control character U+0083, whereas the JSON spec excludes control characters from strings.

@pkoppstein
Contributor

Nothing I've read about jq suggests that it should reject invalid JSON. In fact, it would be great if it had the ability (perhaps governed by a switch) to transform imperfect JSON into valid JSON.

Please also note that the most recent "Proposed Standard" for JSON (http://tools.ietf.org/html/rfc7159) explicitly says:

A JSON parser MAY accept non-JSON forms or extensions.

@nicowilliams
Contributor

Indeed. I suppose a strict mode would be nice.

@nicowilliams
Contributor

Actually, this is dangerous, so I'm re-opening this.

We're considering defining a "JSON text sequence" MIME type that corresponds roughly to what jq does. Allowing unescaped newlines in strings destroys the ability to recover from stream corruption (by discarding corrupted entries), which can occur when entries are written in O_APPEND style (think power failures). We're also considering the use of ASCII RS as a text separator for similar purposes. Allowing these text separators (newline or RS) to appear unescaped in strings breaks the recovery algorithm.

DO NOT rely on jq's willingness to accept unescaped control characters in strings.
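
To make the recovery concern concrete, here is a minimal sketch in Python. It is not jq's implementation or the proposed MIME type's algorithm. It assumes a stream holding one JSON text per separator-delimited entry; the helper name `recover_entries` and the sample data are hypothetical.

```python
import json

def recover_entries(stream: str, separator: str = "\n"):
    """Split a stream on `separator` and keep only the entries that parse.

    Hypothetical helper for illustration; not part of jq.
    """
    recovered = []
    for chunk in stream.split(separator):
        if not chunk.strip():
            continue  # skip empty pieces between separators
        try:
            # json.loads is strict by default: it rejects unescaped
            # U+0000..U+001F characters inside strings.
            recovered.append(json.loads(chunk))
        except json.JSONDecodeError:
            pass  # corrupted or truncated entry: discard it and move on
    return recovered

# A writer died mid-entry (the second entry is truncated), then appending resumed.
corrupted = '{"id": 1}\n{"id": 2, "msg": "trunc\n{"id": 3}\n'
print(recover_entries(corrupted))  # [{'id': 1}, {'id': 3}]

# If a writer emits an unescaped newline inside a string, the splitter cuts
# the entry in two and neither half parses, so an intact entry is lost.
unescaped = '{"msg": "line one\nline two"}\n'
print(recover_entries(unescaped))  # []
```

The same sketch works with ASCII RS by passing `separator="\x1e"`. In both cases, recovery depends on the separator never appearing unescaped inside a string.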

@nicowilliams nicowilliams added this to the 1.5 release milestone Jun 11, 2014
@nicowilliams
Contributor

@jwhear RFC7159 says:

   The representation of strings is similar to conventions used in the C
   family of programming languages.  A string begins and ends with
   quotation marks.  All Unicode characters may be placed within the
   quotation marks, except for the characters that must be escaped:
   quotation mark, reverse solidus, and the control characters (U+0000
   through U+001F).

U+0083 is not included in the must-be-escaped list.
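
As a quick check of the quoted rule, the following Python sketch encodes the must-be-escaped set from the RFC 7159 passage above. The function name `must_be_escaped` is hypothetical. Python's `json` module is used only as an example of a parser that is strict about U+0000 through U+001F by default; it says nothing about how jq behaves.

```python
import json

def must_be_escaped(ch: str) -> bool:
    """True if the quoted RFC 7159 text requires `ch` to be escaped in a string."""
    return ch in ('"', "\\") or ord(ch) <= 0x1F

print(must_be_escaped("\u0083"))  # False: outside U+0000..U+001F
print(must_be_escaped("\u001e"))  # True: ASCII RS
print(must_be_escaped("\n"))      # True: newline is U+000A

# A raw U+0083 inside a string is accepted by Python's strict default parser.
print(json.loads('"\u0083"') == "\u0083")  # True

# A raw newline inside a string is rejected.
try:
    json.loads('"\n"')
except json.JSONDecodeError as exc:
    print("rejected:", exc.msg)
```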

@nicowilliams
Contributor

Oh, this breaks a test in a most non-obvious way. I'll look again after sleeping.
