Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JsEscaper duplicating json_encode() ? #52

Open
mindplay-dk opened this issue Aug 8, 2016 · 10 comments
Open

JsEscaper duplicating json_encode() ? #52

mindplay-dk opened this issue Aug 8, 2016 · 10 comments
Labels

Comments

@mindplay-dk
Copy link

Is JsEscaper duplicating the behavior of json_encode()?

Isn't this code essentially equivalent to, say, substr(json_encode($value), 1, -1) ?

Is there any benefit to re-implementing this?

Also, is it necessary to escape multi-byte UTF-8 characters in the first place? json_encode() lets you disable unnecessary unicode escapes with the JSON_UNESCAPED_UNICODE flag.

For non-UTF-8 (e.g. ASCII) output, unicode escaping is required - but the likely use-case is outputting UTF-8 strings, right? So why escape everything as ASCII?

@harikt harikt added the question label Aug 8, 2016
@harikt
Copy link
Member

harikt commented Aug 8, 2016

@mindplay-dk that's an interesting question which I am not sure what is the best choice.

@mindplay-dk
Copy link
Author

mindplay-dk commented Aug 8, 2016

@harikt mainly I have two reservations about aggressively escaping all unicode characters:

  1. Data overhead: any non-ASCII characters come out several times more bytes than necessary.
  2. Human overhead: the resulting output is much harder for a person to read when debugging.

As far as I can figure, the only justification for a custom implementation, is the fact that json_encode() only supports UTF-8.

However, UTF-8 is also by far the most common use-case, and the escaper actually knows the target encoding, so maybe in this case json_encode() could be used as an optimization?

Here's the (UTF-8 only) function I ended up with:

/**
 * Escape string for use in a JavaScript string literal context, e.g.:
 *
 *     <script>
 *         document.title = 'Page: <%= html(js($title)) %>';
 *     </script>
 *
 * Note the use of both `html()` and `js()` here, since there are actually two nested
 * contexts in this example: a JavaScript string in an HTML context.
 *
 * To illustrate why HTML context isn't simply assumed, here is an example using both
 * `html()` and `js()`, since the following is JavaScript in an HTML-attribute context:
 *
 *     <button onclick="alert('Hello, <?= attr(js($username)) ?>')">
 *
 * @param string|mixed $value UTF-8 string (or any string-castable value)
 *
 * @return string Javascript-string-escaped UTF-8 string
 */
function js($value)
{
    $value = (string) $value;

    if ($value === "") {
        return $value;
    }

    $str = json_encode($value, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);

    if (json_last_error() !== JSON_ERROR_NONE) {
        throw new InvalidArgumentException(json_last_error_msg());
    }

    return preg_replace("#'#u", "\\'", substr($str, 1, -1));
}

This won't escape unicode characters or slashes - but will escape the single quote, which isn't escaped by json_encode() because it quotes strings with double quotes; but we want this function to work in the context of either a single or double-quoted JS string literal.

Here's my (Codeception Cest) spec, if that's helpful:

    public function escapeJavascriptStrings(UnitTester $I)
    {
        $I->assertSame('\\\'', js('\''), "single quotes are escaped");

        $I->assertSame('\\"', js('"'), "double quotes are escaped");

        $I->assertSame('\\\\', js('\\'), "backslashes are escaped");

        $I->assertSame('æøå', js('æøå'), "UTF-8 characters are preserved");

        $I->assertSame("", js(null), "null treated as empty string");

        $I->assertSame("123", js(123), "numeric values treated as strings");
    }

@harikt
Copy link
Member

harikt commented Sep 29, 2016

@mindplay-dk your functions are quite impressive. But I am not sure if there is any edge case we are missing.

I was also looking at https://docs.zendframework.com/zend-escaper/escaping-javascript/ . May be there is some where a test suite against xss where we could try.

Thanks again for exploring your time against the same.

@pmjones
Copy link
Member

pmjones commented Feb 9, 2017

@mindplay-dk The note from @harikt regarding the Zend Escaper protections (from which the escaping in this library is derived) seems accurate to me; i.e., json_encode() by itself is insufficient.

@jakejohns If you don't mind I can close this issue, or you can.

@mindplay-dk
Copy link
Author

json_encode() by itself is insufficient.

Yes. Maybe you didn't notice the examples in the doc-block?

I regard JS and HTML escaping as being different things - the js() function deliberately does only JavaScript string-escape, I have separate html() and attr() functions to further escape the string when used in HTML or HTML-attribute.

The reason I decided to go this way, is because I might have a view for an application/javascript response body, or I might be emitting content inside a CDATA block - in those case, I'd be calling js() only, and don't want it to do anything other than escaping the string for insertion into a JS context.

If I'm inserting into another context, I make two function-calls.

This takes a little getting used to, but I personally found that this leads to a lower total number of functions, combinations of which cover just about any use-case.

For example, it composes well when I'm building initialization code:

require(["foo"], function (foo) {
    foo.init({
        path: "/app-root/<?= js($path) ?>"
    });
});

And then injecting that into an HTML page:

<script>
    <?= html($init) ?>
</script>

I mean, if you care about HTML views only, yeah, you might as well go ahead and have that function escape for HTML right away - I personally like having functions that do one thing.

I have a total of four helper-functions for views and don't anticipate needing any more: js() for JS string-context, json() for non-string values (wrapper around json_encode() with sensible defaults), html() for HTML-tags and attr() for HTML-attributes.

I mix these as needed, which isn't "safe by default" in HTML contexts, but forces me to think about contexts wrapping contexts which I find reduces the risk of blindly trusting these functions for any given scenario.

I realize that's mostly a matter of taste and personal preference :-)

@pmjones
Copy link
Member

pmjones commented Feb 9, 2017

Maybe I'm missing your point buried in the wall-of-text. Can you state the change you'd like to see in a sentence or two, or in a PR?

@mindplay-dk
Copy link
Author

Heh, no, I'm not asking you to change anything, I was just trying to explain my own approach.

The reason I even commented in the first place was simply because I was referencing your implementation, trying to learn if there was something I was missing, or why it is you need to custom-code stuff in PHP that's already available as functions.

I think we just have different goals and objectives, so you guys should probably just keep doing what you're doing :-)

@pmjones
Copy link
Member

pmjones commented Feb 9, 2017

trying to learn if there was something I was missing

Ah so! If you have links to share on your own implementations, I'm sure we'd be interested to seem them.

@mindplay-dk
Copy link
Author

The project is closed-source (for the time being) but here's a copy/paste of the view-functions:

https://gist.github.com/mindplay-dk/2e3bdeec957d6f51197b4425c5192519

It's just a tiny set of four simple functions - the documentation is almost more important than the actual implementations, which are pretty trivial, so I copy/pasted the relevant section of the documentation as well.

Like I said, I am by no means suggesting you abandon your own approach, I was merely interested in sharing an alternative. Using these functions does require a little more thinking - I happen to think that's a good thing, but completely understand if that's not everyone's cup of tea :-)

@mindplay-dk
Copy link
Author

Decided to package it real quick with tests and all. Here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants