fix(ngSanitize): follow HTML parser rules for start tags / allow < in text content #8212

caitp · 2014-07-16T01:49:05Z

ngSanitize will now permit opening angle brackets (<) in text content, provided they
are not followed by either a forward slash, or by an ASCII letter
(U+0041 - U+005A, U+0061 - U+007A), in compliance with rules of the parsing
spec, without taking insertion mode into account.

BREAKING CHANGE

Previously, $sanitize would "fix" invalid markup in which a space preceded
alphanumeric characters in a start-tag. Following this change, any opening
angle bracket which is not followed by either a forward slash, or by an
ASCII letter (a-z | A-Z) will not be considered a start tag delimiter, per
the HTML parsing spec
(http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html).
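
The delimiter rule described above can be pictured as a small predicate. This is only an illustrative sketch under that description, not the code in src/ngSanitize/sanitize.js; the name isTagOpenAt is hypothetical.

```javascript
// Hypothetical helper, not part of ngSanitize: reports whether the '<' at
// `index` would be treated as a tag delimiter under the rule described above.
function isTagOpenAt(html, index) {
  if (html.charAt(index) !== '<') return false;
  var next = html.charAt(index + 1); // '' when '<' is the last character
  // '/' begins an end tag; an ASCII letter begins a start tag.
  return next === '/' || /^[A-Za-z]$/.test(next);
}

var sample = '<p> 10 < 100 </p>';
console.log(isTagOpenAt(sample, 0));  // true:  '<p'
console.log(isTagOpenAt(sample, 7));  // false: '< ' stays text content
console.log(isTagOpenAt(sample, 13)); // true:  '</'
```

Note that under this rule a string such as 'a<b' still counts as the start of a tag, because the character after '<' is an ASCII letter.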

Closes #8193


caitp · 2014-07-16T01:51:15Z

src/ngSanitize/sanitize.js

-          html = html.substring( match[0].length );
-          match[0].replace( START_TAG_REGEXP, parseStartTag );
+          // We only have a valid start-tag if there is a '>'.
+          if ( match[4] ) {


/cc @IgorMinar PTAL --- This particular block is only here to make sure that we throw if we find an apparent start-tag without a trailing >

This might not be the right thing to do --- if we don't have a trailing >, we could potentially just treat it as a text node. I'm not sure what the best thing to do in this case is.

I think it is better to treat it as a text node. IMO the sanitizer should be secure but tolerant.

I think that's fine.

petebacondarwin · 2014-07-16T14:00:45Z

test/ngSanitize/sanitizeSpec.js

+    it('should throw badparse if text content contains "<" followed by an ASCII letter without matching ">"', function() {
+      expect(function() {
+        htmlParser('foo <a bar', handler);
+      }).toThrowMinErr('$sanitize', 'badparse', 'The sanitizer was unable to parse the following block of html: <a bar');


Is this really a bad text string? I would let it go as a text block. For instance:

In my math project I found that a<b when b=10

As far as HTML parsing is concerned, /<[a-zA-Z]/ is the start of a tag, so we shouldn't "fix" this, I think.

Although arguably we are not trying to "parse" html here, only sanitize text that may be inadvertently parsed by a browser later

I think that this is right. We shouldn't try to fix broken HTML.

petebacondarwin · 2014-07-16T14:11:13Z

Other than that LGTM

IgorMinar · 2014-07-16T20:00:22Z

test/ngSanitize/sanitizeSpec.js

+    it('should accept tag delimiters such as "<" inside real tags', function() {
+      // Assert that the < is part of the text node content, and not part of a tag name.
+      htmlParser('<p> 10 < 100 </p>', handler);
+      expect(text).toEqual(' 10 < 100 ');


Shouldn't this < be encoded just to be safe?

It is encoded in the real world; in the test, however, the chars handler just appends the value to a string.

IgorMinar · 2014-07-16T20:05:42Z

LGTM except for the one test where < is not encoded. How is it different from the test on line 87?

caitp · 2014-07-16T20:07:55Z

We're passing a handler to htmlParser() which just appends the parsed text to a string, to assert that a certain substring was treated as text content. That's why it's not encoded here. In the case of expectHTML(), where we're using the real parser, the value is encoded because of the handler used by $sanitize.
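
The difference between the two handlers can be sketched roughly as follows. Both handlers here are hypothetical stand-ins written for illustration (as is escapeText, a simplified escaper); none of this is copied from sanitizeSpec.js or sanitize.js.

```javascript
// Test-style handler: appends text-node content to a string, unencoded.
function makeCollectingHandler() {
  var text = '';
  return {
    chars: function(value) { text += value; },
    result: function() { return text; }
  };
}

// Simplified escaping (an assumption, not ngSanitize's own encoder).
function escapeText(value) {
  return value.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Output-style handler: escapes text-node content before appending it.
function makeEncodingHandler() {
  var out = '';
  return {
    chars: function(value) { out += escapeText(value); },
    result: function() { return out; }
  };
}

var collecting = makeCollectingHandler();
var encoding = makeEncodingHandler();
collecting.chars(' 10 < 100 ');
encoding.chars(' 10 < 100 ');
console.log(JSON.stringify(collecting.result())); // " 10 < 100 "
console.log(JSON.stringify(encoding.result()));   // " 10 &lt; 100 "
```

The same text node reaches both handlers; only the second one encodes it, which is why the raw < is visible in a test that uses the first kind.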

IgorMinar · 2014-07-16T20:47:46Z

I see. Thanks for the explanation. LGTM then.

petebacondarwin · 2014-07-16T20:54:00Z

I still don't think that text containing "some b<a thing" should throw an exception. It should just be sanitized to "some b&lt;a thing".
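
For illustration, the tolerant alternative suggested in this comment might look something like the sketch below. It is hypothetical: escapeIfUnterminated is not a function in ngSanitize, it is not what this PR implements, and it only covers the case of a tag-like sequence with no closing >, so it is not a sanitizer in its own right.

```javascript
// Hypothetical sketch: if the first tag-like '<' has no '>' after it,
// treat the remainder as text and escape its '<' characters instead of throwing.
function escapeIfUnterminated(html) {
  var lt = html.search(/<[A-Za-z\/]/);
  if (lt === -1) return html;                    // nothing tag-like
  if (html.indexOf('>', lt) !== -1) return html; // terminated: left for the real parser
  return html.slice(0, lt) + html.slice(lt).replace(/</g, '&lt;');
}

console.log(escapeIfUnterminated('some b<a thing')); // some b&lt;a thing
console.log(escapeIfUnterminated('foo <a bar'));     // foo &lt;a bar
console.log(escapeIfUnterminated('<p>ok</p>'));      // <p>ok</p>
```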

caitp · 2014-07-16T20:55:05Z

@petebacondarwin maybe we should see how people react. I agree that it kind of sucks


sylvain-hamel and others added 2 commits July 15, 2014 13:05

fix: text that looks like an html tag but is not causes [$sanitize:ba…

8781c72

…dparse] error

caitp reviewed Jul 16, 2014

caitp added cla: yes and removed cla: no labels Jul 16, 2014

Narretz added needs: breaking change labels Jul 16, 2014

Narretz added this to the 1.3.0-beta.16 milestone Jul 16, 2014

petebacondarwin reviewed Jul 16, 2014

IgorMinar reviewed Jul 16, 2014

caitp closed this in f6681d4 Jul 16, 2014

theurere mentioned this pull request Aug 26, 2014

Update angular to 1.2.23 strukturag/spreed-webrtc#98

Merged
