Skip to content
This repository has been archived by the owner on Jan 30, 2020. It is now read-only.

issue #88: Prevent infinite looping on empty/short HTML comment #89

Merged
merged 6 commits into from
Aug 18, 2019
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 10 additions & 11 deletions src/StripTags.php
Original file line number Diff line number Diff line change
Expand Up @@ -179,19 +179,18 @@ public function filter($value)
$value = (string) $value;

// Strip HTML comments first
while (strpos($value, '<!--') !== false) {
$pos = strrpos($value, '<!--');
$start = substr($value, 0, $pos);
$value = substr($value, $pos);

// If there is no comment closing tag, strip whole text
if (! preg_match('/--\s*>/s', $value)) {
$value = '';
$open = '<!--';
$openLen = strlen($open);
$close = '-->';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why, but before we expect closing tag to match reg exp --\s*>.
Now, we explicitly expecting no any spaces between -- and >.

Do you think expecting some spaces there was a bug? I can't see any other test which fail because of that change, and also I haven't seen anything about these spaces in the specs.

Copy link
Contributor Author

@TotalWipeOut TotalWipeOut Aug 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't see why it was doing that either. Running some additional tests this loops now handles comments in the same way strip_tags does. So i can only think this was a hidden feature/bug?
With the string test<!---- -- > -- > --> the previous version returned test -- -- where as now it returns test

Should the previous behaviour be restored?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I think the current behaviour is correct and comply with the html comment spec. Also, as you said, the same way strip_tags behaves, so I think it's right. You can add this test case as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, cool. Test added

$closeLen = strlen($close);
while (($start = strpos($value, $open)) !== false) {
$end = strpos($value, $close, $start + $openLen);

if ($end === false) {
$value = substr($value, 0, $start);
} else {
$value = preg_replace('/<(?:!(?:--[\s\S]*?--\s*)?(>))/s', '', $value);
$value = substr($value, 0, $start) . substr($value, $end + $closeLen);
}

$value = $start . $value;
}

// Initialize accumulator for filtered data
Expand Down
22 changes: 22 additions & 0 deletions test/StripTagsTest.php
Original file line number Diff line number Diff line change
Expand Up @@ -511,6 +511,28 @@ public function testMultiQuoteInput()
$this->assertEquals($expected, $filter->filter($input));
}

public function badCommentProvider()
{
return [
['A <!--> B', 'A '], // Should be treated as just an open

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love the tests. But just curious, would tests about these scenarios make sense as well:

  • multiline comments
  • stacked comments (i know this seems odd, but still it should work A <!-- <!-- --> --> B should return A B

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TotalWipeOut would you be able to add scenarios mentioned above?

I would suggest also: A <!--> B <!--> C and I believe it should return A C.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@icanhazstring

case with nested comment is invalid (it is not possible to have nested comments).
I agree that we should decide somehow to proceed them, and I would go with the same as modern browser. Just checked the following example on Chrome:

A <!-- B <!-- C --> D --> E

and as the result I've got:

A D --> E

so not A E as you would expect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will add these extra tests

Copy link
Contributor Author

@TotalWipeOut TotalWipeOut Aug 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@webimpress and @icanhazstring We have an issue with this test:
A <!-- B <!-- C --> D --> E -> A D --> E
After the loop for stripping comments, $value is correctly A D --> E
The following loop strips the lone > so the result is A D -- E which doesn't seem right. But this is existing behaviour

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TotalWipeOut I've checked it, and it looks like desired behaviour. We have tests to cover these scenarios, that > is removed, for example:
testFilterGt: Ensures that any greater-than symbols ‘>’ are removed from text having no tags:

/**
* Ensures that any greater-than symbols '>' are removed from text having no tags
*
* @return void
*/
public function testFilterGt()
{
$filter = $this->_filter;
$input = '2 > 1 === true ==> $object->property';
$expected = '2 1 === true == $object-property';
$this->assertEquals($expected, $filter($input));
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@webimpress totally fine with me to match the filter with modern browser behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@webimpress I agree, it was designed to strip orphan > characters. But I would say that this isn't correct behaviour, considering that strip_tags() and Chrome leave that character untouched.
@icanhazstring If we do that, a lot of tests will fail and need to be re-written.

How do I proceed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TotalWipeOut I would keep it for now, as changing it now will be BC Break. We should log another issue and change the behaviour in next major version. So for now - in your test - expected value should be A D -- E as you said.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks. I have added the requested tests, 11 in total now 🙂

['A <!---> B', 'A '], // Should be treated as just an open
['A <!----> B', 'A B'],
['A <!-- --> B', 'A B'],
['A <!--My favorite operators are > and <!--> B', 'A B'],
];
}

/**
* @dataProvider badCommentProvider
*
* @param string $input
* @param string $expected
*/
public function testBadCommentTags($input, $expected)
{
$this->assertEquals($expected, $this->_filter->filter($input));
}

/**
* @group ZF-10256
*/
Expand Down