Link uri encoding, URL-escaping should be left alone inside the destination #598

Merged
merged 9 commits into from
Mar 19, 2024
35 changes: 24 additions & 11 deletions lib/src/util.dart
@@ -41,21 +41,34 @@ String normalizeLinkLabel(String label) {
}

/// Normalizes a link destination, including the process of HTML characters
/// decoding and percent encoding.
/// decoding and percent encoding.
// See the description of these examples:
// https://spec.commonmark.org/0.30/#example-501
// https://spec.commonmark.org/0.30/#example-502
String normalizeLinkDestination(String destination) {
  // Decode first, because the destination might have been partly encoded.
  // For example https://spec.commonmark.org/0.30/#example-502.
  // With this function, `foo%20b&auml;` will be parsed in the following steps:
  // 1. foo b&auml;
  // 2. foo bä
  // 3. foo%20b%C3%A4
  try {
    destination = Uri.decodeFull(destination);
  } catch (_) {}
  return Uri.encodeFull(decodeHtmlCharacters(destination));
  // Split by url escaping characters
  // Concatenate them with unmodified URL-escaping.
  // URL-escaping should be left alone inside the destination
  // Refer: https://spec.commonmark.org/0.30/#example-502.

  final regex = RegExp('%[0-9A-Fa-f]{2}');
  final matches = regex.allMatches(destination).toList();
Member

I think this work of tracking the matches and the splitIterator can be simplified with splitMapJoin. Something like:

return destination.splitMapJoin(regex,
    // m[0] is the matched escape itself; m.input would be the whole string.
    onMatch: (m) => m[0]!,
    onNonMatch: (e) {
      try {
        e = Uri.decodeFull(e);
      } catch (_) {}
      return Uri.encodeFull(decodeHtmlCharacters(e));
    },
);

WDYT?

Contributor Author

I hadn't heard of splitMapJoin before. Thank you for the suggestion; I've refactored to use it.
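
For anyone who wants to try the splitMapJoin approach discussed above outside the package, here is a minimal standalone sketch. The names normalizeDestinationSketch and decodeHtmlCharactersStub are hypothetical; the stub stands in for the package's decodeHtmlCharacters helper and performs no entity decoding, so this is an illustration under those assumptions and not the merged implementation.

```dart
/// Hypothetical stand-in for the package's decodeHtmlCharacters helper.
/// Assumption: this sketch skips HTML entity decoding entirely.
String decodeHtmlCharactersStub(String input) => input;

/// Sketch: leave existing %XX escapes untouched and normalize only the
/// text between them.
String normalizeDestinationSketch(String destination) {
  final escape = RegExp('%[0-9A-Fa-f]{2}');
  return destination.splitMapJoin(
    escape,
    // Emit each matched escape verbatim.
    onMatch: (m) => m[0]!,
    onNonMatch: (segment) {
      try {
        segment = Uri.decodeFull(segment);
      } catch (_) {
        // decodeFull can throw (for example on non-ASCII input);
        // in that case the segment is kept as-is.
      }
      return Uri.encodeFull(decodeHtmlCharactersStub(segment));
    },
  );
}

void main() {
  // The %2F escape survives instead of being turned into '/'.
  print(normalizeDestinationSketch('https://example/foo%2Fvar'));
  // prints: https://example/foo%2Fvar

  // Existing escapes are kept; the remaining text is still percent-encoded.
  print(normalizeDestinationSketch('foo%20bä'));
  // prints: foo%20b%C3%A4

  print(normalizeDestinationSketch('foo bar'));
  // prints: foo%20bar
}
```

splitMapJoin calls onNonMatch for every stretch between matches, including empty ones, so no manual iterator bookkeeping is needed.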

  final splitIterator = destination.split(regex).map((e) {
    try {
      e = Uri.decodeFull(e);
    } catch (_) {}
    return Uri.encodeFull(decodeHtmlCharacters(e));
  }).iterator;

  splitIterator.moveNext();
  final buffer = StringBuffer(splitIterator.current);
  for (var i = 0; i < matches.length; i++) {
    splitIterator.moveNext();
    buffer.write(matches[i].match);
    buffer.write(splitIterator.current);
  }

  return buffer.toString();
}

/// Normalizes a link title, including the process of HTML characters decoding
6 changes: 5 additions & 1 deletion test/original/inline_images.unit
@@ -22,4 +22,8 @@
![Uh oh...]("onerror="alert('XSS'))

<<<
<p><img src="%22onerror=%22alert('XSS')" alt="Uh oh..." /></p>
<p><img src="%22onerror=%22alert('XSS')" alt="Uh oh..." /></p>
>>> URL-escaping should be left alone inside the destination
![](https://example/foo%2Fvar)
<<<
<p><img src="https://example/foo%2Fvar" alt="" /></p>
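
To see why this test case would not pass with the previous decode-then-encode logic, the following sketch replays that logic on the same destination. The helper name decodeThenEncode is hypothetical, and HTML entity decoding is omitted on the assumption that it does not affect this input.

```dart
/// Hypothetical helper mirroring the removed decode-then-encode approach.
/// Assumption: HTML entity decoding is left out because the sample input
/// contains no entities.
String decodeThenEncode(String destination) {
  try {
    destination = Uri.decodeFull(destination);
  } catch (_) {}
  return Uri.encodeFull(destination);
}

void main() {
  // Uri.decodeFull turns %2F into '/', and Uri.encodeFull leaves '/'
  // unescaped, so the original escape is lost.
  print(decodeThenEncode('https://example/foo%2Fvar'));
  // prints: https://example/foo/var
}
```

The result changes the path structure of the URL, which is what the new test guards against.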