-
Notifications
You must be signed in to change notification settings - Fork 137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for inline image Markdown - attribute fix #661
Add support for inline image Markdown - attribute fix #661
Conversation
|
||
// A map of MD characters to their HTML entity equivalent | ||
const entities = { | ||
'*': '*', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A backtick is missing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've intentionally left out inline code and code blocks. These rules are applied first, converting anything that matches into HTML. Therefore, for input like:
![# fake-heading *bold* _italic_ ~strike~ `code`](https://example.com/image.png)
The result is:
<img src="<a href="https://example.com/image.png" target="_blank" rel="noreferrer noopener">https://example.com/image.png</a>" alt="# fake-heading *bold* _italic_ ~strike~ <code>code</code>" />
This approach reveals a limitation in our rule application method, but I'm unable to cover it easily in the current PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The aforementioned bug has been resolved by modifying the autolink
pattern to recognize self-closing tags such as img
. The absence of backticks is due to the fact that when escapeMarkdownEntities
is executed, all backticks have already been transformed into <code>
or <pre>
Puneet, adding you as reviewer as you're assigned to the issue |
Looks like this commit is not verified caa6cf2 |
This should be on hold for #658, right? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a comment - not really a blocker but seems worth fixing now since we're in here.
__tests__/ExpensiMark-HTML-test.js
Outdated
|
||
test('Image with alt text containing markdown', () => { | ||
const testString = '![# fake-heading *bold* _italic_ ~strike~ [:-)]](https://example.com/image.png)'; | ||
const resultString = '<img src="https://example.com/image.png" alt="# fake-heading *bold* _italic_ ~strike~ [:-)]" />'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm what about if the alt is:
[cool" onerror="alert('xss')"]
or something lol - I am trying hard to make the xss happen 😂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the idea. The general HTML escaping logic, which is enabled by default, appears to cover this case. It results in something like the following:
Given the input:
![test" onerror="alert('xss')](https://example.com/image.png)
I attempted to exploit the alt attribute by prematurely closing it with a "
to insert an additional attribute.
The outcome is:
<img src="https://example.com/image.png" alt="test" onerror="alert('xss')" />
However, since the quotes are properly escaped, the entire string is treated as part of the alt
attribute, preventing any JavaScript execution.
I've added a test just in case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice thanks!
name: 'image', | ||
regex: MARKDOWN_IMAGE_REGEX, | ||
replacement: (match, g1, g2) => `<img src="${Str.sanitizeURL(g2)}" alt="${this.escapeMarkdownEntities(g1)}" />`, | ||
rawInputReplacement: (match, g1, g2) => `<img src="${Str.sanitizeURL(g2)}" alt="${this.escapeMarkdownEntities(g1)}" data-raw-href="${g2}" data-link-variant="labeled" />` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Escaping is good for the "markdown entities" - but why display them at all? The alt tag will show a bunch of junk if the image fails to load or they hover over it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered keeping the text plain initially, but realized that it wouldn't allow us to revert to the original Markdown message from the HTML. This way, the original format is retained.
Regarding hover text, the alt
attribute doesn't display text on hover. For that, Markdown has an extended syntax (![alt](src "title text")
) that adds a title
attribute to the resulting HTML. However, we don't support this feature, similarly to our approach with links.
If an image fails to load, the fallback is something like:
But NewDot and react-native-web don't display this fallback since they render a hidden <img>
tag instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah hmm. Quite weird. Perhaps we can filter it before rendering i.e. not actually modify the underlying html
field. But we can look into that in the App PR. Thanks for the explanation.
8c0b7a1
to
eafc7a1
Compare
Related to: Expensify#658 (comment) Content intended for the alt attribute in images is being incorrectly parsed from Markdown to HTML if it contains MD special characters
eafc7a1
to
6a95384
Compare
Fixed
Yes, and #658 was just merged I found a bug I've described here #661 (comment) and I'm currently trying to fix it |
Thanks @kidroca! Let us know when it's ready for review and I can give this a merge. 🙇 |
I've addressed the bug related to the This PR aimed to find a minimal solution to prevent these occurrences, but it couldn't fully achieve that, so we might consider looking for a different approach.
|
That sounds kind of edge case to me. Can you give a practical example of what a user might do and what weird behavior would result? |
I'd lean towards this unless there are any glaring major regressions or limitations. |
The latest test has an example: test('No html inside the src attribute', () => {
const testString = '![`code`](https://example.com/image.png)';
const resultString = '<img src="https://example.com/image.png" alt="<code>code</code>" />';
expect(parser.replace(testString)).toBe(resultString);
}) As you can see this particular input would result in html inside the alt attribute |
Ah I see. Well, I think it's valid HTML - but definitely leaves us with semantically strange |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm gonna merge this one because I think we've already got some funky alt tags with the escape characters and creating an inline code block link doesn't seem all that common. And it's only the alt
tag that gets affected - which seems like a separate problem from allowing inline images.
test('No html inside the src attribute', () => { | ||
const testString = '![`code`](https://example.com/image.png)'; | ||
const resultString = '<img src="https://example.com/image.png" alt="<code>code</code>" />'; | ||
expect(parser.replace(testString)).toBe(resultString); | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kidroca Shouldn't it be alt="<code>code</code>"
instead of alt="<code>code</code>"
?
It doesn't seem to lead to anything breaking, but preferably should be escaped as I'm not sure it's valid
This breaks Live Markdown library 😢 Not sure if <
and >
characters are supported in HTML attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, the text inside the alt
attribute should be alt="`code`"
or use escaped entities. However, achieving this proved challenging without significant changes. The code
rules execute first, generating a string like [<code>code</code>](...)
, and subsequent rules then move the text into the alt
attribute.
I introduced this test to address a scenario where using the syntax ![
code](https://example.com/image.png)
incorrectly generated an <img>
tag with a src
attribute like <img src="<a href="..." />" />"
. This format, obviously, failed to load the image.
How does this impact Live Markdown functionality?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this impact Live Markdown functionality?
Currently, Live Markdown uses pretty basic algorithm for parsing HTML that alternates between finding <
and then >
.
The output <img src="..." alt="<code>...</code>" ... />
breaks the parser as the >
from <code>
gets matched instead of the (self-)closing >
from img
tag.
According to the HTML standard, >
is not a valid character in HTML attribute:
Attributes have a name and a value. Attribute names must consist of one or more characters other than controls, U+0020 SPACE, U+0022 ("), U+0027 ('), U+003E (>), U+002F (/), U+003D (=), and noncharacters.
Is there a chance we could somehow escape the alt
attribute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi,
The text you're referring to in your quote is regarding the attribute name - an attribute name cannot contain the specified characters
According to the HTML standard,
>
is not a valid character in HTML attribute:Attributes have a name and a value. Attribute names must consist of one or more characters other than controls, U+0020 SPACE, U+0022 ("), U+0027 ('), U+003E (>), U+002F (/), U+003D (=), and noncharacters.
Attribute values on the other hand, are less restricted as long as the value is wrapped in matching quotes. There's a specific text in the shared spec clearly stating that <
and >
are supported as long as the attribute value is wrapped in quotes.
The attribute value can remain unquoted if it doesn't contain ASCII whitespace or any of " ' ` = < or >. Otherwise, it has to be quoted using either single or double quotes.
In any case I'll look into it a bit further to see if there's anything I can do to mitigate the issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The text you're referring to in your quote is regarding the attribute name
@kidroca oops, you're right, sorry for mistake!
In any case I'll look into it a bit further to see if there's anything I can do to mitigate the issue
I believe we could use a more complex HTML parser in Live Markdown.
cc @robertKozik who adapted ExpensiMark to Live Markdown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But does reverting back when editing work? When I've tested it yesterday I think typing nwm I was wrong 😃```test```
as img name, and trying to edit it leads to test
string. I may be wrong tho, as I'm not remember exactly
Yeah your alternative solution will indeed solve our issues as we wouldn't have an <
or >
inside attributes. What is ETA for merging it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We seem to be in a bit of a dead lock situation here. To advance my expensify-common
PR, I need to demonstrate its effectiveness within the main
branch of NewDot. Would the solution I proposed here be acceptable to you: Expensify/App#38952 (comment)?
Feel free to utilize this snapshot of expensify-common
in your projects:
"expensify-common": "git+ssh://[email protected]/kidroca/expensify-common.git#3ad1c3322f4ec8999483955afe6f55c95e78172a",
This snapshot reflects the changes in my PR for expensify-common
.
- Incorporating it into your PR would resolve your issue
- It would allow QA to test the MD Image feature.
- But we should note to test this in the Tests section of your PR.
- This in turn would allow approve the merge of the
expensify-common
PR - Letting you to finally switch back
expensify-common
to the Expensify's repo
I'll be ready to cover any follow-up work related to the MD image feature.
What are your thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit tricky because we're incorporating the expensify-common
version inside react-native-live-markdown
as well. So we would need to point to your branch within the new library version, which isn't my preferred approach. If possible, merging your expensify-common
PR as it is would be great. If not, we can go with a patch containing your changes.
Here's how I envision the process:
- Merge your
expensify-common
PR (if possible). - Publish a new version of
react-native-live-markdown
with both our and your fixes. - Create a joint PR for
react-native-live-markdown
andexpensify-common
in the E/App repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me just test your changes in live-markdown beforehand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kidroca Apart from the one problem I've mentioned here everything looks great. I think we can proceed with the patch inside the react-native-live-markdown
so it will unblock you - does it sound good for you?
This PR is an optional followup for #658 if we'd like to patch a problem where image attribute content is being parsed to HTML as detailed in: #658 (comment)
Fixed Issues
$ Expensify/App#37246
Tests
Unit/Integration Tests Covering the Change:
ExpensiMark-HTML-test.js
andExpensiMark-Markdown-test.js
. These tests validate that MD content designated for thealt
attribute is escaped as HTML entities so it isn't converted to html tags by following rulesTests Performed to Validate Changes:
<img>
tags and vice versa.QA
![image](invalid-link)
). These should not be parsed and ought to remain displayed as the raw Markdown text, without conversion or rendering as an image.