Remove (non-special) comment nodes when pasting content #15557

Merged: 4 commits, May 14, 2019

tfrommen
Member

As suggested by @ellatrix, this is a possible replacement for #15372.


Description

When pasting something from a Google Doc into a RichText-based block, I ended up having a stray (leading) line break.

Having chased through quite a few files and functions, I finally found that the culprit is the (internal) cleanNodeList function in @wordpress/blocks (src/api/raw-handling/utils.js).
This function inserts a line break (i.e., ultimately, a <br> tag) after non-phrasing-content elements.

The problem with Google Docs, and potentially with other sources, is that the pasted HTML contains comments, and these, too, trigger the insertion of line breaks.
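
To illustrate the general idea, here is a minimal sketch of removing comment nodes from a DOM-like tree before any line-break handling runs. It is not the code merged in this PR. The function name `removeCommentNodes` and the `SPECIAL_COMMENTS` list are hypothetical, and the assumption that `more` and `nextpage` are the comments worth keeping is mine, not something stated in this thread.

```javascript
// Standard DOM nodeType value for comment nodes (Node.COMMENT_NODE).
const COMMENT_NODE = 8;

// Assumption: comments that carry meaning for the editor and should be kept.
const SPECIAL_COMMENTS = [ 'more', 'nextpage' ];

/**
 * Recursively removes comment nodes from a DOM-like tree,
 * keeping only the ones listed in SPECIAL_COMMENTS.
 *
 * @param {Node} parent Node whose descendants should be cleaned.
 */
function removeCommentNodes( parent ) {
	// Copy the child list first, because removing nodes mutates it.
	for ( const child of Array.from( parent.childNodes ) ) {
		if ( child.nodeType === COMMENT_NODE ) {
			if ( ! SPECIAL_COMMENTS.includes( child.nodeValue.trim() ) ) {
				parent.removeChild( child );
			}
		} else {
			removeCommentNodes( child );
		}
	}
}
```

In a browser, this could be called on the `body` of a document parsed from the pasted HTML. With the comment nodes gone, nothing is left in front of the span to trigger a leading line break.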

Steps to Reproduce

  1. Create a Google Doc and input some content, for example, this:
Some

Content

Here
  2. Select "Content", then copy and paste it into a RichText component.
  3. The console will show something like this:
Received HTML:

 <html><body>
<!--StartFragment--><meta charset="utf-8"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;" id="docs-internal-guid-27e8f390-7fff-ad20-17df-378c134770be">Content</span><!--EndFragment-->
</body>
</html>
Received plain text:

 Content
Processed inline HTML:

 
<br>Content

The actual problem is the <!--StartFragment--> comment node.

How has this been tested?

Copied and pasted content from a Google Doc into a RichText component: no leading line break. 😉

Screenshots

[Screenshot: gdoc-gutenberg]

Types of changes

Remove comment nodes from pasted content so that no line breaks are inserted after them.

Checklist:

  • My code is tested.
  • My code follows the WordPress code style.
  • My code follows the accessibility standards.
  • My code has proper inline documentation.
  • I've included developer documentation if appropriate.

@youknowriad
Contributor

Hey @tfrommen, thanks for the PRs. I noticed you have made some meaningful contributions to Gutenberg. Let me know if you want to be added as a collaborator to the project. That way you could avoid working on forks.

@tfrommen
Member Author

I will follow up with unit tests once this general approach has been OK'd. 🙂

@tfrommen added the [Feature] Paste, [Package] Blocks (/packages/blocks), and [Type] Bug (An existing feature does not function as intended) labels on May 10, 2019
@youknowriad requested a review from a team on May 13, 2019 15:47
@aduth
Member

aduth commented May 13, 2019

Is there a related issue for this?

When pasting something from a Google Doc into a RichText-based block, I ended up having a stray (leading) line break.

I'm curious if this is something which has changed on Google Docs as far as what markup they produce, or if there are very specific circumstances which result in the comment node being included.

In any case, we should consider either adding new or updating existing fixtures in this directory to account for variations in the markup we're able to handle:

https://github.com/WordPress/gutenberg/tree/master/test/integration/fixtures

@tfrommen
Member Author

@aduth

Is there a related issue for this?

I didn't find any related issue. As I had a fix for this already, I also did not create an issue myself, but instead created the PR and provided as much information as I had.

I'm curious if this is something which has changed on Google Docs as far as what markup they produce, or if there are very specific circumstances which result in the comment node being included.

I don't know. However, I tested this in multiple browsers, and I always get the same structure:

<html><body>
<!--StartFragment--><meta charset="utf-8">[GDOC CONTENT/MARKUP HERE]<!--EndFragment-->
</body>
</html> 

It doesn't matter if this is a single letter, or line, or even several paragraphs.
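
The wrapper structure above can be made concrete with a small string-level sketch that pulls out whatever sits between the two markers. This is only an illustration of the observed markup, not what the PR does (the PR removes comment nodes from the parsed content). The function name `extractFragment` is hypothetical. The same marker names are used by the Windows HTML clipboard format, which might be related to the platform differences seen in this thread, though nothing here confirms that.

```javascript
const START_MARKER = '<!--StartFragment-->';
const END_MARKER = '<!--EndFragment-->';

/**
 * Returns the markup between the StartFragment and EndFragment comments,
 * or the input unchanged when the markers are missing or out of order.
 *
 * @param {string} html Pasted HTML.
 * @return {string} Fragment markup without the wrapper.
 */
function extractFragment( html ) {
	const start = html.indexOf( START_MARKER );
	const end = html.lastIndexOf( END_MARKER );
	if ( start === -1 || end === -1 || end < start ) {
		return html;
	}
	return html.slice( start + START_MARKER.length, end );
}
```

A plain string search is enough here because the markers are fixed strings. Removing arbitrary comments is better done on parsed nodes than with string matching.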

@aduth
Member

aduth commented May 13, 2019

Would you be able to share a public Google Docs document, and a specific fragment of text you're selecting to yield this result?

For example, I've been trying with this document:

https://docs.google.com/document/d/19xXX0fr2F0n1JE2DSYJCN8mnoEp5BH65z1iPvpGPrVc/edit

Pasting the first line of the document into this CodePen textarea (to retrieve the clipboard contents as HTML):

https://codepen.io/aduth/pen/VOKJyw

I receive:

<meta charset='utf-8'><meta charset="utf-8"><b style="font-weight:normal;" id="docs-internal-guid-08bd7213-7fff-d35a-a4a3-ae5184167e96"><a href="https://drive.google.com/drive/u/0/folders/1k4bWkN088Hte1mehmPkKZHbois4Zjsar" style="text-decoration:none;"><span style="font-size:11pt;font-family:Arial;color:#1155cc;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:underline;-webkit-text-decoration-skip:none;text-decoration-skip-ink:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">(View All Agendas)</span></a></b>

(Similar results with the Steps to Reproduce from the original comment)
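
For anyone who wants to inspect their own clipboard contents in the same way, here is a minimal sketch of reading both flavors from a paste event. It is not the code of the CodePen linked above. The helper name `readPastedContent` is hypothetical, and the only API it relies on is the standard `clipboardData.getData()` method of paste events.

```javascript
/**
 * Reads the HTML and plain-text flavors of the pasted content.
 *
 * @param {ClipboardEvent} event Paste event.
 * @return {{html: string, plainText: string}} Clipboard contents.
 */
function readPastedContent( event ) {
	const data = event.clipboardData;
	return {
		html: data.getData( 'text/html' ),
		plainText: data.getData( 'text/plain' ),
	};
}

// Browser-only usage (the textarea selector is a placeholder):
// document.querySelector( 'textarea' ).addEventListener( 'paste', ( event ) => {
// 	console.log( readPastedContent( event ).html );
// } );
```

Logging the `html` value in different browser and OS combinations would show whether the `<!--StartFragment-->` and `<!--EndFragment-->` comments are present.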

@aduth
Member

aduth commented May 13, 2019

To clarify: I think this is both a reasonable approach, and sensible that we'd omit HTML comments from sourced paste contents. My only concern at this point is being able to track down the circumstances under which the original issue can occur.

@tfrommen
Member Author

tfrommen commented May 13, 2019

@aduth when I copy the word "Hooks" from your document and paste it, I get this:

<html><body>
<!--StartFragment--><meta charset="utf-8"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;" id="docs-internal-guid-bf3f5a98-7fff-39e1-71d5-0803876c48b6">Hooks</span><!--EndFragment-->
</body>
</html>

As expected, I get this in both Firefox and Chrome.

However, I now also tested with Microsoft Edge, and all I get there is this:

<meta charset="utf-8"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;" id="docs-internal-guid-e99261f1-7fff-ec19-189c-29466fb16b6a">Hooks</span>

So, maybe it depends on the combination of browser and OS, I don't know. I'm using Windows 10 Pro (64-bit).

Member

@ellatrix ellatrix left a comment

This looks good. A small unit test would be great, perhaps also updated fixtures.

@ellatrix
Member

I don't get the HTML comments with Chrome on Mac.

@ellatrix
Member

I wonder why removeInvalidHTML doesn't remove the comment node. None of the schemas have comments as part of them.

@tfrommen
Member Author

I added both some unit tests and new fixtures for this.

All is green, so, can I go ahead and merge? 😁

@youknowriad added the "Backport to WP 6.7 Beta/RC" label (Pull request that needs to be backported to the WordPress major release that's currently in beta) on May 14, 2019
@@ -1 +1 @@
<meta charset='utf-8'><meta charset="utf-8"><b style="font-weight:normal;" id="docs-internal-guid-7102d5c2-7fff-c8d1-1082-5abceee52545"><br /><div dir="ltr" style="margin-left:0pt;"><table style="border:none;border-collapse:collapse;width:451.27559055118115pt"><colgroup><col width="*" /><col width="*" /><col width="*" /></colgroup><tr style="height:3.75pt"><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">One</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Two</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Three</span></p></td></tr><tr style="height:0pt"><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid 
#000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">1</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">2</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">3</span></p></td></tr><tr style="height:0pt"><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">I</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid 
#000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">II</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">III</span></p></td></tr></table></div></b>
Member

What's the purpose of updating this? I don't see any comments added.

Member Author

I saw that the leading meta tag was wrong in both existing fixtures. One file had it twice (once with single and once with double quotes), the other did not have the meta tag at all.

I added comments only to the two new files, google-docs-(table-)with-comments.

@ellatrix
Member

It would be good to know why removeInvalidHTML doesn't remove the comments... In the meantime this seems like a good thing to merge.

@youknowriad merged commit 13a6b26 into WordPress:master on May 14, 2019
@youknowriad added this to the 5.7 (Gutenberg) milestone on May 14, 2019
@jorgefilipecosta removed the "Backport to WP 6.7 Beta/RC" label (Pull request that needs to be backported to the WordPress major release that's currently in beta) on Sep 13, 2019
Labels
[Feature] Paste [Package] Blocks /packages/blocks [Type] Bug An existing feature does not function as intended
5 participants