Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only real block fillers should be removed on DOM to view conversion #3704

Closed
pjasiun opened this issue May 11, 2016 · 10 comments · Fixed by ckeditor/ckeditor5-engine#1779
Closed
Assignees
Labels
package:engine type:bug This issue reports a buggy (incorrect) behavior.

Comments

@pjasiun
Copy link

pjasiun commented May 11, 2016

For the data pipeline text node with   is a block filler.

When user call set data them DOM data:

<p>&nbsp;</p>

Is converted to view:

<p></p>

What is fine.

But, unfortunately, all &nbsp; text node will be threaded as block fillers and ignored so:

<p>foo<b>&nbsp;</b></p>

Will be converted to:

<p>foo<b></b></p>

What is not fine anymore.

We should remove fillers only on positions which need fillers. But that is easier said than done. During the DOM to view conversion we do not have elements types, we do not know that <p> is a block element and <b> is not. We do not even have any DTD to check the type of the element to guess if it is the filler or not.

We could introduce some hardcoded (or extensible) DTD for such cases. Alternatively, if we will meet more problems like this, we could have DTD build by plugins and introduce types of all elements converted from DOM, but then we expect plugins to define one more think.

@Reinmar
Copy link
Member

Reinmar commented Jul 3, 2017

I've just wasted 2h debugging this in the clipboard: https://github.com/ckeditor/ckeditor5-clipboard/issues/2#issuecomment-310417731.

We'll need to improve filler recognition before 1.0.0.

@Reinmar Reinmar changed the title Only fillers should be removed on set data Only real block fillers should be removed on DOM to view conversion Jul 3, 2017
@Reinmar
Copy link
Member

Reinmar commented Jul 3, 2017

This took me so long to figure out what's broken because this test pass:

			it.only( 'not in a Chrome\'s paste-like content', () => {
				// Like:
				// <span style="color: rgb(0, 0, 0); font-family: Times;">This is the<span>\u00a0</span></span>
				// <a href="url" style="font-family: Times; font-size: medium;">third developer preview</a>
				const domDiv = createElement( document, 'div', {}, [
					createElement( document, 'span', {}, [
						document.createTextNode( 'word' ),
						createElement( document, 'span', {}, [
							document.createTextNode( '\u00a0' )
						] )
					] ),
					createElement( document, 'a', {}, [
						document.createTextNode( 'word' )
					] )
				] );

				const viewDiv = converter.domToView( domDiv );

				expect( viewDiv.getChild( 0 ).name ).to.equal( 'span' );
				expect( viewDiv.getChild( 0 ).childCount ).to.equal( 2 );

				expect( viewDiv.getChild( 0 ).getChild( 1 ).name ).to.equal( 'span' );
				expect( viewDiv.getChild( 0 ).getChild( 1 ).childCount ).to.equal( 1 );

				expect( viewDiv.getChild( 0 ).getChild( 1 ).getChild( 0 ).data ).to.equal( '\u00a0' );
			} );

While this doesn't:

			it.only( 'does not lose whitespaces in Chrome\'s paste-like content', () => {
				const fragment = dataProcessor.toView(
					'<meta charset=\'utf-8\'>' +
					'<span>This is the<span>\u00a0</span></span>' +
					'<a href="url">third developer preview</a>' +
					'<span><span>\u00a0</span>of<span>\u00a0</span></span>' +
					'<strong>CKEditor\u00a05</strong>' +
					'<span>.</span>'
				);

				expect( stringify( fragment ) ).to.equal(
					'<span>This is the<span>\u00a0</span></span>' +
					'<a href="url">third developer preview</a>' +
					'<span><span>\u00a0</span>of<span>\u00a0</span></span>' +
					'<strong>CKEditor\u00a05</strong>' +
					'<span>.</span>'
				);

				// Just to be sure... stringify() uses conversion and the browser extensively,
				// so it's not entirely safe.
				expect( fragment.getChild( 0 ).getChild( 1 ).getChild( 0 ).data ).to.equal( '\u00a0' );
				expect( fragment.getChild( 2 ).getChild( 0 ).getChild( 0 ).data ).to.equal( '\u00a0' );
				expect( fragment.getChild( 2 ).getChild( 2 ).getChild( 0 ).data ).to.equal( '\u00a0' );
			} );

There's theoretically not much difference between these tests but in the first case the domToView() function doesn't hit the isBlockFiller() problem. I didn't check why so this must be checked.

@Reinmar
Copy link
Member

Reinmar commented Jul 3, 2017

In fb514e3 I added some tests which I created just now. One of these tests (dataprocessor/htmldataprocessor.js) is commented out. Uncomment it after fixing this issue.

@bendemboski
Copy link
Contributor

Another case that is perhaps more severe than the one listed in this issue's description is that

<p><b>bold</b>&nbsp;<i>italic</i></p>

is converted to

<p><b>bold</b><i>italic</i></p>

losing the space between the content. Since this is the way CKEditor4 outputs its HTML, it's a data compatibility issue -- now that I'm upgrading to CKEditor5 I need to pre-process all the HTML I push into the editor to make sure I don't lose spaces between styled ranges of text.

So I guess that's a +1 from me to fix this issue 😄

@Reinmar
Copy link
Member

Reinmar commented May 14, 2019

Hmm... Thanks for reporting this one and for pointing out that it's a compatibility issue. I'm going to prioritize it higher.

@Reinmar
Copy link
Member

Reinmar commented Aug 6, 2019

I stumbled upon this again today and didn't recognise it's again the same problem 🤦‍♂: https://github.com/ckeditor/ckeditor5-engine/issues/1775.

@mlewand, I can see you removed this from it26, but it's a critical issue so I'm moving it back to it26.

@Reinmar
Copy link
Member

Reinmar commented Aug 12, 2019

We talked about the algorithm with @jodator and it's a bit tricky.

We started with such an idea:

  1. Check if this is a &nbsp; or <br> (depending on the case). If not, it's not a filler.
  2. Find its closest ancestor block element and check if there is any meaningful content in it other than this nbsp/br node. E.g. in <p><strong>&nbsp;</strong><span></span></p> there's no meaningful content (<strong> isn't such a thing), so &nbsp; can be removed.

Now, the problem we have with this is – how to define "meaningful content"? This is a problem that we constantly have on the view/DOM level. It's clear in the model, but here we have to make decisions without the model.

Theoretically, we could have a list of inline nodes and say that none of them is meaningful content. But that wouldn't be really stable. E.g. <p>&nbsp;<span class=placeholder></span></p> – in this case we have a meaningful content in this paragraph, but that <span> looks like something we could ignore. We can only learn whether it represents something by running converters.

The other problem is that <p><span style="background: red">&nbsp;</span></p> might've been intentional (it's visible). By removing this space, we change the meaning of the content.

Therefore, I'd considering simplifying this algorithm to handle only things that may currently be produced by CKEditor 5. Currently, when converting view->DOM (when getting the data), block fillers will only be added directly in ContainerElements. For instance, <p><strong>&nbsp;</strong></p> can currently be only generated if you intentionally put that space in the model.

The list of those ContainerElements is also much easier to predict (p, li, td, h1-h6, etc.). And if you use some custom elements as blocks, we've got [DomConverter#blockElements](https://ckeditor.com/docs/ckeditor5/latest/api/module_engine_view_domconverter-DomConverter.html#member-blockElements` already – you can extend it.

Finally, this algorithm, even if it doesn't recognise a block element, will not lead to a data loss. It just won't correctly remove a &nbsp; which means that some empty block will contain a single space in the model.

So, to conclude, the algorithm that I propose:

  1. Check if this is a &nbsp; or <br> (depending on a case). If not, it's not a filler.
  2. Check if this node is contained directly within a block element and this block element doesn't have any other content.

@jodator
Copy link
Contributor

jodator commented Aug 12, 2019

Yep, this seems reasonable 👍 - we might later think about filtering out non-meaningful data on the model level (just a loose thought) but in the DOM -> View conversion such simplistic approach seams as a good compromise.

@Reinmar
Copy link
Member

Reinmar commented Aug 12, 2019

Also, I've just thought about loading content from CKEditor 4. If it creates some more complex structures with fillers, we can always have a special mode for it.

@jodator
Copy link
Contributor

jodator commented Aug 12, 2019

WIP comment (to not loose already written text):

I've checked the tests on t/404 after implementing the algorithm proposed above.

Only a handful of tests are broken after that:

  1. in engine: 3 tests about <p></p>&nbsp;<p></p> handling
  2. in PF (adjacent groups - probably the same as above)

This is with a isText() check to have different handling for &nbsp; in data and <br> in editing pipeline.

If I remove this and always check (regardless of data/editing) for other content also a test "MutationObserver should not ignore mutation inserting element different than br on the end of the empty paragraph" is broken.

Reinmar referenced this issue in ckeditor/ckeditor5-engine Sep 18, 2019
Fix: Remove only real block fillers on DOM to view conversion. Closes #404.

BREAKING CHANGE: The behavior of block filler detection on DOM to view conversion was changed. Now, it specifically checks the parent of a text node to check whether it is a block. Which means that a list of block element names has to be used. If you use custom elements or use one of the HTML elements which CKEditor 5 does not recognize as a block element, see #404 and `DomConverter.blockElements` documentation.
@mlewand mlewand transferred this issue from ckeditor/ckeditor5-engine Oct 9, 2019
@mlewand mlewand added this to the iteration 27 milestone Oct 9, 2019
@mlewand mlewand added status:confirmed type:bug This issue reports a buggy (incorrect) behavior. package:engine labels Oct 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
package:engine type:bug This issue reports a buggy (incorrect) behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants