Only real block fillers should be removed on DOM to view conversion #3704

pjasiun · 2016-05-11T15:08:06Z

For the data pipeline text node with   is a block filler.

When user call set data them DOM data:

<p>&nbsp;</p>

Is converted to view:

<p></p>

What is fine.

But, unfortunately, all   text node will be threaded as block fillers and ignored so:

<p>foo<b>&nbsp;</b></p>

Will be converted to:

<p>foo<b></b></p>

What is not fine anymore.

We should remove fillers only on positions which need fillers. But that is easier said than done. During the DOM to view conversion we do not have elements types, we do not know that  is a block element and  is not. We do not even have any DTD to check the type of the element to guess if it is the filler or not.

We could introduce some hardcoded (or extensible) DTD for such cases. Alternatively, if we will meet more problems like this, we could have DTD build by plugins and introduce types of all elements converted from DOM, but then we expect plugins to define one more think.

The text was updated successfully, but these errors were encountered:

Reinmar · 2017-07-03T13:24:15Z

I've just wasted 2h debugging this in the clipboard: https://github.com/ckeditor/ckeditor5-clipboard/issues/2#issuecomment-310417731.

We'll need to improve filler recognition before 1.0.0.

Reinmar · 2017-07-03T13:30:16Z

This took me so long to figure out what's broken because this test pass:

			it.only( 'not in a Chrome\'s paste-like content', () => {
				// Like:
				// <span style="color: rgb(0, 0, 0); font-family: Times;">This is the<span>\u00a0</span></span>
				// <a href="url" style="font-family: Times; font-size: medium;">third developer preview</a>
				const domDiv = createElement( document, 'div', {}, [
					createElement( document, 'span', {}, [
						document.createTextNode( 'word' ),
						createElement( document, 'span', {}, [
							document.createTextNode( '\u00a0' )
						] )
					] ),
					createElement( document, 'a', {}, [
						document.createTextNode( 'word' )
					] )
				] );

				const viewDiv = converter.domToView( domDiv );

				expect( viewDiv.getChild( 0 ).name ).to.equal( 'span' );
				expect( viewDiv.getChild( 0 ).childCount ).to.equal( 2 );

				expect( viewDiv.getChild( 0 ).getChild( 1 ).name ).to.equal( 'span' );
				expect( viewDiv.getChild( 0 ).getChild( 1 ).childCount ).to.equal( 1 );

				expect( viewDiv.getChild( 0 ).getChild( 1 ).getChild( 0 ).data ).to.equal( '\u00a0' );
			} );

While this doesn't:

			it.only( 'does not lose whitespaces in Chrome\'s paste-like content', () => {
				const fragment = dataProcessor.toView(
					'<meta charset=\'utf-8\'>' +
					'<span>This is the<span>\u00a0</span></span>' +
					'<a href="url">third developer preview</a>' +
					'<span><span>\u00a0</span>of<span>\u00a0</span></span>' +
					'<strong>CKEditor\u00a05</strong>' +
					'<span>.</span>'
				);

				expect( stringify( fragment ) ).to.equal(
					'<span>This is the<span>\u00a0</span></span>' +
					'<a href="url">third developer preview</a>' +
					'<span><span>\u00a0</span>of<span>\u00a0</span></span>' +
					'<strong>CKEditor\u00a05</strong>' +
					'<span>.</span>'
				);

				// Just to be sure... stringify() uses conversion and the browser extensively,
				// so it's not entirely safe.
				expect( fragment.getChild( 0 ).getChild( 1 ).getChild( 0 ).data ).to.equal( '\u00a0' );
				expect( fragment.getChild( 2 ).getChild( 0 ).getChild( 0 ).data ).to.equal( '\u00a0' );
				expect( fragment.getChild( 2 ).getChild( 2 ).getChild( 0 ).data ).to.equal( '\u00a0' );
			} );

There's theoretically not much difference between these tests but in the first case the domToView() function doesn't hit the isBlockFiller() problem. I didn't check why so this must be checked.

Reinmar · 2017-07-03T13:35:16Z

In fb514e3 I added some tests which I created just now. One of these tests (dataprocessor/htmldataprocessor.js) is commented out. Uncomment it after fixing this issue.

bendemboski · 2019-05-10T23:39:37Z

Another case that is perhaps more severe than the one listed in this issue's description is that

<p><b>bold</b>&nbsp;<i>italic</i></p>

is converted to

<p><b>bold</b><i>italic</i></p>

losing the space between the content. Since this is the way CKEditor4 outputs its HTML, it's a data compatibility issue -- now that I'm upgrading to CKEditor5 I need to pre-process all the HTML I push into the editor to make sure I don't lose spaces between styled ranges of text.

So I guess that's a +1 from me to fix this issue 😄

Reinmar · 2019-05-14T07:20:37Z

Hmm... Thanks for reporting this one and for pointing out that it's a compatibility issue. I'm going to prioritize it higher.

Reinmar · 2019-08-06T12:34:15Z

I stumbled upon this again today and didn't recognise it's again the same problem 🤦‍♂: https://github.com/ckeditor/ckeditor5-engine/issues/1775.

@mlewand, I can see you removed this from it26, but it's a critical issue so I'm moving it back to it26.

Reinmar · 2019-08-12T13:02:04Z

We talked about the algorithm with @jodator and it's a bit tricky.

We started with such an idea:

Check if this is a   or   (depending on the case). If not, it's not a filler.
Find its closest ancestor block element and check if there is any meaningful content in it other than this nbsp/br node. E.g. in   there's no meaningful content ( isn't such a thing), so   can be removed.

Now, the problem we have with this is – how to define "meaningful content"? This is a problem that we constantly have on the view/DOM level. It's clear in the model, but here we have to make decisions without the model.

Theoretically, we could have a list of inline nodes and say that none of them is meaningful content. But that wouldn't be really stable. E.g.   – in this case we have a meaningful content in this paragraph, but that  looks like something we could ignore. We can only learn whether it represents something by running converters.

The other problem is that   might've been intentional (it's visible). By removing this space, we change the meaning of the content.

Therefore, I'd considering simplifying this algorithm to handle only things that may currently be produced by CKEditor 5. Currently, when converting view->DOM (when getting the data), block fillers will only be added directly in ContainerElements. For instance,   can currently be only generated if you intentionally put that space in the model.

The list of those ContainerElements is also much easier to predict (p, li, td, h1-h6, etc.). And if you use some custom elements as blocks, we've got [DomConverter#blockElements](https://ckeditor.com/docs/ckeditor5/latest/api/module_engine_view_domconverter-DomConverter.html#member-blockElements` already – you can extend it.

Finally, this algorithm, even if it doesn't recognise a block element, will not lead to a data loss. It just won't correctly remove a   which means that some empty block will contain a single space in the model.

So, to conclude, the algorithm that I propose:

Check if this is a   or   (depending on a case). If not, it's not a filler.
Check if this node is contained directly within a block element and this block element doesn't have any other content.

jodator · 2019-08-12T13:07:39Z

Yep, this seems reasonable 👍 - we might later think about filtering out non-meaningful data on the model level (just a loose thought) but in the DOM -> View conversion such simplistic approach seams as a good compromise.

Reinmar · 2019-08-12T13:19:17Z

Also, I've just thought about loading content from CKEditor 4. If it creates some more complex structures with fillers, we can always have a special mode for it.

jodator · 2019-08-12T15:00:14Z

WIP comment (to not loose already written text):

I've checked the tests on t/404 after implementing the algorithm proposed above.

Only a handful of tests are broken after that:

in engine: 3 tests about   handling
in PF (adjacent groups - probably the same as above)

This is with a isText() check to have different handling for   in data and   in editing pipeline.

If I remove this and always check (regardless of data/editing) for other content also a test "MutationObserver should not ignore mutation inserting element different than br on the end of the empty paragraph" is broken.

Fix: Remove only real block fillers on DOM to view conversion. Closes #404. BREAKING CHANGE: The behavior of block filler detection on DOM to view conversion was changed. Now, it specifically checks the parent of a text node to check whether it is a block. Which means that a list of block element names has to be used. If you use custom elements or use one of the HTML elements which CKEditor 5 does not recognize as a block element, see #404 and `DomConverter.blockElements` documentation.

Reinmar changed the title ~~Only fillers should be removed on set data~~ Only real block fillers should be removed on DOM to view conversion Jul 3, 2017

Reinmar assigned jodator Aug 6, 2019

Reinmar referenced this issue Aug 6, 2019

Removed superfluous nbsps. Also, see ckeditor/ckeditor5-engine#404.

0f9df4d

Reinmar closed this as completed in ckeditor/ckeditor5-engine#1779 Sep 18, 2019

mlewand transferred this issue from ckeditor/ckeditor5-engine Oct 9, 2019

mlewand added this to the iteration 27 milestone Oct 9, 2019

mlewand added status:confirmed type:bug This issue reports a buggy (incorrect) behavior. package:engine labels Oct 9, 2019

jodator mentioned this issue Oct 11, 2019

T/404: Better handling of   block filler. ckeditor/ckeditor5-engine#1779

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Only real block fillers should be removed on DOM to view conversion #3704

Only real block fillers should be removed on DOM to view conversion #3704

pjasiun commented May 11, 2016

Reinmar commented Jul 3, 2017

Reinmar commented Jul 3, 2017

Reinmar commented Jul 3, 2017 •

edited

Loading

bendemboski commented May 10, 2019

Reinmar commented May 14, 2019

Reinmar commented Aug 6, 2019

Reinmar commented Aug 12, 2019

jodator commented Aug 12, 2019

Reinmar commented Aug 12, 2019

jodator commented Aug 12, 2019

Only real block fillers should be removed on DOM to view conversion #3704

Only real block fillers should be removed on DOM to view conversion #3704

Comments

pjasiun commented May 11, 2016

Reinmar commented Jul 3, 2017

Reinmar commented Jul 3, 2017

Reinmar commented Jul 3, 2017 • edited Loading

bendemboski commented May 10, 2019

Reinmar commented May 14, 2019

Reinmar commented Aug 6, 2019

Reinmar commented Aug 12, 2019

jodator commented Aug 12, 2019

Reinmar commented Aug 12, 2019

jodator commented Aug 12, 2019

Reinmar commented Jul 3, 2017 •

edited

Loading