Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pre tag keep format #10

Open
ricleal opened this issue Mar 31, 2019 · 5 comments
Open

pre tag keep format #10

ricleal opened this issue Mar 31, 2019 · 5 comments

Comments

@ricleal
Copy link

ricleal commented Mar 31, 2019

I just wonder if there's an option to keep the format (e.g. white spaces, tabs, etc) in the pre tag.
Thanks!

@matthiask
Copy link
Owner

Currently there isn't. Whitespace normalization is the first thing that happens, before anything else, before even parsing the HTML fragment:

html = normalize_overall_whitespace(html)

That being said, now that we have a real testsuite (feincms-cleanse didn't have good coverage) I wouldn't be against selectively normalizing whitespace as long as nothing else changes, resp. only whitespace changes without effect. This would probably mean normalizing elem.text and elem.tail of all elements except of a few where normalization would be skipped.

@ricleal
Copy link
Author

ricleal commented Apr 1, 2019

Thanks a lot for the reply. It looks like it's not implemented yet but there's a way forward. 👍

@mirukana
Copy link
Contributor

html = normalize_overall_whitespace(html)

Commenting this line doesn't seem to break any tests at first glance, but I see this function removes more kind of whitespace than normalize_whitespace_in_text_or_tail.
Would including these additional whitespace in that function be enough to not need normalize_overall_whitespace, as a first step?
Thanks for the great package by the way! This is the last issue I've had with it.

@matthiask
Copy link
Owner

I think that the line does some things which are worthwhile such as normalizing various forms of whitespace. It could do this without collapsing whitespace though -- this decision could be left to normalize_whitespace_in_text_or_tail. This in turn could only run if we aren't inside <pre> at the moment (respectively inside a set of whitespace-preserving tags). Note that HTML elements are processed from the end to the beginning and from the inside out, so you'd have to peek ahead in the backlog deque to find out whether we are inside such an element or not.

I'm having a hard time constructing a non-artificial test case which fails without normalize_overall_whitespace when copy-pasting content from various sources. This may be because modern rich text widgets for web and/or browsers' contenteditable implementations are smarter than they were 10 years ago and they generally don't produce that ugly HTML anymore. I'm still reluctant to just remove all upfront whitespace normalization though. This is a piece of badly tested code, but code with a long legacy... in fact, a part of this is in use for almost 10 years now (feincms/feincms@0186b47)

@matthiask
Copy link
Owner

Oh, reading your comment again: I think it might work to just move the functionality inside normalize_whitespace_in_text_or_tail but there may be an interaction with only_whitespace_re where some elements might not be dropped anymore -- but that's just conjecture, I haven't taken a close look at the code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants