When collapsing whitespace retain things like no-break space #88
Hi @michaelwildvarian thanks for this elaborate PR!! I have some general questions/remarks (probably due to lack of knowledge on my side!):
If you could address these points, I'd be grateful!
Thanks for getting back to me.
So, should I adapt the collapsing to this effect?
<div i18n="@@WITH_SUPERFLOUS_WHITESPACES">
some text\u00a0after-non-break \u00a0 after non-break with whitespace after NBSP with ws
</div>
leads to the following extraction (without any processing by this lib!):
<segment>
<source> some text\u00a0after-non-break \u00a0 after non-break with whitespace after NBSP with ws </source>
</segment>
Note the [...]
Sorry for being AWOL, was pretty swamped by work...
Regarding 2: I've created a sample project and when using [...]
Also, I'd really want to keep the [...]
Regarding 3: Should we through [...]
Perfect - thanks for the clarification and the update!
When the source code contains "special" space characters, such as no-break space (\u00a0), the current regex (\s+) replaces everything with a single space character (\u0020).
By following an approach that more closely resembles the whitespace collapsing implemented by browsers, this can be improved (see e.g. https://developer.mozilla.org/en-US/docs/Web/CSS/white-space-collapse#collapsing_of_white_space).
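The problem described above can be reproduced in a few lines. This is only an illustrative sketch: the variable names are made up for the example and it does not use this library's code. It relies on the fact that \s in JavaScript/TypeScript regular expressions also matches U+00A0 and U+FEFF.

```typescript
// In JavaScript/TypeScript regexes, \s matches more than ASCII whitespace:
// it also matches U+00A0 (no-break space) and U+FEFF (zero-width no-break space).
const source = "hello,\u00a0world";

// Collapsing with \s+ replaces the no-break space with an ordinary U+0020,
// so the browser may now break the line between the two words.
const collapsed = source.replace(/\s+/g, " ");

console.log(collapsed === "hello, world"); // true: the no-break space is gone
console.log(/\s/.test("\ufeff")); // true: U+FEFF is matched by \s as well
```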
This PR changes the doCollapseWhitespace() function to transform the input string in the following way:
[...] \u00a0 or zero-width-no-break-space \ufeff) are removed.
Using this method, the source string "hello, \u00a0 world !" becomes "hello,\u00a0world!" and when being displayed in the browser will not break between "hello," and "world!".
The tests in builder.spec.ts have been extended to include strings with \u00a0.
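As a rough illustration of the idea, here is a minimal sketch of a collapsing function that keeps no-break spaces. It is not the PR's actual doCollapseWhitespace() implementation: the function name collapsePreservingSpecialSpaces, the exact set of collapsible characters, and the two-step approach are assumptions made for this example. It also leaves ordinary spaces elsewhere in the string in place, so it does not try to reproduce every detail of the example output above.

```typescript
// Hypothetical helper for illustration only; NOT the PR's doCollapseWhitespace().
// Assumption: only space, tab, line feed, carriage return and form feed are
// collapsible, while U+00A0 and U+FEFF are preserved.
function collapsePreservingSpecialSpaces(input: string): string {
  return input
    // Step 1: collapse each run of ordinary whitespace into a single U+0020.
    .replace(/[ \t\n\r\f]+/g, " ")
    // Step 2: drop an ordinary space that directly touches a preserved
    // character, so only the no-break space remains between the words.
    .replace(/ ?([\u00a0\ufeff]+) ?/g, "$1");
}

const result = collapsePreservingSpecialSpaces("hello, \u00a0 world");
console.log(result === "hello,\u00a0world"); // true
```

Using an explicit character class instead of \s is the key design choice here, because \s would also match the characters that are supposed to survive.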