Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When collapsing whitespace retain things like no-break space #88

Conversation

michaelwildvarian
Copy link
Contributor

When the source code contains "special" space characters, such as no-break space (\u00a0), the current regex (\s+) replaces everything with a single space character (\u0020).

By following an approach that resembles more closely to the whitespace-collapsing implemented by browsers, this can be improved (see e.g. https://developer.mozilla.org/en-US/docs/Web/CSS/white-space-collapse#collapsing_of_white_space).

This PR changes the doCollapseWhitespace() function to transform the input string in the following way:

  1. Sequences of tab characters are replaced by a single space.
  2. Sequences of segment breaks (here interpreted to be line breaks) are replaced by a single space.
  3. Space characters before and after a "space-like but not space" character (e.g. no-break space \u00a0 or zero-width-no-break-space \ufeff) are removed.
  4. Sequences of space characters are collapsed to a single space.

Using this method, the source string hello, \u00a0 world ! becomes hello,\u00a0world! and when being displayed in the browser will not break between hello, and world!.

The tests in builder.spec.ts have been extended to include strings with \u00a0.

@daniel-sc daniel-sc self-requested a review November 27, 2023 20:13
@daniel-sc
Copy link
Owner

Hi @michaelwildvarian thanks for this elaborate PR!!

I have some general questions/remarks (probably due to lack of knowledge on my side!):

  • What is the specific issue you encountered and with which setup? (Angular version, preserve whitespace setting, ViewEngine/Ivy)
  • AFAIK Angular does some(?) whitespace collapsing on compile and before i18n texts are extracted. If that is indeed the case (and there are no additional needs) the existing collapse whitespace option should be removed from this lib (or deprecated).
  • The removal of whitespace before/after "space like characters" is the opposite of what browsers do - what is the rationale behind that?

If you could address these points, I'd be grateful!

@michaelwildvarian
Copy link
Contributor Author

Thanks for getting back to me.

  1. When extracting text from the source that contains \u00a0 (or   in an component template), then we'd like that to be retained because it's there for a reason. E.g. if we have values with units you don't want the unit to wrap to the next line. Our setup is Angular 17. I don't think that ViewEngine/Ivy has anything to do with our problem, it is literally the function I modified that indiscriminately collapses everything that matches \s+ into a single space character. When setting collapseWhitespace=false this effect doesn't happen, with the side-effect of having now excessive whitespace in the XLIFF files. Also, what the value of the CSS property white-space-collapse is should not factor in here.
  2. The whitespace collapsing implemented by this package is indeed useful, because when I disabled collapseWhitespace the extracted text did preserve the no-break space, but then contained excessive whitespace stemming from indentations. So I'd definitely want to keep it, just make it less eager to gobble everything up that looks like space.
  3. You're right, I just tried it with Chrome. I'm a too aggressive with the purging before and after no-space-whitespace. Reading the referenced MDN page something like Hello\u0020\u0020\u00a0\u0020\u0020World should be converted to Hello\u0020\u00a0\u0020World, leaving a single space before and after the no-break space. But then, I'd consider that to be an error on behalf of the creator of such a string, as having no-break space surrounded by normal space very much defeats the purpose of the no-break space. And it renders very ugly indeed.

So, should I adapt the collapsing to this effect?

@daniel-sc
Copy link
Owner

@michaelwildvarian

  1. Ok, I understand your problem and this indeed is a but that should be fixed (with this PR)!
  2. I tried to reproduce it, and for me the default settings with Angular 16.2 do collapse whitespaces in extracted i18n texts - e.g. the following template:
<div i18n="@@WITH_SUPERFLOUS_WHITESPACES">
  some    text\u00a0after-non-break   \u00a0    after non-break with whitespace   &nbsp;     after NBSP with ws
</div>

leads to the following extraction (without any processing by this lib!):

<segment>
  <source> some text\u00a0after-non-break \u00a0 after non-break with whitespace   after NBSP with ws </source>
</segment>

Note the &nbsp; is converted by angular to \u00a0.
Is it possible you set preserverWhitespace: false in your comonent - see https://angular.io/api/core/Component#preserveWhitespaces ?

  1. Yes, I think just replacing whitespace wrt to the Browser interpretation would be best - e.g. replace(/(\n|\r| )+/g, ' ')

@michaelwildvarian
Copy link
Contributor Author

michaelwildvarian commented Nov 30, 2023

Sorry for being AWOL, was pretty swamped by work...

Regarding 2: I've created a sample project and when using $localize in the TypeScript code of the component, and there it definitely doesn't "pre-collapse" space, but does so when it appears in the HTML template of the component.

Also, I'd really want to keep the collapseWhitespace enabled, because some of our strings are wrapped and indented by prettier in the TypeScript code, and that makes for some really ugly original strings in the XLIFF that the translators will hate us for.

Regarding 3: Should we through \t in there as well? And I'd make it into a character group, i.e. replace(/[\n\r\t ]+/g, ' ') to avoid making an unnecessary capturing group.

Copy link
Owner

@daniel-sc daniel-sc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect - thanks for the clarification and the update!

@daniel-sc daniel-sc added the bug Something isn't working label Dec 1, 2023
@daniel-sc daniel-sc merged commit 1dfba72 into daniel-sc:master Dec 1, 2023
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants