[FEATURE] Switch XML minifier from pretty-data to minify-xml #488

kristian · 2020-08-05T20:11:54Z

All in all pretty-data is a pretty bad (phun intended) XML minifier. It
does for instance not minify tag contents, nor it optimizes anything
else than removing comments and whitespaces between tags. Tags like:

<Tag     attributeA   =   "..."     attributeB = "..."     />

Will not get reduced in size. Switched to minify-xml (a package which
I specifically created for this patch, to do a better job at minfying)
on the other hands removes comments and whitespaces between tags, as
well as collapses whitespaces in tags and removes unused namespaces from
views. These are all pretty common use-cases for UI5 xml views, thus I
clearly think it's the better option compared to pretty-data.

Also this change removes the strange assumption that pretty-data (or now
minify-xml) would modify any tag content (like pre tags), which both
libraries clearly don't do. Fixed typo in test (autpSplitter).

Thank you for your contribution! 🙌

To get it merged faster, kindly review the checklist below:

Pull Request Checklist

Reviewed the Contributing Guidelines
- Especially the How to Contribute section
No merge commits
Correct commit message style

CLAassistant · 2020-08-05T20:11:59Z

All committers have signed the CLA.

coveralls · 2020-08-05T20:15:09Z

Coverage increased (+0.01%) to 92.91% when pulling 198dc2c on kristian:master into be06801 on SAP:master.

codeworrior

Your statement about whitespace and

tags is wrong. pretty-data does removes whitespace if it is the only text content between tags. For xhtml:pre Tags, this is still wrong.

Does minify-xml behave differently? According to your description of the removeWhitespaceBetweenTags option, I don't think so.

kristian · 2020-08-05T20:52:17Z

Ahhh I see what you mean Frank! I think you are right. Let me optimize that in minify-xml though! I‘ll let you know!

codeworrior · 2020-08-05T21:04:39Z

I don't think that minify-xml should know our usage of xhtml in XMLViews, or should it?

kristian · 2020-08-05T21:20:25Z

No, it shouldn't... however "mixed content" (an element not only containing whitespaces and other elements) is a general concept in XML, so I am thinking about adding an option to generally ignore removing whitespaces in mixed content. I think that'd be generally favorable, what's your opinion?

Without the flag set (default option), tags like:

<Tag>Hello <Tag/>   <Tag/> World</Tag>

Would not get optimized. With the flag set it would behave the same as pretty-data and reduce it to:

<Tag>Hello <Tag/><Tag/> World</Tag>

With such an option I can only think of certain special cases where this logic might fail:

<Tag>Hello <Tag>  <Tag/>  </Tag> World</Tag>

Would still get reduced to:

<Tag>Hello <Tag><Tag/></Tag> World</Tag>

Alternative would be to completely igore all tags inside of mixed tags, which might be the right thing to do, if I think about it...

Or we stick with your pre-check... I am not sure. What do you think? I'll have to read a little bit more about XHTML, maybe I have to even add an option to ignore certain tags from minification. I am not entirely sure how <pre> is defined in the specification tbh.

kristian · 2020-08-05T21:27:30Z

But wait... if you really want to have tags rendered in a <pre> definitely < would newd to get encoded as <. Otherwise it's out of spec. also inside <pre>. The only other special case I can think of is:

<pre>

So a <pre> with only whitespaces, which would not classify as mixed conent. This is again where I would need a flag to specify a list of "don't optimize these tags"... I'm unsure... Let me think about it and read how XML1.1 deals with mixed content in general.

Edit: Hell, even GitHub is confused what we are discussing here...

codeworrior · 2020-08-05T21:34:54Z

Trying to explain encoding issues in a WYSIWYG editor is hell, yes.

To me the <pre> check seemed fairly simple, it should match only in very rare cases and skip the optimisation for them. With minify-xml, it could even just toggle the removeWhitespaceBetweenTags option.

kristian · 2020-08-05T21:50:34Z

You are right... it's such a narrow use case, that putting so much "special case handling effort" in minify-xml seems like a bit of a waste... afterall it's mainly a XML minifier, not a (X)HTML minifier, and outside of XHTML mixed content is such a rarety... so I think your approach to set the flag on documents containing pre tags is a fairly nice one!

I will think about it one last time when I am back to my PC. Otherwise let me add that and fix the bugs you reported in minify-xml! Thanks Frank!

kristian · 2020-08-06T11:50:21Z

While changing this, I also found that your pre-tag check did never work with pre-tags containing attributes. Changed:

const xmlHtmlPrePattern = /<(?:\w+:)?pre>/;

To:

const xmlHtmlPrePattern = /<(?:\w+:)?pre\b/;

I will soon amend my change. Just need to fix two more tests.

kristian · 2020-08-06T12:04:40Z

Fixed all issues you addressed also in kristian/minify-xml#1 and kristian/minify-xml#2 @codeworrior! Thanks again for the code review! 👍 With version 2.1.0 I also added an option so empty XML elements are collapsed by default (<xml></xml> to <xml/>).

RandomByte · 2020-08-10T09:28:27Z

Hey @kristian,

Have you already done some benchmarking? It would be interesting to see the impact on size and runtime performance.

We usually use the tool hyperfine for measuring runtime changes, like here: #390 (comment)

For example hyperfine --warmup 1 'UI5_CLI_NO_LOCAL=X /path/to/bin/ui5 build' --export-markdown ./run1.md

RandomByte · 2020-08-10T10:57:47Z

I did a benchmark with an application that contains 93 relevant *.xml files:

ui5-builder ref	Command	Mean [s]	Min [s]	Max [s]	Relative	`dist/` size
master (`054408a`)	`ui5 build`	3.706 ± 0.200	3.492	4.147	1.00	10.362.589 bytes
kristian:master (`5bc22ab`)	`ui5 build`	4.438 ± 0.159	4.187	4.686	1.20	10.413.376 bytes

kristian · 2020-08-10T11:01:36Z

Mhm, the diff in size doesn't really make sense to me? The two regex which where performed in pretty-data were taken over to minify-xml, thus I cannot really explain why a larger file size was the result of the optimization? The only thing minify-xml does not optimize, in contrast to pretty-data is CDATA tags. Could you attach the DIST folders here, so I can check on the size difference?

RandomByte · 2020-08-10T11:31:40Z

I mean, the difference is just 50 kilobytes. I find that neglectable.

I'll send you a link to the project via chat. It's not open source.

kristian · 2020-08-10T11:52:18Z

Yes, but I mean the reason for switching to a better XML minifier was to reduce the bootstrap size ;) So I would like to look into what is the root cause of the larger size. Thanks!

kristian · 2020-08-20T18:05:09Z

@RandomByte I was finally able to do some more benchmarking, sorry for the delay! First I checked the project you sent me and try to reproduce your benchmark results. Generally I noticed two things. The change you sent me was based on quite an old version of the CLI, I don't know if you linked your original run locally, but if you didn't this could have been the cause of the increased size. Also I noticed, that (at least what I can see from your hyperfine command (if you have not altered it), that you have not specified the self-contained keyword. The xml-minification only takes place during the building of the bootstrap bundle, which is why I don't think the difference in size has had anything to do with the benchmarking of the new minify-xml component. The command I ended up using for running hyperfine was:

hyperfine --prepare 'rmdir /s /q dist || rm -rf ./dist || echo' --warmup 1 '( set UI5_CLI_NO_LOCAL=X || UI5_CLI_NO_LOCAL=X ) && node ./node_modules/@ui5/cli/bin/ui5.js build self-contained' --export-markdown ./measure.md

Executing it on the current builder release (2.1.0) and my fork, it yielded the following results:

ui5-builder ref	Command	Mean [s]	Min [s]	Max [s]	Relative	`dist/resources/sap-ui-custom.js` size
sap:master	`ui5 build self-contained`	24.348 ± 1.877	23.337	29.610	1.00	8.559.538 bytes
kristian:master	`ui5 build self-contained`	23.851 ± 1.872	22.780	29.092	1.00	8.508.751 bytes

So in my tests the performance stayed the same, if not sligthly better, whereas the whole boostrap size was reduced by about 50kb, which lies in the margin what I would have expected, based on the optimizations I did in minify-xml. So for my reasoning this pull requests would be mergable, in case you have no further objections or concerns. 👍

kristian · 2020-09-01T17:12:40Z

@RandomByte any update on this PR? 👍

RandomByte · 2020-09-03T08:41:19Z

I'll try and replicate your results.

The xml-minification only takes place during the building of the bootstrap bundle

But is that true? XML minification should also take place for the Component-preload and library-preload generation (default ui5 build behavior). That's why you had to adapt some Component-preload test fixtures: https://github.com/SAP/ui5-builder/pull/488/files#diff-a6a8acbdac295660c9386bdbfd7f18e5

RandomByte · 2020-09-03T09:55:52Z

Hm, indeed I might have messed something up in my first benchmark. I repeated it and also did another one for the self-contained build like you did. I can confirm that in both cases, the result is about 50 KB smaller.

For the Component-preload.js of this application, that's a reduction of 1,16%.

However in both cases the build took almost a second longer. For the default preload build that's 26% of the total build time.

We will discuss this in the team and might do some additional tests with other projects. I chose this project since it has an uncommonly high amount of XML views and fragments (93). The question is how more common applications with <15 XML views are affected.

self-contained Build

ui5-builder Ref	Command	Mean [s]	Min [s]	Max [s]	Relative	`dist/` Size	Relative Size
master (`9e6364a`)	`UI5_CLI_NO_LOCAL=X /xy/ui5-cli/bin/ui5.js build self-contained`	24,734 ± 0,453	24,156	25,351	1,00	14.838.021 bytes	0
kristian:master (`5bc22ab`)	`UI5_CLI_NO_LOCAL=X /xy/ui5-cli/bin/ui5.js build self-contained`	25,529 ± 0,660	24,868	27,183	1,03	14.787.234 bytes	-50.787 bytes

Preload Build (default)

ui5-builder Ref	Command	Mean [s]	Min [s]	Max [s]	Relative	`dist/` Size	`Component-preload.js` Size	Relative Size
master (`054408a`)	`UI5_CLI_NO_LOCAL=X /xy/ui5-cli/bin/ui5.js build`	3,703 ± 0,090	3,549	3,828	1,00	10.413.376 bytes	4.442.546 bytes	0
kristian:master (`5bc22ab`)	`UI5_CLI_NO_LOCAL=X /xy/ui5-cli/bin/ui5.js build`	4,675 ± 0,281	4,159	5,283	1,26	10.362.589 bytes	4.391.759 bytes	-50.787 bytes

The change you sent me was based on quite an old version of the CLI, I don't know if you linked your original run locally, but if you didn't this could have been the cause of the increased size

I directly called the CLI in my development setup where all UI5 Tooling modules are cloned and linked between each other. So basically UI5_CLI_NO_LOCAL=X /path/to/my/development/directory/ui5-cli/bin/ui5.js build. Did you link a cloned UI5 CLI or just the builder module to the old CLI?

kristian · 2020-09-03T10:24:28Z

I just linked the builder module and used the same CLI. But anyways... for me the runtime was even faster when using my new module, which does not make sense, because at least my module contains more regular expressions executed. I can only say, I also always got the warning message of hyperfine, that there was quite a high variation between runs, thus the results might not be significant enough. I can't really tell whether the variation in build time is due to my change, or whether it is due to variations in disk speed, etc.? Also, even if the performance is decreased, I cannot tell if it is a linear performance decrease, what I mean is, whether my change generally adds 1 second "flat" to the build time (e.g. the whole build takes 120s normally, and with my build it takes 121s), or if it depends on the number of resources etc. All I can say, that for me performance measurements in NodeJS always have a high number of variables involved, so a differentiated test is very hard to do.

What might be reasonable to do, instead of executing a "test build" to analyze the performance, let's try to execute the realted unit-tests only. This (for my opionion) should give a much more accurate performance measurement result.

RandomByte · 2020-09-03T14:40:38Z

can't really tell whether the variation in build time is due to my change, or whether it is due to variations in disk speed, etc.?

For your results this might be the case, since it reports a deviation of ± 1.877. Are you using windows? If yes, did you execute this benchmark using the native CMD or some BASH emulation like the git-scm BASH?

Anyways, my results are pretty consistent across all benchmarks and show only a small deviation of less than a second. Even the one I did almost a month ago shows a 700ms increase (given that only the dist-size and not the runtime was incorrect in that benchmark).

I was also wondering wether it's one second flat or linear with the amount of XMLs to process. This should become clear once we benchmark some more applications.

kristian · 2020-09-03T18:27:13Z

@RandomByte I rebased the commit on sap:master and did some performance optimization for XML files NOT containing CData tags (which should be the most common case). Could you try re-running your performance measurements, I feel on Windows they are simply not accurate enough to be compared here. Thanks Merlin!

kristian · 2020-09-13T15:27:17Z

@RandomByte did you have the chance of repeating your measurements with the latest version of minify-xml? Thanks!

kristian · 2020-10-23T09:18:59Z

FYI, I again ran the minification on our current bootstrap. With the new minify-xml:

old xml minifcation bootstrap size 4,97 MB (5.214.753 bytes)
new xml minifcation (minify-xml) bootstrap size 4,93 MB (5.176.693 bytes)

Reduction of 38,06 kB (38.060 bytes) only by the introduction of the new minifier. I think that's a very good result. 👍

RandomByte · 2020-10-26T15:02:00Z

I did another benchmark with the same app I tested earlier. The size reduction stayed the same, but it seems your change indeed improved the runtime.

The runtime is now almost identical (0%-2% difference) and the size consistently smaller (0,5% of dist/ and 1,16% of the Component-preload.js of this project). I still find ~50 kB to be a very small win, but fair enough. We'll discuss this in the team once again next week.

Preload Build (default)

ui5-builder Ref	Command	Mean [s]	Min [s]	Max [s]	Relative	`dist/` Size	`Component-preload.js` Size	Relative Size
master (`1bdd8cf`)	`UI5_CLI_NO_LOCAL=X /xy/ui5-cli/bin/ui5.js build`	3,387 ± 0,121	3,165	3,524	1,00	10.413.376 bytes	4.442.546 bytes	0
kristian:master (`ace614f`)	`UI5_CLI_NO_LOCAL=X /xy/ui5-cli/bin/ui5.js build`	3,451 ± 0,044	3,382	3,516	1,02	10.362.587 bytes	4.391.757 bytes	-50.789 bytes

codeworrior

LGTM

kristian · 2020-11-02T09:51:01Z

I still find ~50 kB to be a very small win, but fair enough. We'll discuss this in the team once again next week.

Thanks Merlin! Thanks Frank! :) 50kB I find "reasonable", but it also depends on how optimized your code already is. E.g. what happens very often in our use case is, that developers simply copy a view and forget to adapt their namespace imports. Over a couple hundrets of views this adds up! 👍 Thanks for the efforts 🥇

RandomByte · 2020-11-02T09:59:00Z

To also share the result of our team internal discussion: We will go ahead and merge this PR.

Based on my last measurements there seems to be a (small) improvement in XML size reduction at an equal runtime. Also pretty-data might not be maintained anymore.

Let me try and resolve the CI issue and I'll merge it as a new feature. Thank you for your contribution @kristian! 👍

All in all pretty-data is a pretty bad (phun intended) XML minifier. It does for instance not minify tag contents, nor it optimizes anything else than removing comments and whitespaces between tags. Tags like: <Tag attributeA = "..." attributeB = "..." /> Will not get reduced in size. Switched to minify-xml (a package which I specifically created for this patch, to do a better job at minfying) on the other hands removes comments and whitespaces between tags, as well as collapses whitespaces in tags and removes unused namespaces from views. These are all pretty common use-cases for UI5 xml views, thus I clearly think it's the better option compared to pretty-data. Also this change removes the strange assumption that pretty-data (or now minify-xml) would modify any tag content (like pre tags), which both libraries clearly don't do. Fixed typo in test (autpSplitter).

PRs where this process was already used: * SAP/ui5-builder#488 * SAP/ui5-builder#390 JIRA: CPOUI5FOUNDATION-261

codeworrior reviewed Aug 5, 2020

View reviewed changes

RandomByte changed the title ~~Switch XML minifier from pretty-data to minify-xml~~ [FEATURE] Switch XML minifier from pretty-data to minify-xml Nov 2, 2020

codeworrior reviewed Nov 2, 2020

View reviewed changes

RandomByte approved these changes Nov 2, 2020

View reviewed changes

RandomByte merged commit be29520 into SAP:master Nov 2, 2020

RandomByte added a commit to SAP/ui5-tooling that referenced this pull request Dec 2, 2020

[INTERNAL] Document the UI5 Tooling benchmarking process

1db338f

PRs where this process was already used: * SAP/ui5-builder#488 * SAP/ui5-builder#390 JIRA: CPOUI5FOUNDATION-261

RandomByte mentioned this pull request Dec 2, 2020

[INTERNAL] Document the UI5 Tooling benchmarking process SAP/ui5-tooling#466

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEATURE] Switch XML minifier from pretty-data to minify-xml #488

[FEATURE] Switch XML minifier from pretty-data to minify-xml #488

kristian commented Aug 5, 2020

CLAassistant commented Aug 5, 2020 •

edited

Loading

coveralls commented Aug 5, 2020 •

edited

Loading

codeworrior left a comment •

edited

Loading

kristian commented Aug 5, 2020

codeworrior commented Aug 5, 2020

kristian commented Aug 5, 2020

kristian commented Aug 5, 2020 •

edited

Loading

codeworrior commented Aug 5, 2020

kristian commented Aug 5, 2020 •

edited

Loading

kristian commented Aug 6, 2020 •

edited

Loading

kristian commented Aug 6, 2020 •

edited

Loading

RandomByte commented Aug 10, 2020 •

edited

Loading

RandomByte commented Aug 10, 2020 •

edited

Loading

kristian commented Aug 10, 2020

RandomByte commented Aug 10, 2020 •

edited

Loading

kristian commented Aug 10, 2020 •

edited

Loading

kristian commented Aug 20, 2020

kristian commented Sep 1, 2020

RandomByte commented Sep 3, 2020

RandomByte commented Sep 3, 2020

kristian commented Sep 3, 2020

RandomByte commented Sep 3, 2020 •

edited

Loading

kristian commented Sep 3, 2020 •

edited

Loading

kristian commented Sep 13, 2020

kristian commented Oct 23, 2020

RandomByte commented Oct 26, 2020

codeworrior left a comment

kristian commented Nov 2, 2020

RandomByte commented Nov 2, 2020

[FEATURE] Switch XML minifier from pretty-data to minify-xml #488

[FEATURE] Switch XML minifier from pretty-data to minify-xml #488

Conversation

kristian commented Aug 5, 2020

Pull Request Checklist

CLAassistant commented Aug 5, 2020 • edited Loading

coveralls commented Aug 5, 2020 • edited Loading

codeworrior left a comment • edited Loading

Choose a reason for hiding this comment

kristian commented Aug 5, 2020

codeworrior commented Aug 5, 2020

kristian commented Aug 5, 2020

kristian commented Aug 5, 2020 • edited Loading

codeworrior commented Aug 5, 2020

kristian commented Aug 5, 2020 • edited Loading

kristian commented Aug 6, 2020 • edited Loading

kristian commented Aug 6, 2020 • edited Loading

RandomByte commented Aug 10, 2020 • edited Loading

RandomByte commented Aug 10, 2020 • edited Loading

kristian commented Aug 10, 2020

RandomByte commented Aug 10, 2020 • edited Loading

kristian commented Aug 10, 2020 • edited Loading

kristian commented Aug 20, 2020

kristian commented Sep 1, 2020

RandomByte commented Sep 3, 2020

RandomByte commented Sep 3, 2020

self-contained Build

Preload Build (default)

kristian commented Sep 3, 2020

RandomByte commented Sep 3, 2020 • edited Loading

kristian commented Sep 3, 2020 • edited Loading

kristian commented Sep 13, 2020

kristian commented Oct 23, 2020

RandomByte commented Oct 26, 2020

Preload Build (default)

codeworrior left a comment

Choose a reason for hiding this comment

kristian commented Nov 2, 2020

RandomByte commented Nov 2, 2020

CLAassistant commented Aug 5, 2020 •

edited

Loading

coveralls commented Aug 5, 2020 •

edited

Loading

codeworrior left a comment •

edited

Loading

kristian commented Aug 5, 2020 •

edited

Loading

kristian commented Aug 5, 2020 •

edited

Loading

kristian commented Aug 6, 2020 •

edited

Loading

kristian commented Aug 6, 2020 •

edited

Loading

RandomByte commented Aug 10, 2020 •

edited

Loading

RandomByte commented Aug 10, 2020 •

edited

Loading

RandomByte commented Aug 10, 2020 •

edited

Loading

kristian commented Aug 10, 2020 •

edited

Loading

RandomByte commented Sep 3, 2020 •

edited

Loading

kristian commented Sep 3, 2020 •

edited

Loading