Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Switch XML minifier from pretty-data to minify-xml #488

Merged
merged 1 commit into from
Nov 2, 2020
Merged

[FEATURE] Switch XML minifier from pretty-data to minify-xml #488

merged 1 commit into from
Nov 2, 2020

Conversation

kristian
Copy link
Contributor

@kristian kristian commented Aug 5, 2020

All in all pretty-data is a pretty bad (phun intended) XML minifier. It
does for instance not minify tag contents, nor it optimizes anything
else than removing comments and whitespaces between tags. Tags like:

<Tag     attributeA   =   "..."     attributeB = "..."     />

Will not get reduced in size. Switched to minify-xml (a package which
I specifically created for this patch, to do a better job at minfying)
on the other hands removes comments and whitespaces between tags, as
well as collapses whitespaces in tags and removes unused namespaces from
views. These are all pretty common use-cases for UI5 xml views, thus I
clearly think it's the better option compared to pretty-data.

Also this change removes the strange assumption that pretty-data (or now
minify-xml) would modify any tag content (like pre tags), which both
libraries clearly don't do. Fixed typo in test (autpSplitter).

Thank you for your contribution! 🙌

To get it merged faster, kindly review the checklist below:

Pull Request Checklist

@CLAassistant
Copy link

CLAassistant commented Aug 5, 2020

CLA assistant check
All committers have signed the CLA.

@coveralls
Copy link

coveralls commented Aug 5, 2020

Coverage Status

Coverage increased (+0.01%) to 92.91% when pulling 198dc2c on kristian:master into be06801 on SAP:master.

Copy link
Member

@codeworrior codeworrior left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your statement about whitespace and

 tags is wrong. pretty-data does removes whitespace if it is the only text content between tags. For xhtml:pre Tags, this is still wrong.

Does minify-xml behave differently? According to your description of the removeWhitespaceBetweenTags option, I don't think so.

@kristian
Copy link
Contributor Author

kristian commented Aug 5, 2020

Ahhh I see what you mean Frank! I think you are right. Let me optimize that in minify-xml though! I‘ll let you know!

@codeworrior
Copy link
Member

I don't think that minify-xml should know our usage of xhtml in XMLViews, or should it?

@kristian
Copy link
Contributor Author

kristian commented Aug 5, 2020

No, it shouldn't... however "mixed content" (an element not only containing whitespaces and other elements) is a general concept in XML, so I am thinking about adding an option to generally ignore removing whitespaces in mixed content. I think that'd be generally favorable, what's your opinion?

Without the flag set (default option), tags like:

<Tag>Hello <Tag/>   <Tag/> World</Tag>

Would not get optimized. With the flag set it would behave the same as pretty-data and reduce it to:

<Tag>Hello <Tag/><Tag/> World</Tag>

With such an option I can only think of certain special cases where this logic might fail:

<Tag>Hello <Tag>  <Tag/>  </Tag> World</Tag>

Would still get reduced to:

<Tag>Hello <Tag><Tag/></Tag> World</Tag>

Alternative would be to completely igore all tags inside of mixed tags, which might be the right thing to do, if I think about it...

Or we stick with your pre-check... I am not sure. What do you think? I'll have to read a little bit more about XHTML, maybe I have to even add an option to ignore certain tags from minification. I am not entirely sure how <pre> is defined in the specification tbh.

@kristian
Copy link
Contributor Author

kristian commented Aug 5, 2020

But wait... if you really want to have tags rendered in a <pre> definitely < would newd to get encoded as &lt;. Otherwise it's out of spec. also inside <pre>. The only other special case I can think of is:

<pre>

So a <pre> with only whitespaces, which would not classify as mixed conent. This is again where I would need a flag to specify a list of "don't optimize these tags"... I'm unsure... Let me think about it and read how XML1.1 deals with mixed content in general.

Edit: Hell, even GitHub is confused what we are discussing here...

@codeworrior
Copy link
Member

Trying to explain encoding issues in a WYSIWYG editor is hell, yes.

To me the <pre> check seemed fairly simple, it should match only in very rare cases and skip the optimisation for them. With minify-xml, it could even just toggle the removeWhitespaceBetweenTags option.

@kristian
Copy link
Contributor Author

kristian commented Aug 5, 2020

You are right... it's such a narrow use case, that putting so much "special case handling effort" in minify-xml seems like a bit of a waste... afterall it's mainly a XML minifier, not a (X)HTML minifier, and outside of XHTML mixed content is such a rarety... so I think your approach to set the flag on documents containing pre tags is a fairly nice one!

I will think about it one last time when I am back to my PC. Otherwise let me add that and fix the bugs you reported in minify-xml! Thanks Frank!

@kristian
Copy link
Contributor Author

kristian commented Aug 6, 2020

While changing this, I also found that your pre-tag check did never work with pre-tags containing attributes. Changed:

const xmlHtmlPrePattern = /<(?:\w+:)?pre>/;

To:

const xmlHtmlPrePattern = /<(?:\w+:)?pre\b/;

I will soon amend my change. Just need to fix two more tests.

@kristian
Copy link
Contributor Author

kristian commented Aug 6, 2020

Fixed all issues you addressed also in kristian/minify-xml#1 and kristian/minify-xml#2 @codeworrior! Thanks again for the code review! 👍 With version 2.1.0 I also added an option so empty XML elements are collapsed by default (<xml></xml> to <xml/>).

@RandomByte
Copy link
Member

RandomByte commented Aug 10, 2020

Hey @kristian,

Have you already done some benchmarking? It would be interesting to see the impact on size and runtime performance.

We usually use the tool hyperfine for measuring runtime changes, like here: #390 (comment)

For example hyperfine --warmup 1 'UI5_CLI_NO_LOCAL=X /path/to/bin/ui5 build' --export-markdown ./run1.md

@RandomByte
Copy link
Member

RandomByte commented Aug 10, 2020

I did a benchmark with an application that contains 93 relevant *.xml files:

ui5-builder ref Command Mean [s] Min [s] Max [s] Relative dist/ size
master (054408a) ui5 build 3.706 ± 0.200 3.492 4.147 1.00 10.362.589 bytes
kristian:master (5bc22ab) ui5 build 4.438 ± 0.159 4.187 4.686 1.20 10.413.376 bytes

@kristian
Copy link
Contributor Author

Mhm, the diff in size doesn't really make sense to me? The two regex which where performed in pretty-data were taken over to minify-xml, thus I cannot really explain why a larger file size was the result of the optimization? The only thing minify-xml does not optimize, in contrast to pretty-data is CDATA tags. Could you attach the DIST folders here, so I can check on the size difference?

@RandomByte
Copy link
Member

RandomByte commented Aug 10, 2020

I mean, the difference is just 50 kilobytes. I find that neglectable.

I'll send you a link to the project via chat. It's not open source.

@kristian
Copy link
Contributor Author

kristian commented Aug 10, 2020

Yes, but I mean the reason for switching to a better XML minifier was to reduce the bootstrap size ;) So I would like to look into what is the root cause of the larger size. Thanks!

@kristian
Copy link
Contributor Author

@RandomByte I was finally able to do some more benchmarking, sorry for the delay! First I checked the project you sent me and try to reproduce your benchmark results. Generally I noticed two things. The change you sent me was based on quite an old version of the CLI, I don't know if you linked your original run locally, but if you didn't this could have been the cause of the increased size. Also I noticed, that (at least what I can see from your hyperfine command (if you have not altered it), that you have not specified the self-contained keyword. The xml-minification only takes place during the building of the bootstrap bundle, which is why I don't think the difference in size has had anything to do with the benchmarking of the new minify-xml component. The command I ended up using for running hyperfine was:

hyperfine --prepare 'rmdir /s /q dist || rm -rf ./dist || echo' --warmup 1 '( set UI5_CLI_NO_LOCAL=X || UI5_CLI_NO_LOCAL=X ) && node ./node_modules/@ui5/cli/bin/ui5.js build self-contained' --export-markdown ./measure.md

Executing it on the current builder release (2.1.0) and my fork, it yielded the following results:

ui5-builder ref Command Mean [s] Min [s] Max [s] Relative dist/resources/sap-ui-custom.js size
sap:master ui5 build self-contained 24.348 ± 1.877 23.337 29.610 1.00 8.559.538 bytes
kristian:master ui5 build self-contained 23.851 ± 1.872 22.780 29.092 1.00 8.508.751 bytes

So in my tests the performance stayed the same, if not sligthly better, whereas the whole boostrap size was reduced by about 50kb, which lies in the margin what I would have expected, based on the optimizations I did in minify-xml. So for my reasoning this pull requests would be mergable, in case you have no further objections or concerns. 👍

@kristian
Copy link
Contributor Author

kristian commented Sep 1, 2020

@RandomByte any update on this PR? 👍

@RandomByte
Copy link
Member

I'll try and replicate your results.

The xml-minification only takes place during the building of the bootstrap bundle

But is that true? XML minification should also take place for the Component-preload and library-preload generation (default ui5 build behavior). That's why you had to adapt some Component-preload test fixtures: https://github.com/SAP/ui5-builder/pull/488/files#diff-a6a8acbdac295660c9386bdbfd7f18e5

@RandomByte
Copy link
Member

Hm, indeed I might have messed something up in my first benchmark. I repeated it and also did another one for the self-contained build like you did. I can confirm that in both cases, the result is about 50 KB smaller.

For the Component-preload.js of this application, that's a reduction of 1,16%.

However in both cases the build took almost a second longer. For the default preload build that's 26% of the total build time.

We will discuss this in the team and might do some additional tests with other projects. I chose this project since it has an uncommonly high amount of XML views and fragments (93). The question is how more common applications with <15 XML views are affected.

self-contained Build

ui5-builder Ref Command Mean [s] Min [s] Max [s] Relative dist/ Size Relative Size
master (9e6364a) UI5_CLI_NO_LOCAL=X /xy/ui5-cli/bin/ui5.js build self-contained 24,734 ± 0,453 24,156 25,351 1,00 14.838.021 bytes 0
kristian:master (5bc22ab) UI5_CLI_NO_LOCAL=X /xy/ui5-cli/bin/ui5.js build self-contained 25,529 ± 0,660 24,868 27,183 1,03 14.787.234 bytes -50.787 bytes

Preload Build (default)

ui5-builder Ref Command Mean [s] Min [s] Max [s] Relative dist/ Size Component-preload.js Size Relative Size
master (054408a) UI5_CLI_NO_LOCAL=X /xy/ui5-cli/bin/ui5.js build 3,703 ± 0,090 3,549 3,828 1,00 10.413.376 bytes 4.442.546 bytes 0
kristian:master (5bc22ab) UI5_CLI_NO_LOCAL=X /xy/ui5-cli/bin/ui5.js build 4,675 ± 0,281 4,159 5,283 1,26 10.362.589 bytes 4.391.759 bytes  -50.787 bytes

The change you sent me was based on quite an old version of the CLI, I don't know if you linked your original run locally, but if you didn't this could have been the cause of the increased size

I directly called the CLI in my development setup where all UI5 Tooling modules are cloned and linked between each other. So basically UI5_CLI_NO_LOCAL=X /path/to/my/development/directory/ui5-cli/bin/ui5.js build. Did you link a cloned UI5 CLI or just the builder module to the old CLI?

@kristian
Copy link
Contributor Author

kristian commented Sep 3, 2020

I just linked the builder module and used the same CLI. But anyways... for me the runtime was even faster when using my new module, which does not make sense, because at least my module contains more regular expressions executed. I can only say, I also always got the warning message of hyperfine, that there was quite a high variation between runs, thus the results might not be significant enough. I can't really tell whether the variation in build time is due to my change, or whether it is due to variations in disk speed, etc.? Also, even if the performance is decreased, I cannot tell if it is a linear performance decrease, what I mean is, whether my change generally adds 1 second "flat" to the build time (e.g. the whole build takes 120s normally, and with my build it takes 121s), or if it depends on the number of resources etc. All I can say, that for me performance measurements in NodeJS always have a high number of variables involved, so a differentiated test is very hard to do.

What might be reasonable to do, instead of executing a "test build" to analyze the performance, let's try to execute the realted unit-tests only. This (for my opionion) should give a much more accurate performance measurement result.

@RandomByte
Copy link
Member

RandomByte commented Sep 3, 2020

can't really tell whether the variation in build time is due to my change, or whether it is due to variations in disk speed, etc.?

For your results this might be the case, since it reports a deviation of ± 1.877. Are you using windows? If yes, did you execute this benchmark using the native CMD or some BASH emulation like the git-scm BASH?

Anyways, my results are pretty consistent across all benchmarks and show only a small deviation of less than a second. Even the one I did almost a month ago shows a 700ms increase (given that only the dist-size and not the runtime was incorrect in that benchmark).

I was also wondering wether it's one second flat or linear with the amount of XMLs to process. This should become clear once we benchmark some more applications.

@kristian
Copy link
Contributor Author

kristian commented Sep 3, 2020

@RandomByte I rebased the commit on sap:master and did some performance optimization for XML files NOT containing CData tags (which should be the most common case). Could you try re-running your performance measurements, I feel on Windows they are simply not accurate enough to be compared here. Thanks Merlin!

@kristian
Copy link
Contributor Author

@RandomByte did you have the chance of repeating your measurements with the latest version of minify-xml? Thanks!

@kristian
Copy link
Contributor Author

FYI, I again ran the minification on our current bootstrap. With the new minify-xml:

old xml minifcation bootstrap size 4,97 MB (5.214.753 bytes)
new xml minifcation (minify-xml) bootstrap size 4,93 MB (5.176.693 bytes)

Reduction of 38,06 kB (38.060 bytes) only by the introduction of the new minifier. I think that's a very good result. 👍

@RandomByte
Copy link
Member

I did another benchmark with the same app I tested earlier. The size reduction stayed the same, but it seems your change indeed improved the runtime.

The runtime is now almost identical (0%-2% difference) and the size consistently smaller (0,5% of dist/ and 1,16% of the Component-preload.js of this project). I still find ~50 kB to be a very small win, but fair enough. We'll discuss this in the team once again next week.

Preload Build (default)

ui5-builder Ref Command Mean [s] Min [s] Max [s] Relative dist/ Size Component-preload.js Size  Relative Size
master (1bdd8cf) UI5_CLI_NO_LOCAL=X /xy/ui5-cli/bin/ui5.js build 3,387 ± 0,121 3,165 3,524 1,00 10.413.376 bytes 4.442.546 bytes 0
kristian:master (ace614f) UI5_CLI_NO_LOCAL=X /xy/ui5-cli/bin/ui5.js build 3,451 ± 0,044 3,382 3,516 1,02 10.362.587 bytes 4.391.757 bytes -50.789 bytes

@RandomByte RandomByte changed the title Switch XML minifier from pretty-data to minify-xml [FEATURE] Switch XML minifier from pretty-data to minify-xml Nov 2, 2020
Copy link
Member

@codeworrior codeworrior left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kristian
Copy link
Contributor Author

kristian commented Nov 2, 2020

I still find ~50 kB to be a very small win, but fair enough. We'll discuss this in the team once again next week.

Thanks Merlin! Thanks Frank! :) 50kB I find "reasonable", but it also depends on how optimized your code already is. E.g. what happens very often in our use case is, that developers simply copy a view and forget to adapt their namespace imports. Over a couple hundrets of views this adds up! 👍 Thanks for the efforts 🥇

@RandomByte
Copy link
Member

To also share the result of our team internal discussion: We will go ahead and merge this PR.

Based on my last measurements there seems to be a (small) improvement in XML size reduction at an equal runtime. Also pretty-data might not be maintained anymore.

Let me try and resolve the CI issue and I'll merge it as a new feature. Thank you for your contribution @kristian! 👍

All in all pretty-data is a pretty bad (phun intended) XML minifier. It
does for instance not minify tag contents, nor it optimizes anything
else than removing comments and whitespaces between tags. Tags like:

<Tag     attributeA   =   "..."     attributeB = "..."     />

Will not get reduced in size. Switched to minify-xml (a package which
I specifically created for this patch, to do a better job at minfying)
on the other hands removes comments and whitespaces between tags, as
well as collapses whitespaces in tags and removes unused namespaces from
views. These are all pretty common use-cases for UI5 xml views, thus I
clearly think it's the better option compared to pretty-data.

Also this change removes the strange assumption that pretty-data (or now
minify-xml) would modify any tag content (like pre tags), which both
libraries clearly don't do. Fixed typo in test (autpSplitter).
@RandomByte RandomByte merged commit be29520 into SAP:master Nov 2, 2020
RandomByte added a commit to SAP/ui5-tooling that referenced this pull request Dec 2, 2020
PRs where this process was already used:
* SAP/ui5-builder#488
* SAP/ui5-builder#390

JIRA: CPOUI5FOUNDATION-261
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants