Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve XML indentation, thus optimizing file sizes #5911

Closed
wants to merge 3 commits into from

Conversation

Philzen
Copy link

@Philzen Philzen commented Apr 8, 2020

Resolves: https://musescore.org/en/node/303489

Description

As described in the linked issue description, closing brackets in any generated XML are rendered on the same indent level as their preceding child nodes, which (a) makes them harder parse visually, (b) unneccessarily bloats file size, and (c) also contradicts the expected MusicXML default formatting (References: 1, 2, 3).

Improvement

File size comparison before and after:

XML file  master @ c458cc3 Fix-303489 Total diff (bytes) New size (%) Size Decrease (%)
All in mtest 32775316 32131202 644114 98,03 % 1,97 %
I.mscx 107054 104638 2416 97,74 % 2,26 %
I.musicxml 81716 80360 1356 98,34 % 1,66 %
I.mscz 21832 21758 74 99,66 % 0,34 %
  • NB: "I.*" is a local test file for a complete musical score of the works of Zinnschauer that i recently started to work on

Baseline: Raw XML size is decreased by ~2%, and even compressed MuseScore files are shrunk by a few bytes 😄

Open Points / Help needed

  • it would be nice if someone could help out with a regression test for this change
  • as advised, the score files under mtest are updated via 4dc1c81 to the new format utilizing a small, idempodent and carefully tested script that also helps rebasing as new commits fly into master. However, there are some issues encountered:
    • mtest/libmscore/parts/part-image-parts.mscx and mtest/libmscore/parts/part-all-insertmeasures.mscx are invalid XML, as they contain a </BarLine> closing tag that is not opened anywhere
    • mtest/musicxml/io/testMultipleNotations.xml has some more indentation jumps than expected, however i can adjust this file manually and amend the commit if needed (once everything is good to go)
    • i have a hunch that it may not be wise to reformat mtest/libmscore/compat114/ and mtest/libmscore/compat206/ (judging from their name and contents)
    • maybe there are other folder below this that should not be reformatted as well
    • there are 1449 XML files in mtest that are rewritten in 4dc1c81 - however, the whole project contains 2679 XML files (=1230 files untouched), pls advise if you think those should be reformatted as well
  • i am still waiting for tests to compile and run, which takes ages (~30m) on my machine - also my local MuseScore 3.2.4 install got overwritten. I followed the testing instructions in the mtest folder. Is there any way i can speed up this process, am i looking at the wrong tutorial? Right now i'm doing make clean and repeat all the steps there - maybe we can improve on that document to hint what is necessary and what isn't to get a proper test run (and also not overwrite an existing install)
  • current master (at c458cc3) only passes 39/40 tests on my machine

Checklist

  • I signed CLA
  • I made sure the code in the PR follows the coding rules
  • I made sure the code compiles on my machine
  • I made sure there are no unnecessary changes in the code
  • I made sure the title of the PR reflects the core meaning of the issue you are solving
  • I made sure the commit message(s) contain a description and answer the question "Why do those changes fix that particular issue?" or "Why are those changes really necessary as improvements?"
  • I made sure the commit message title starts with "fix #424242:" if there is a related issue
  • [n/a] I created the test (mtest, vtest, script test) to verify the changes I made

@Jojo-Schmitz
Copy link
Contributor

Those reductions in size come at a price:

  1. the size of the repository will grow
  2. other PRs will need a rebase, even if your script may help on that this seems not really needed
  3. those changes screw up git blame
  4. not all issues you mention above may be fixable

I therefor object to those changes to the mtest mscx, mscx, xml, musicxml and mxl files and recommend to add -b to the command line options of diff in mtest/testutils.cpp, line 164, to just ignore whitespace differences (as not being significant anyhow).

Those spaces are not illegal, so don't punish for using them.

@Philzen Philzen force-pushed the 303489-xml-indentation branch from 4dc1c81 to 350856d Compare April 8, 2020 15:09
@ecstrema
Copy link
Contributor

ecstrema commented Apr 8, 2020

Out of curiosity, what platform are you building with? I've recently been trying to build and run the mtests with msvc under Windows and had no luck so far. If you did manage to do it, I would be curious to see how.

@Philzen
Copy link
Author

Philzen commented Apr 8, 2020

recommend to add -b to the command line options of diff in mtest/testutils.cpp, line 164, to just ignore whitespace differences (as not being significant anyhow).

@Jojo-Schmitz: Done → 350856d

Actually i only did 4dc1c81 because i understood these "would need" to be adjusted, as opposed to "maybe" modifying the tests themselves - doing the first cost me a whole day, the other half of the day was spent in vain trying to get the tests to pass on my machine 😏. But for what it's worth, maybe the script may come in handy someday to convert existing, non-versioned files before they get committed into some sort of repository.

Those spaces are not illegal, so don't punish for using them.

Actually i was contemplating over writing a MuseScore Plugin for git version control of scores, that's why i looked into the files in the first place to analyze the feasibilty of doing so. As form and function are inexplicably intertwined in software and data structures (and also the standards show it as good practice), i am very happy for this to go forward into v3.5.0

@Philzen Philzen marked this pull request as ready for review April 8, 2020 17:44
@Philzen
Copy link
Author

Philzen commented Apr 8, 2020

Out of curiosity, what platform are you building with?

@Marr11317 i'm a Linux user, currently using Manjaro with a 4.19 kernel. I followed the instructions in the mtest folder to compile the debug build and the tests - so maybe there is something missing or i have the wrong dependencies installed (or something else).

@Philzen Philzen changed the title Improve XML format, optimizing file sizes Improve XML indentation, thus optimizing file sizes Apr 8, 2020
@Philzen Philzen force-pushed the 303489-xml-indentation branch from 350856d to f1c9a36 Compare April 8, 2020 18:01
@Philzen
Copy link
Author

Philzen commented Apr 8, 2020

@Jojo-Schmitz using -b still fails.

I tried locally though and found that using -w works like a charm. Amended my commit accordingly with f1c9a36

@Jojo-Schmitz
Copy link
Contributor

Jojo-Schmitz commented Apr 8, 2020

Hmm, OK

-b --ignore-space-change
     Ignore changes in the amount of white space.
-w --ignore-all-space
     Ignore all white space.

Not quite sure what the difference is and why one works here and the other does not (and thought -b to be a better match), but glad that it does work ;-)
And sorry for the time you wasted on those mtest files, but I was pretty sure to have mentioned that diff option before you started working on this.

@Philzen
Copy link
Author

Philzen commented Apr 9, 2020

And sorry for the time you wasted on those mtest files, but I was pretty sure to have mentioned that diff option before you started working on this.

No worries, i'll get over it ;) And yes you mentioned it, i was just unsure about it. I actually linked that very comment of yours in my reply above when i said

Actually i only did 4dc1c81 because i understood these "would need" to be adjusted, as opposed to "maybe"

Still have one unticked box for this PR - writing a regression test. The test can be very simple, testing a generated empty score will do, as it will be sufficient to check if the last line (which contains </museScore>) does not start with spaces.

Two questions:

  1. The template says: I created the test (mtest, vtest, script test)... where in the project are "script test"s located? (just so i can have a look at them and understand what my options are there)
  2. Could you point me to an existing test that uses the existing API to sort of "headless"ly (if you know what i mean, as in "headless browser testing") creates a project and saves it? Or maybe just explain how you would ideally approach this?

@Jojo-Schmitz
Copy link
Contributor

don't think an mtest is needed here

@Philzen Philzen force-pushed the 303489-xml-indentation branch from 7ebdd46 to e0ddd73 Compare April 13, 2020 16:10
@Philzen Philzen force-pushed the 303489-xml-indentation branch from c5ea88f to 00bbb23 Compare April 22, 2020 02:38
Philzen added 3 commits April 22, 2020 04:41
Reduce file size and increase readability by fixing node indentation of XML output to be well-formed.
This affects **any** XML generated through MuseScore 3. See https://musescore.org/en/node/303489 for
full issue description and examples.

In order for tests not to fail after this change, the test runner setup is modified to ignore
whitespace changes for diff's.
Copy link
Contributor

@shoogle shoogle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is safe to ignore whitespace in XML, therefore I oppose merging this except in tandem with all the other changes in coding style (see musescore/musescore-code-style#5).

The reason is whitespace can sometimes be significant in XML (at least how MuseScore uses it).

image

          <StaffText>
            <text>This text
spans
    multiple
lines.
</text>
            </StaffText>

Even if it is safe to ignore whitespace, doing so only delays the increase in repository size, it does not prevent it entirely. The whitespace change would happen when test files are next updated, which increases size and also masks the important changes in those files.

@Philzen
Copy link
Author

Philzen commented Apr 22, 2020

Correcting the indentation in files for a data exchange format and changing coding styles are two separate things and i'm failing to see why they would need to be done together.

Of course i agree that everybody needs to agree on how to deal with the implications of diffs not matching anymore.

I don't think it is safe to ignore whitespace in XML, therefore I oppose merging this

I could revert the whitespace ignore flag change and easily convert all the files as done before in 4dc1c81.

@shoogle
Copy link
Contributor

shoogle commented Apr 22, 2020

Correcting the indentation in files for a data exchange format and changing coding styles are two separate things

Very true.

i'm failing to see why they would need to be done together

Because they have the same negative impact on git blame, and the solution to avoid this negative impact is the same in both cases: the repository has to be rebuilt using the new indentation style right from the very first commit. That is what is happening now in musescore/musescore-code-style.

@vpereverzev
Copy link
Member

Sorry, but the harm to git-blame is much more than the intended benefit
изображение

@Philzen
Copy link
Author

Philzen commented Jan 2, 2021

@vpereverzev Isn't there going to be a big rebase for MuseScore 4 to align the code base with the new coding style?

I believe @shoogle was working on that fork that used to be musescore/musescore-code-style., rewriting the git history to work around exactly the git-blame / git history growth problem that you are understandibly concerned about.

If we could pull this change in in a similar fashion, we'd improve XML readability, make them finally conforming to the standard and reduce file sizes, all in one go :)

@Jojo-Schmitz
Copy link
Contributor

That 'big rebase' happened quite a while ago already
The harm to git blame can certainly be avoided, by adding that commit to the ones to be ignored, similar to those other formatting changes.
But where in the (MusicXML?) standard does it say anything about indenting?
That size decrease is really not a lot...

@shoogle
Copy link
Contributor

shoogle commented Jan 2, 2021

@Philzen, if you look at the C++ files on the master branch you will see that the new style is already in use.

Isn't there going to be a big rebase for MuseScore 4 to align the code base with the new coding style?

This was the plan at one point, but it seems that the in-house team changed their minds and decided to apply the coding style via an ordinary merge (i.e. not via a rebase). This means there was one big commit that changed the style rather than rewriting every commit back to the initial one with the new style as though it had been used from the very beginning.

Since history was not rewritten, this means that the repository size increased (though not massively after compression) and git blame was indeed broken. However, GitHub has an easy workaround in the form of a button to "View blame prior to this change". Click the button when the blame says the line was last changed by the style commit.

image

Also, as @Jojo-Schmitz mentioned, it is possible to ignore the style commit altogether for offline usage of git blame. There is a file called .git-blame-ignore-revs that lists whitespace-only commits. Use these commands to ignore the commits in that file:

# ignore once
git blame file1.cpp --ignore-revs-file .git-blame-ignore-revs

# ignore always
git config blame.ignoreRevsFile .git-blame-ignore-revs
git blame file1.cpp
git blame file2.cpp

I would like to use the new style for XML-based files but I'm not convinced it's worth the effort. If we did do it then I would prefer to see the test files using the new style rather than make the tests ignore whitespace for the reasons I gave before. This would mean employing the same workarounds to preserve git blame as we do for the coding style.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants