Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test regression in 0.18.0: repeated-updates.sh, replace-add-needed.sh #515

Open
matoro opened this issue Aug 22, 2023 · 5 comments
Open
Labels

Comments

@matoro
Copy link
Contributor

matoro commented Aug 22, 2023

Between 0.17.2 and 0.18.0, we have one test regression and one new test that does not work on sparc64. I bisected this to #475 and confirmed that applying a revert on top of 0.18.0 successfully passes all tests - both the existing replace-add-needed.sh and the new repeated-updates.sh.

As part of investigating this, I looked into #492 and #510. Applying the latter PR causes replace-add-needed.sh to pass, but repeated-updates.sh to get stuck in some sort of I/O loop, pegging the disk.

I now have a proper onboarding flow for my exotic architecture testing environment. Details are available here - if you want access, just mention and I can provide free shell access on real hardware. @Mic92 still has access, but username has been changed to g-mic92.

Downstream Gentoo bug

@matoro matoro added the bug label Aug 22, 2023
@chewi
Copy link
Contributor

chewi commented Aug 23, 2023

I should probably double check it tomorrow when I'm not half asleep, but I wondered whether reverting #475 would also help fix the tests for m68k, and it does! In that case, it was just replace-add-needed.sh that was failing though.

@matoro matoro changed the title Test regression on sparc64 in 0.18.0: repeated-updates.sh, replace-add-needed.sh Test regression in 0.18.0: repeated-updates.sh, replace-add-needed.sh Aug 23, 2023
@yuta-hayama
Copy link
Contributor

As part of investigating this, I looked into #492 and #510. Applying the latter PR causes replace-add-needed.sh to pass, but repeated-updates.sh to get stuck in some sort of I/O loop, pegging the disk.

Certainly, applying #510 and running repeated-updates.sh, the script should appear to be stuck with a massive amount of disk I/O. But this is not completely stuck in an infinite loop. If you have enough disk space and wait a dozen minutes or so, the test should PASS.

Please let me know the result of the following command. How many is the p_align of the LOAD segment?

readelf -l tests/scratch/repeated-updates/libbar.so

As noted in the rewriteSectionsLibrary() comment, patchelf places the updated section at "the end of the file".
https://github.com/NixOS/patchelf/blob/0.18.0/src/patchelf.cc#L797

Where "the end of the file" means a position after any section that existed before patchelf was applied. In repeated-updates.sh, only the .dynstr section is rewritten, but even though the rewriting deletes the old .dynstr section, the new .dynstr section is placed after the old .dynstr section. As a result, each time patchelf is applied, a "gap" appears where no section is placed, increasing the output file size.

If #510 is not applied, patchelf aligns the in-file offset of the new section (and the new LOAD segment) with pagesize (0x1000). Thus, if the updated section is short, each time patchelf is run, the size of the output file will increased by 0x1000.

However, as explained in #510, glibc before 2.35 has a bug that requires aligning in-file offset with p_align to work properly. When #510 is applied, each time patchelf is run, the size of the output file is increased by p_align.

Since repeated-updates.sh executes patchelf about 200 times, for example, if the p_align of the LOAD segment is 0x200000, the size of libbar.so when the test is completed will be about 400 MB. This is the reason for the huge amount of disk I/O when repeated-updates.sh is run.

To prevent the file from becoming huge, it may be possible to take measures such as placing the new section where the old (overwritten) section was when the section is rewritten. But to do so, we will have to overturn the premise of rewriteSectionsLibrary() (... For dynamic libraries, we just place the replacement sections at the end of the file.) This is expected to be a complex task and I have not started it yet.

@matoro
Copy link
Contributor Author

matoro commented Aug 24, 2023

Since this seems to affect multiple architectures, I ran this on some of my test machines.

Affected: m68k (per Chewi above), alpha, sparc64, sparc 32-bit

Unaffected: arm64, arm 32-bit, ppc64le, ppc64, ppc 32-bit, ia64, mips64, mips 32-bit, riscv64

@chewi
Copy link
Contributor

chewi commented Aug 24, 2023

In m68k's case, you have to align to 16-bit, although we are strongly considering building the whole system with -malign-int in future to align to 32-bit instead, which fixes various things. I thought I'd tried patchelf with this already, but I'm possibly misremembering.

@paulgevers
Copy link

Debian has applied the following very simple patch to solve/work around the issue on mips64el:

--- a/tests/repeated-updates.sh
+++ b/tests/repeated-updates.sh
@@ -33,7 +33,7 @@
 # To be even more strict, check that we don't add too many extra LOAD entries
 ###############################################################################
 echo "Segments before: ${load_segments_before} and after: ${load_segments_after}"
-if [ "${load_segments_after}" -gt $((load_segments_before + 2)) ]
+if [ "${load_segments_after}" -gt $((load_segments_before + 3)) ]
 then
     exit 1
 fi

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants