lld test failures in release/18.x branch on MacOS #81967

Open
tstellar opened this issue Feb 16, 2024 · 13 comments

@tstellar
Collaborator

There are some lld tests that are failing on the release/18.x branch:

lld :: ELF/build-id.s
lld :: ELF/partition-notes.s

It looks like something has changed the way that the build-ids are being calculated. I'm unable to bisect, because when I re-test commits that used to pass, they now fail.

Full build logs are here: https://github.com/tstellar/llvm-project/actions/runs/7923959063/job/21634702772
Attached are some of the binary files produced by the failing tests.
lld-artifacts.zip

cc @MaskRay

@llvmbot
Member

llvmbot commented Feb 16, 2024

@llvm/issue-subscribers-lld-elf

Author: Tom Stellard (tstellar)

@tstellar tstellar added this to the LLVM 18.X Release milestone Feb 21, 2024
@github-project-automation github-project-automation bot moved this to Needs Triage in LLVM Release Status Feb 21, 2024
@tstellar tstellar moved this from Needs Triage to Needs Fix in LLVM Release Status Feb 21, 2024
@MaskRay
Member

MaskRay commented Feb 28, 2024

tl;dr I suspect that llvm/lib/Support/BLAKE3 has a different result on the macOS-13 build bot, which is puzzling.
All the other machines seem to agree on the BLAKE3 result. The two tests also pass on a macOS 11 arm64 machine.


The two tests compute a build ID and compare the output with a golden value.

+ /Users/runner/work/llvm-project/llvm-project/build/bin/FileCheck --check-prefix=SEPARATE /Users/runner/work/llvm-project/llvm-project/lld/test/ELF/build-id.s
/Users/runner/work/llvm-project/llvm-project/lld/test/ELF/build-id.s:92:18: error: SEPARATE-NEXT: expected string not found in input
# SEPARATE-NEXT: 0x002001a8 5cd067a4 2631c0fd 42029037 4b8e0938
                 ^
<stdin>:2:47: note: scanning from here
0x00200198 04000000 14000000 03000000 474e5500 ............GNU.
                                              ^
<stdin>:3:1: note: possible intended match here
0x002001a8 07d4c770 93d79b3c 9f4e6f32 f49aef73 ...p...<.No2...s
^

This failure indicates that, on the macOS-13 build bot, the output of LLD_IN_TEST=1 LLD_VERSION='LLD 1.0' ld.lld --build-id=sha1 -z separate-loadable-segments build-id.s.tmp -o build-id.s.tmp2
has different .note.gnu.build-id content.
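For reference, the 0x00200198 row in the FileCheck dump above is the header of that note. The Python sketch below (an illustration, not part of the lld test) decodes it, assuming the usual ELF note layout: three 32-bit words n_namesz, n_descsz, n_type, followed by the name padded to 4 bytes. The words are little-endian, which is why 04000000 reads as 4.

```python
import struct

# Row 0x00200198 of the FileCheck input dump: the Elf_Nhdr of
# .note.gnu.build-id followed by the note name.
ROW_ADDR = 0x00200198
row = bytes.fromhex("04000000" "14000000" "03000000" "474e5500")

# Elf_Nhdr: n_namesz, n_descsz, n_type, each a little-endian 32-bit word.
namesz, descsz, ntype = struct.unpack_from("<III", row, 0)
name = row[12:12 + namesz]

NT_GNU_BUILD_ID = 3  # note type used for GNU build IDs

# The descriptor (the build ID itself) starts after the 12-byte header
# and the name padded to a 4-byte boundary.
desc_addr = ROW_ADDR + 12 + ((namesz + 3) & ~3)

print(namesz, descsz, ntype == NT_GNU_BUILD_ID, name, hex(desc_addr))
```

The decoded descriptor size is 20 bytes, and the descriptor starts at 0x002001a8, which is exactly the SEPARATE-NEXT row that failed to match.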
If I compare the correct output with the artifact in lld-artifacts.zip, I get:

% cmp -l a2 /tmp/c/build-id.s.tmp2  # a2 is the correct build-id.s.tmp2
 425 134   7
 426 320 324
 427 147 307
 428 244 160
 429  46 223
 430  61 327
 431 300 233
 432 375  74
 433 102 237
 434   2 116
 435 220 157
 436  67  62
 437 113 364
 438 216 232
 439  11 357
 440  70 163
 441  50 176
 442 251  17
 443 326  36
 444 363 247

Only 20 bytes (the build ID) are different. All the other bytes are equal. ld.lld --build-id=sha1 uses truncated BLAKE3 instead of SHA1, so this suggests that BLAKE3::hash<20>(arr).data() has a different result on the macOS-13 bot.
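The cmp -l listing can be decoded mechanically: cmp -l prints a 1-based decimal byte offset followed by the two differing bytes in octal. The illustrative Python sketch below converts the listing from this comment to hex and checks that the differences form one contiguous run. The first differing byte is at file offset 0x1a8, and the first 16 bytes of each column reproduce the expected SEPARATE-NEXT row (5cd067a4 ...) and the "possible intended match" row (07d4c770 ...) from the FileCheck error.

```python
# `cmp -l a2 /tmp/c/build-id.s.tmp2` output from the comment above.
# Columns: 1-based byte offset (decimal), byte in a2 (good), byte in the
# macOS-13 artifact (bad); cmp -l prints the byte values in octal.
CMP_OUTPUT = """\
 425 134   7
 426 320 324
 427 147 307
 428 244 160
 429  46 223
 430  61 327
 431 300 233
 432 375  74
 433 102 237
 434   2 116
 435 220 157
 436  67  62
 437 113 364
 438 216 232
 439  11 357
 440  70 163
 441  50 176
 442 251  17
 443 326  36
 444 363 247
"""


def parse_cmp_l(text):
    """Return (zero_based_offset, good_byte, bad_byte) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        offset, good, bad = line.split()
        rows.append((int(offset) - 1, int(good, 8), int(bad, 8)))
    return rows


rows = parse_cmp_l(CMP_OUTPUT)
offsets = [offset for offset, _, _ in rows]
# True if the differing bytes form one uninterrupted range.
contiguous = offsets == list(range(offsets[0], offsets[0] + len(offsets)))
good_id = bytes(good for _, good, _ in rows).hex()
bad_id = bytes(bad for _, _, bad in rows).hex()

print(len(rows), hex(offsets[0]), contiguous)
print("good:", good_id)
print("bad: ", bad_id)
```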

On a macOS 11 arm64 machine, the BLAKE3 result matches the value observed on Linux machines.

% sw_vers
ProductName:    macOS
ProductVersion: 11.0.1
BuildVersion:   20B29
% ls ./lib/Support/BLAKE3/CMakeFiles/LLVMSupportBlake3.dir/blake3_neon.c.o  # present

@MaskRay
Member

MaskRay commented Feb 28, 2024

@tstellar CFLAGS/CXXFLAGS -DBLAKE3_USE_NEON=0 should disable the neon implementation of BLAKE3. Does it fix the two lld --build-id tests?

If -DBLAKE3_USE_NEON=0 does make the tests pass, the possibilities are:

  • Some NEON instructions are broken on the macOS 13 build bot.
  • llvm/lib/Support/BLAKE3/blake3_neon.c contains a bug.

Cc @akyrtzi, who added BLAKE3.

@tstellar
Collaborator Author

tstellar commented Feb 29, 2024

@MaskRay The macos-13 builders are x86, will that option have any effect there?

@MaskRay
Member

MaskRay commented Feb 29, 2024

> @MaskRay The macos-13 builders are x86, will that option have any effect there?

OK. Then perhaps check which SIMD implementation the macos-13 builder uses; the macros BLAKE3_NO_SSE41/BLAKE3_NO_AVX2/BLAKE3_NO_AVX512 can be useful for disabling individual implementations.
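As a rough aid to that check, the x86 dispatch can be thought of as a priority list. The sketch below is a simplified Python model, not the actual blake3_dispatch.c code. It assumes the upstream BLAKE3 order (AVX-512 when both AVX512F and AVX512VL are present, then AVX2, SSE4.1, SSE2, then the portable code) and treats each BLAKE3_NO_* macro as removing its branch. `select_hash_many` and the lowercase feature names are hypothetical.

```python
# Simplified, hypothetical model of BLAKE3's x86 hash_many dispatch.
# Assumption: candidates are tried in this priority order, and a defined
# BLAKE3_NO_* macro compiles its branch out entirely.
CANDIDATES = [
    ("avx512", {"avx512f", "avx512vl"}, "BLAKE3_NO_AVX512"),
    ("avx2", {"avx2"}, "BLAKE3_NO_AVX2"),
    ("sse41", {"sse41"}, "BLAKE3_NO_SSE41"),
    ("sse2", {"sse2"}, "BLAKE3_NO_SSE2"),
]


def select_hash_many(cpu_features, defined_macros):
    """Return the name of the implementation this model would pick."""
    for name, required, disable_macro in CANDIDATES:
        if disable_macro in defined_macros:
            continue  # branch compiled out
        if required <= cpu_features:
            return name  # every required CPU feature is present
    return "portable"


features = {"sse2", "sse41", "avx2"}
print(select_hash_many(features, set()))
print(select_hash_many(features, {"BLAKE3_NO_AVX2"}))
```

Under this model, on a CPU that reports AVX2 (as the macOS runners in this thread do), defining BLAKE3_NO_AVX2 would move hashing to the SSE4.1 kernel, so comparing build IDs across such builds could help narrow down which kernel, if any, is affected.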

@tstellar
Collaborator Author

I ran a test on every version of macOS available:
macos-11 (x86), macos-12 (x86), and macos-14 (M1) all pass, and only macos-13 (x86) fails.

I did another test with -DLLVM_DISABLE_ASSEMBLY_FILES=ON and this fixes the tests on macos-13, but it causes the Linux build to fail.

It seems like it could be a bug in the assembler on macos-13? I can try to get the object files to compare between the different versions of macOS.

@tstellar
Collaborator Author

Here are the BLAKE3 object files from macOS-12 (good) and macOS-13 (bad).

macOS-12.zip
macOS-13.zip

@tstellar
Collaborator Author

tstellar commented Mar 1, 2024

The macOS-12 and macOS-13 runners both use the exact same CPU, and the blake3 feature detection reports AVX2 for both.

@tstellar
Collaborator Author

tstellar commented Mar 1, 2024

The default compiler on macOS-12 is Apple Clang 14.0.0, so I tried using that compiler on macOS-13 and the tests pass. So some change between the Apple Clang 14.0.0 and Apple Clang 15.0.0 compilers caused these tests to fail.

@MaskRay
Member

MaskRay commented Mar 1, 2024

> It seems like it could be a bug in the assembler on macos-13? I can try to get the object files to compare between the different versions of macOS.

Nice finding!

@akyrtzi @jroelofs ^

@MaskRay
Member

MaskRay commented Mar 5, 2024

> > It seems like it could be a bug in the assembler on macos-13? I can try to get the object files to compare between the different versions of macOS.
>
> Nice finding!
>
> @akyrtzi @jroelofs ^

Bump. Since the 18.x release is approaching and the Apple Clang bug seems serious (BLAKE3 correctness), some action needs to be taken. The easiest is probably to force LLVM_DISABLE_ASSEMBLY_FILES=ON for certain Apple Clang versions.
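That suggestion amounts to a compiler version check. The Python below is only an illustration (in LLVM such a check would live in CMake), and both the helper names and the affected range are assumptions: Apple Clang 14.0.0 was reported good and Apple clang clang-1500.1.0.2.5 bad, with clang-1500.3.9.4 later reported fixed. Because the bad and fixed builds may print the same marketing version (15.0.0), the sketch keys on the clang-NNNN build number instead.

```python
import re

# Hypothetical helper, not part of LLVM's build system: decide whether to
# force LLVM_DISABLE_ASSEMBLY_FILES=ON from the first line of
# `clang --version`.
# ASSUMPTION: Apple clang builds from clang-1500 up to, but not including,
# clang-1500.3.9.4 are treated as affected.
FIRST_AFFECTED = (1500,)
FIRST_FIXED = (1500, 3, 9, 4)

_APPLE_CLANG = re.compile(r"Apple clang version \S+ \(clang-([\d.]+)\)")


def apple_clang_build(version_line):
    """Return the clang-NNNN build number as a tuple, or None if not Apple clang."""
    match = _APPLE_CLANG.search(version_line)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def should_disable_assembly(version_line):
    build = apple_clang_build(version_line)
    if build is None:
        return False  # Not Apple clang: keep the assembly implementations.
    # Tuples compare element-wise, e.g. (1500, 1, 0, 2, 5) < (1500, 3, 9, 4).
    return FIRST_AFFECTED <= build < FIRST_FIXED


for line in [
    "Apple clang version 15.0.0 (clang-1500.1.0.2.5)",
    "Apple clang version 15.0.0 (clang-1500.3.9.4)",
]:
    print(line, "->", should_disable_assembly(line))
```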

@akyrtzi
Contributor

akyrtzi commented Mar 5, 2024

> The easiest is probably to force LLVM_DISABLE_ASSEMBLY_FILES=ON for certain Apple Clang versions.

This seems like a good idea to me 👍

dyung added a commit to llvm/llvm-zorg that referenced this issue Mar 5, 2024
…ang (#129)

Two build-id related LLD tests are failing on the MacOS builder I am
trying to bring up apparently due to a bug in the version of Apple clang
installed on the machine. (See llvm/llvm-project#83940 and
llvm/llvm-project#81967).

The suggested workaround of adding `-DLLVM_DISABLE_ASSEMBLY_FILES=ON`
seems to work on the bot, so add that to the configuration.
@dyung
Collaborator

dyung commented Mar 8, 2024

I noticed that Apple pushed out an update to the command line tools today, which updated the clang version from clang-1500.1.0.2.5 to clang-1500.3.9.4. It seems to have fixed the problem: the two tests mentioned here are now passing on my buildbot.

tstellar added a commit to tstellar/llvm-project that referenced this issue Mar 11, 2024
tstellar added a commit to tstellar/llvm-project that referenced this issue Mar 11, 2024
tstellar added a commit to tstellar/llvm-project that referenced this issue Mar 11, 2024
tstellar added a commit to tstellar/llvm-project that referenced this issue Mar 13, 2024
tstellar added a commit to tstellar/llvm-project that referenced this issue Mar 13, 2024
tstellar added a commit that referenced this issue Mar 15, 2024
bryanpkc added a commit to Huawei-CPLLab/classic-flang-llvm-project that referenced this issue Nov 26, 2024
bryanpkc added a commit to flang-compiler/classic-flang-llvm-project that referenced this issue Nov 27, 2024
Projects
Status: Needs Fix

6 participants