Update coredistools #370

Merged
merged 41 commits on Jan 3, 2024

Conversation

BruceForstall
Member

@BruceForstall BruceForstall commented Apr 18, 2023

  1. Update from LLVM 13.0.1 to 17.0.6
  2. Change Linux/Mac build to build llvm-tblgen from source instead of downloading a pre-built version. LLVM doesn't always publish all architecture versions of this tool.
  3. Change Linux to build with CBL-Mariner container.

TO DO:

  • Mac: properly download built llvm-tblgen artifact and put it on the path so the build-coredistools step can find it.
  • Linux: fix the CBL-Mariner build. Mariner currently has clang12 and is expected to update to clang16 soon. Will that fix problems? Or, back off to ubuntu again, and stop doing cross-compiler builds for linux-x64 (like we do for Mariner).

Fixes #372

@dotnet dotnet deleted a comment from azure-pipelines bot Apr 20, 2023
@BruceForstall
Member Author

It looks like the llvm-tblgen built on ubuntu-20.04 doesn't run correctly on ubuntu-20.04 for the linux-x64 coredistools build (crashes? hangs?).

It also fails to run at all, due to library dependencies, in the Linux arm/arm64 cross-build Docker containers.

Probably would be best to abandon trying to use ubuntu at all, and try to get Mariner to work. Maybe after Mariner is updated to LLVM 16.

@BruceForstall
Member Author

I'm currently getting this when building locally (under Mariner linux-x64 container):

I have no name!@3b1d8f6db879:~/gh/jitutils$ ./build-tblgen.sh linux-x64 /crossrootfs/x64
~/gh/jitutils/obj ~/gh/jitutils
-- The C compiler identification is Clang 16.0.0
-- The CXX compiler identification is Clang 16.0.0
-- The ASM compiler identification is Clang
-- Found assembler: /usr/local/bin/clang
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/local/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/local/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Python3: /usr/bin/python3.9 (found suitable version "3.9.14", minimum required is "3.6") found components: Interpreter
-- Performing Test LLVM_LIBSTDCXX_MIN
-- Performing Test LLVM_LIBSTDCXX_MIN - Failed
CMake Error at cmake/modules/CheckCompilerVersion.cmake:88 (message):
  libstdc++ version must be at least 7.1.
Call Stack (most recent call first):
  cmake/config-ix.cmake:15 (include)
  CMakeLists.txt:848 (include)

@BruceForstall
Member Author

Looks like we still fail to build LLVM tblgen for LLVM 16.0.6 due to the same libstdc++ version issue when building under CBL-Mariner (mcr.microsoft.com/dotnet-buildtools/prereqs:cbl-mariner-2.0-cross-amd64):

./build-tblgen.sh linux-x64 /crossrootfs/x64
========================== Starting Command Output ===========================
/usr/bin/bash --noprofile --norc /__w/_temp/d033f5ce-f10f-4de7-a156-dad2298a6c2b.sh
/__w/1/s/obj /__w/1/s
-- The C compiler identification is Clang 16.0.0
-- The CXX compiler identification is Clang 16.0.0
-- The ASM compiler identification is Clang
-- Found assembler: /usr/local/bin/clang
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/local/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/local/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Python3: /usr/bin/python3.9 (found suitable version "3.9.14", minimum required is "3.6") found components: Interpreter 
-- Performing Test LLVM_LIBSTDCXX_MIN
-- Performing Test LLVM_LIBSTDCXX_MIN - Failed
-- Configuring incomplete, errors occurred!
See also "/__w/1/s/obj/CMakeFiles/CMakeOutput.log".
See also "/__w/1/s/obj/CMakeFiles/CMakeError.log".
CMake Error at cmake/modules/CheckCompilerVersion.cmake:88 (message):
  libstdc++ version must be at least 7.1.
Call Stack (most recent call first):
  cmake/config-ix.cmake:15 (include)
  CMakeLists.txt:848 (include)

Looks like /crossrootfs/x64 has libstdc++.so.6?

I have no name! [ /opt/code ]$ ls -l -aF /crossrootfs/x64/usr/lib/x86_64-linux-gnu/libstdc*
lrwxrwxrwx 1 root root      19 Oct  4  2019 /crossrootfs/x64/usr/lib/x86_64-linux-gnu/libstdc++.so.6 -> libstdc++.so.6.0.21
-rw-r--r-- 1 root root 1566440 Oct  4  2019 /crossrootfs/x64/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21

@sbomer Does this make sense? Are the Mariner images going to get an updated libstdc++ version 7 sometime? Any other suggestion?

cc @jakobbotsch
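
The listing above can be read against the error: as far as the libstdc++ ABI history goes, `libstdc++.so.6.0.21` corresponds to GCC 5.x, while GCC 7.1 ships `6.0.23`. A minimal sketch of checking a rootfs this way follows. The function names and the `23` threshold are assumptions for illustration and are not part of the jitutils scripts; LLVM's CMake check actually compiles a test program against the headers, so the file name is only a proxy.

```shell
# Hypothetical helper (not part of jitutils): print the last component of a
# libstdc++ file name, e.g. "libstdc++.so.6.0.21" -> "21".
libstdcxx_patch() {
    local name
    name="$(basename "$1")"
    echo "${name##*.}"
}

# Succeed if the library is at least 6.0.23. Assumption: 6.0.23 is the
# libstdc++ that shipped with GCC 7.1.
libstdcxx_is_new_enough() {
    [ "$(libstdcxx_patch "$1")" -ge 23 ]
}

# Scan a cross rootfs (directory layout taken from the listing above).
check_rootfs() {
    local lib
    for lib in "$1"/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.*; do
        [ -e "$lib" ] || continue
        if libstdcxx_is_new_enough "$lib"; then
            echo "$lib: ok"
        else
            echo "$lib: older than the libstdc++ 7.1 that LLVM asks for"
        fi
    done
}
```

Running `check_rootfs /crossrootfs/x64` inside the container would report each libstdc++ it finds under that rootfs.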

@sbomer
Member

sbomer commented Sep 7, 2023

Those images have an old ubuntu rootfs that we target in our official builds for broad compatibility with a large range of glibc versions, and they don't have libstdc++ 7 installed or available in the repos.

@directhex ran into this too, and created images specifically to solve this problem, which have a newer rootfs (Ubuntu 18.04 instead of 16.04): dotnet/dotnet-buildtools-prereqs-docker#857. I would try using those images instead (for example, mcr.microsoft.com/dotnet-buildtools/prereqs:cbl-mariner-2.0-cross-ubuntu-18.04-amd64).

cc @directhex in case you have any advice.

@directhex

directhex commented Sep 7, 2023

For .NET 9, you can use a newer crossroot as @sbomer suggests. We no longer target Ubuntu 16.04 for .NET 9, but AFAIK our build images haven't been updated to reflect it yet.

For .NET 8, you need to bring your own better c++ library. LLVM comes with one, you can either build it yourself or consume it via an LLVM nuget, then bundle it & fix up the rpath value in your libs/executables so they still work. See _LibCxxBootstrap target on https://github.com/dotnet/llvm-project/blob/dotnet/main-16.x/llvm.proj#L147 for building libc++, and https://github.com/dotnet/llvm-project/blob/dotnet/main-16.x/llvm.proj#L127 and https://github.com/dotnet/llvm-project/blob/dotnet/main-16.x/llvm.proj#L54 for consuming it.

@BruceForstall
Member Author

I don't think I need to worry about .NET 8, so I'll try the new images. Thanks!

@BruceForstall
Member Author

Well, the new containers allowed the build to succeed, but now there is some missing or mis-versioned dependency:

./bin/llvm-tblgen: error while loading shared libraries: libtinfo.so.5: cannot open shared object file: No such file or directory
I have no name! [ /opt/code ]$ find /usr -iname libtinfo.so\*
/usr/lib/libtinfo.so.6
/usr/lib/libtinfo.so.6.4

@sbomer
Member

sbomer commented Sep 8, 2023

Ah, it looks like the 18.04 rootfs had libtinfo.so.5, but the mariner host on which tblgen is running has libtinfo.so.6. It looks like you can get libtinfo.so.5 with tdnf install -y ncurses-compat.
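
A mismatch like this can be spotted before running the tool by filtering `ldd` output for unresolved entries. The sketch below assumes the usual glibc `ldd` output format (`name => not found`); the `missing_libs` name is hypothetical.

```shell
# Hypothetical helper: read `ldd` output on stdin and print the names of
# shared libraries that the dynamic loader could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage against the binary from this thread:
#   ldd ./bin/llvm-tblgen | missing_libs
```

An empty result means every dependency resolved on the host where the command ran, which is the Mariner container here and not the Ubuntu rootfs.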

@BruceForstall
Member Author

> It looks like you can get libtinfo.so.5 with tdnf install -y ncurses-compat.

Maybe that worked? Now I get:

./bin/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by ./bin/llvm-tblgen)

but it does seem to run.

@BruceForstall
Member Author

I tried to add a step to install libtinfo.so.5 to the docker container before llvm-tblgen is run, but got a permissions problem:

tdnf install -y ncurses-compat
========================== Starting Command Output ===========================
/usr/bin/bash --noprofile --norc /__w/_temp/a6a50520-a2ed-48fc-8558-958cad13608d.sh
Error(1601) : Operation not permitted. You have to be root.

Maybe need to add it to the docker container construction for these? Or maybe there's some other permissions magic that will make it work?

@sbomer
Member

sbomer commented Sep 8, 2023

Try sudo? I think Azure Pipelines grants the user that runs the job passwordless sudo permissions:

Try to create a user with UID '1001' inside the container.
/usr/bin/docker exec  fca87559f3deda754d3d4ef29a7ef33817a0ae046a5b05d79e8c591004dc1c2d bash -c "getent passwd 1001 | cut -d: -f1 "
/usr/bin/docker exec  fca87559f3deda754d3d4ef29a7ef33817a0ae046a5b05d79e8c591004dc1c2d groupadd -g 127 docker_azpcontainer
/usr/bin/docker exec  fca87559f3deda754d3d4ef29a7ef33817a0ae046a5b05d79e8c591004dc1c2d useradd -m -g 127 -u 1001 vsts_azpcontainer
Grant user 'vsts_azpcontainer' SUDO privilege and allow it run any command without authentication.
/usr/bin/docker exec  fca87559f3deda754d3d4ef29a7ef33817a0ae046a5b05d79e8c591004dc1c2d groupadd azure_pipelines_sudo
/usr/bin/docker exec  fca87559f3deda754d3d4ef29a7ef33817a0ae046a5b05d79e8c591004dc1c2d usermod -a -G azure_pipelines_sudo vsts_azpcontainer
/usr/bin/docker exec  fca87559f3deda754d3d4ef29a7ef33817a0ae046a5b05d79e8c591004dc1c2d su -c "echo '%azure_pipelines_sudo ALL=(ALL:ALL) NOPASSWD:ALL' >> /etc/sudoers"

@BruceForstall
Member Author

> Try sudo? I think Azure Pipelines grants the user that runs the job passwordless sudo permissions:

Will try.

Otherwise, looks like adding ncurses-compat to the tdnf install -y line in

dotnet/dotnet-buildtools-prereqs-docker : src\cbl-mariner\2.0\crossdeps-builder\Dockerfile

is the place to add it?

@sbomer
Member

sbomer commented Sep 8, 2023

https://github.com/dotnet/dotnet-buildtools-prereqs-docker/blob/fc0853cbd2ab042fcaa762b90759092336a2b9f3/src/cbl-mariner/2.0/crossdeps/Dockerfile would be the place to add it. I was a little hesitant to recommend this since we try to keep the build images pretty minimal, but I see we already added pip3 and zlib - so it probably doesn't hurt.

@directhex

Let's just get rid of the 18.04 images? The baseline for net9 is 20.04 isn't it? Bump the docker images to do that instead of 18.04, then you get the 6 SONAME for tinfo

@BruceForstall
Member Author

> Let's just get rid of the 18.04 images? The baseline for net9 is 20.04 isn't it? Bump the docker images to do that instead of 18.04, then you get the 6 SONAME for tinfo

Maybe? I don't understand the Linux versioning rules. Note that dotnet/runtime#86194 moved all (or, at least most) CI VMs to 22.04. So maybe update to 22.04?

@BruceForstall
Member Author

Installing ncurses-compat using sudo worked.

However, now the Linux x64 and arm64 builds (build-coredistools.sh step) are failing with AzDO failures:

##[error]The hosted runner encountered an error while running your job. (Error Type: Disconnect).
,##[warning]Received request to deprovision: The request was cancelled by the remote provider.

at almost exactly 29 minutes. But the Linux arm one succeeds after just 6 minutes.

Don't know what to do about it: there's no further info.

@sbomer sbomer closed this Sep 8, 2023
@sbomer sbomer reopened this Sep 8, 2023
@sbomer
Member

sbomer commented Sep 8, 2023

Not sure about the latest failure - let's retry.

> Let's just get rid of the 18.04 images? The baseline for net9 is 20.04 isn't it? Bump the docker images to do that instead of 18.04, then you get the 6 SONAME for tinfo

Good point, I just checked with @richlander and that's right. edit: see dotnet/runtime#91826 for details.

> Note that dotnet/runtime#86194 moved all (or, at least most) CI VMs to 22.04. So maybe update to 22.04?

We build against a lower version (for broad glibc compat) than the version we run on in ci. So we would update the mariner build images to have a 20.04 rootfs, which will have libtinfo.so.6.
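
One way to see the effect of the rootfs choice on a built binary is to look at the highest `GLIBC_x.y` symbol version it references. This is a rough sketch assuming GNU `objdump`, `grep -o`, and `sort -V` are available; the `max_glibc_version` name and the library path in the usage comment are hypothetical.

```shell
# Hypothetical helper: read `objdump -T` output on stdin and print the
# highest GLIBC symbol version referenced, e.g. "2.17".
max_glibc_version() {
    grep -o 'GLIBC_[0-9][0-9.]*' | sed 's/^GLIBC_//' | sort -u -V | tail -n 1
}

# Usage (library path is illustrative only):
#   objdump -T path/to/libcoredistools.so | max_glibc_version
```

A binary built against an older rootfs should report a lower number here, which is what lets it load on a wider range of distributions.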

@BruceForstall
Member Author

Same result: builds cancelled at 27m 32s.

https://dev.azure.com/dnceng-public/public/_build/results?buildId=400328&view=results

@sbomer
Member

sbomer commented Sep 8, 2023

Looks like there were some similar failures on OSX in the past: dotnet/runtime#34647. Could the build be filling up the disk?

@BruceForstall
Member Author

Maybe? These aren't OSX machines, though. I could add a "df -H" job to see the "before" state, but that wouldn't actually help if the cmake/llvm build itself is going crazy. I'll try again locally (on WSL2), but it's worked for me recently.
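
Since a one-off `df` only shows the starting state, a periodic sample printed into the live log during the build might say more. The following is only a sketch: `disk_used_percent` and `monitor_disk` are hypothetical helpers, and the 60-second default interval is an arbitrary choice.

```shell
# Hypothetical helper: read `df -P` output on stdin and print the use
# percentage (without the % sign) of the first listed filesystem.
disk_used_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical monitor: print disk usage for a path every N seconds
# (default 60) until killed, so the samples show up in the live build log.
monitor_disk() {
    local path="$1" interval="${2:-60}"
    while true; do
        echo "[disk] $(date -u +%H:%M:%S) $(df -P "$path" | disk_used_percent)% used on $path"
        sleep "$interval"
    done
}

# Usage sketch around the build step:
#   monitor_disk / 60 &
#   monitor_pid=$!
#   ./build-coredistools.sh ...   # arguments elided
#   kill "$monitor_pid"
```

If the job is deprovisioned without final logs, the last sample visible while watching the job would at least show whether the disk was close to full.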

1. LLVM doesn't release a full set of binary drops, so we need to build llvm-tblgen on some platforms, namely Mac.
2. Build using standard CBL-Mariner Docker containers (used by dotnet/runtime as well). This also converts linux-x64 builds to be container-based cross builds.

Also, fix one compilation bug to build coredistools.cpp with LLVM 16.0.1.

To use this in dotnet/runtime, after the package is published, `MicrosoftNETCoreCoreDisToolsVersion` in eng/Versions.props needs to be updated.

Output more diagnostics on Linux build-coredistools build

Update documentation for building Linux
The matrix appends the architecture to the displayName automatically.

Remove some debugging output.
@BruceForstall
Member Author

When coredistools is updated, dotnet/runtime#91668 should be reverted.

@BruceForstall
Member Author

Current status:

##[error]The hosted runner encountered an error while running your job. (Error Type: Disconnect).
,##[warning]Received request to deprovision: The request was cancelled by the remote provider.

When this happens, there is no log file output. It says "Nothing to show. Final logs are missing. This can happen when the job is cancelled or times out." The job specifies a 60-minute timeout.

If you watch the job, you can capture the in-progress build. linux-x64 seems to hang here:

...
[ 10%] Building X86GenFoldTables.inc...
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
[ 10%] Building AArch64GenRegisterBank.inc...
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
[ 10%] Building AArch64GenPreLegalizeGICombiner.inc...
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
[ 10%] Building AArch64GenRegisterInfo.inc...
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
[ 10%] Building AArch64GenSubtargetInfo.inc...
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
[ 10%] Building AArch64GenSystemOperands.inc...
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
[ 10%] Built target LLVMSupportBlake3

and linux-arm64 hangs here:

...
[  5%] Building X86GenRegisterInfo.inc...
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
[  5%] Building X86GenSubtargetInfo.inc...
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
[ 10%] Building AArch64GenRegisterBank.inc...
[ 10%] Building AArch64GenRegisterInfo.inc...
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
[ 10%] Building X86GenFoldTables.inc...
[ 10%] Building AArch64GenSubtargetInfo.inc...
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
[ 10%] Building AArch64GenSystemOperands.inc...
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)

@BruceForstall BruceForstall changed the title from "[WIP] Update coredistools" to "Update coredistools" Dec 22, 2023
@BruceForstall BruceForstall marked this pull request as ready for review December 22, 2023 06:45
@BruceForstall
Member Author

@dotnet/jit-contrib This is ready to be reviewed and merged.
cc @sbomer

The CI system is still failing to build the linux-arm64 and linux-x64 versions. It seems to hang at some point. I can build them fine, using the exact same CBL-Mariner docker containers and scripts, in my WSL2 Ubuntu 22.04.3 LTS OS. So I'm mystified as to the reason. If anyone has experience debugging the LLVM build process and AzDO builds, feel free to investigate.

I built a new coredistools nuget package using all the successfully built components in the CI, plus linux-arm64 and linux-x64 built on my machine.

1. Update header file to have type definitions for all DLL exported functions.
2. Remove unnecessary x86 instruction prefix handling (it was working around a bug that is apparently fixed).
3. Remove coredistools special handling of movw/movt. Instead, add a hopefully more general mechanism where coredistools will optionally first call a "munger" function on constants. This callback in the superpmi NearDiffer for arm32 will decode the movw/movt and treat the constructed constant as being generated by one instruction. This can also be used by arm64 mov/movk/movk/movk, although we currently don't have a need for that.
@BruceForstall
Member Author

@jakobbotsch @dotnet/jit-contrib PTAL

Member

@jakobbotsch jakobbotsch left a comment

LGTM, thanks for doing this work!

@BruceForstall BruceForstall merged commit 74a49ca into dotnet:main Jan 3, 2024
15 of 18 checks passed
@BruceForstall BruceForstall deleted the UpdateCoredisTools branch January 3, 2024 20:42
@BruceForstall
Member Author

Almost 9 months to get this in :-)

Unfortunately, linux-x64 and linux-arm64 builds in the CI still hang (or otherwise fail). However, they succeed when built using the same Docker build containers on my Ubuntu 22.04 WSL2 host, so that's what I used to build the 1.4.0 coredistools package.

It would still be nice to fix the CI if we could figure out how to debug the problem.