
Add mariner build images #832

Merged · 22 commits · Mar 30, 2023
Conversation

@sbomer (Member) commented Mar 27, 2023

This adds build images for CBL-Mariner that can be used to build dotnet/runtime (using the changes in dotnet/runtime@main...sbomer:runtime:marinerBuild). All of these images (including x64) contain a rootfs, to ensure consistent glibc and musl targeting. The rootfs is built using the changes in dotnet/arcade#12900, which expect debootstrap to be available on the host (for the Ubuntu rootfs). For now, we obtain debootstrap from the Mariner extended repos.
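As a rough sketch of that host setup (the exact package names are my assumption about the Dockerfiles; CBL-Mariner exposes its extended packages through the mariner-repos-extended package):

# enable the extended repo, then pull debootstrap from it
RUN tdnf install -y mariner-repos-extended && \
    tdnf install -y debootstrap && \
    tdnf clean all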

The rootfs distros are:

  • Ubuntu 16.04 (Xenial), with glibc 2.23
  • Alpine 3.13, with musl libc 1.2.2
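For reference, with the dotnet/arcade#12900 changes each rootfs is produced by arcade's eng/common/cross/build-rootfs.sh. An illustrative invocation follows; the exact arguments and paths are assumptions, not copied from the Dockerfiles:

# glibc rootfs: Ubuntu 16.04, built via debootstrap on the host
ROOTFS_DIR=/crossrootfs/x64 ./build-rootfs.sh x64 xenial
# musl rootfs: Alpine 3.13
ROOTFS_DIR=/crossrootfs/x64 ./build-rootfs.sh x64 alpine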

The images all use LLVM 12, matching the version available in the Mariner repositories. They don't contain binutils on the host and expect the build to use LLVM-based tools. We plan to upgrade to LLVM 16 in a future change.
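In practice that means the build is pointed at the LLVM replacements for the usual binutils tools, roughly like this (a sketch; which variables actually matter depends on the build system):

# use clang/clang++ as the compilers and lld as the linker
export CC=clang CXX=clang++
export LDFLAGS="-fuse-ld=lld"
# LLVM stand-ins for the binutils tools a build normally expects
export AR=llvm-ar NM=llvm-nm RANLIB=llvm-ranlib
export OBJCOPY=llvm-objcopy OBJDUMP=llvm-objdump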

For x64 and x86, we use LLVM from the Mariner repos. For arm and arm64, we build LLVM from source. We also build the compiler-rt.profile library (used for PGO instrumentation) for the target architecture where necessary.
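A condensed sketch of that compiler-rt.profile cross-build (the COMPILER_RT_* and CMAKE_* names are real LLVM/CMake options, but the exact invocation in the Dockerfiles may differ):

cmake -S llvm-project.src/compiler-rt -B compiler-rt_build \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_C_COMPILER=clang \
    -DCMAKE_C_COMPILER_TARGET=aarch64-linux-gnu \
    -DCMAKE_SYSROOT=$ROOTFS_DIR \
    -DCOMPILER_RT_DEFAULT_TARGET_ONLY=ON \
    -DCOMPILER_RT_BUILD_BUILTINS=OFF \
    -DCOMPILER_RT_BUILD_SANITIZERS=OFF \
    -DCOMPILER_RT_BUILD_XRAY=OFF \
    -DCOMPILER_RT_BUILD_LIBFUZZER=OFF \
    -DCOMPILER_RT_BUILD_PROFILE=ON && \
cmake --build compiler-rt_build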

Arm images will be added in a future change (I still need to investigate the libnuma dependency, which isn't available on Xenial or Alpine 3.13).

Contributes to dotnet/runtime#83428.

Fixes #833

@sbomer (Member, Author) commented Mar 27, 2023

@janvorli @am11 @mthalman PTAL.

@mthalman (Member) commented:

The Mariner 1.0 build is failing due to #833.

@mthalman (Member) commented:

Looks like the LLVM build is causing the pipeline to take much longer. Please update the timeout of the pipeline. You can see an example of that here:

I would suggest a value of 150. This needs to be applied to two pipelines:

@sbomer (Member, Author) commented Mar 28, 2023

I'm not sure 150 will be enough, since after the LLVM build completes we still need to build compiler-rt.profile for various architectures and create the rootfs, which can take a while. I'm trying 360, but if that's excessive we can lower it.

@sbomer (Member, Author) commented Mar 28, 2023

The raspbian failure is unrelated and is fixed by #829.

@mthalman (Member) left a comment:

The build failure is because the infrastructure doesn't support local-only Dockerfiles. There must be at least one tag associated with a Dockerfile that does not have isLocal set. So you'd need to add a unique tag for each of these crossdeps Dockerfiles.

@sbomer (Member, Author) commented Mar 28, 2023

@mthalman from the failure logs, it looked to me like the local-only images were being found. For example, I see:

Successfully built 8f6b10a1c55f
Successfully tagged mcr.microsoft.com/dotnet-buildtools/prereqs:cbl-mariner-2.0-crossdeps-artifacts-local
Step 1/9 : FROM mcr.microsoft.com/dotnet-buildtools/prereqs:cbl-mariner-2.0-crossdeps-artifacts-local as artifacts
 ---> 8f6b10a1c55f

It doesn't seem to fail until later, while executing debootstrap - is that related to isLocal? I'm not sure I understand why local-only images would result in failures like this one.

Review thread on this Dockerfile excerpt:

mkdir -p $ROOTFS_DIR/usr/lib/llvm-${LLVM_VERSION_MAJOR}/lib/clang/${LLVM_VERSION}/lib/linux/ && \
cp compiler-rt_build/lib/linux/libclang_rt.profile-aarch64.a $ROOTFS_DIR/usr/lib/llvm-${LLVM_VERSION_MAJOR}/lib/clang/${LLVM_VERSION}/lib/linux/

RUN rm -r compiler-rt_build llvm-project.src
A reviewer (Member) commented:

If I remember correctly how Docker works, although this "rm" removes the directories, they will stay hidden in the image, making it unnecessarily larger. Every "RUN" command adds a layer to the Docker image: consider everything that exists at the end of the "RUN" as "committed" into the image forever. So the "RUN" with rm basically just creates a "commit" that removes the visibility of the files, similar to what removing a file in a git commit does. If you really want to remove some build artifacts, it should be done in the same "RUN" command that created them.

Multi-stage builds were invented to remedy this problem in a cleaner way: each stage can copy selected files from the previous stage, and anything not copied is thrown away. See https://docs.docker.com/build/building/multi-stage/ for more details.
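A minimal sketch of the multi-stage pattern (the image name is the real CBL-Mariner base image, but the script and paths are illustrative, not the ones in this PR):

FROM mcr.microsoft.com/cbl-mariner/base/core:2.0 AS builder
RUN build-llvm.sh   # hypothetical build script; leaves large intermediates behind

FROM mcr.microsoft.com/cbl-mariner/base/core:2.0
# only the selected artifacts are copied; the builder stage's
# intermediate layers are not part of the final image
COPY --from=builder /usr/local/bin/clang /usr/local/bin/clang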

A Member replied:

Yes, they do. This is why many of the commands are linked together with && \ (and when packages are added, we clean up within the same layer).
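For instance, a sketch of that per-layer cleanup pattern:

# clean up inside the same RUN, so the layer never contains the debris
RUN tdnf install -y cmake ninja-build && \
    tdnf clean all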

@sbomer (Member, Author) replied:

I optimized the builds more for container size - PTAL.

@sbomer (Member, Author) commented Mar 29, 2023

@mthalman I believe the earlier failure was due to qemu-user-static missing on the host machine. It seems like the local-only images were working. Is it OK to change back to local-only images for the intermediate images that don't need to be consumed in dotnet/runtime?
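(For context: qemu-user-static plus the kernel's binfmt_misc support is what lets an x64 host run arm64 binaries inside the chroot. A hypothetical host-side check, assuming a Debian/Ubuntu build agent:)

# install the static user-mode emulators and register binfmt handlers
sudo apt-get install -y qemu-user-static
# arm64 binaries in the rootfs now run transparently through qemu
sudo chroot "$ROOTFS_DIR" /bin/uname -m   # prints aarch64 for an arm64 rootfs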

This allows running arm64 binaries in the chroot on the x64 host.

Review thread on this cmake invocation in the Dockerfile:

-DCMAKE_BUILD_TYPE=Release \
-DLLVM_TARGETS_TO_BUILD="host;AArch64;ARM" \
-Wno-dev \
-DLLVM_ENABLE_PROJECTS="clang;lld" && \
@am11 (Member) commented Mar 29, 2023:

Suggested change:

- -DLLVM_ENABLE_PROJECTS="clang;lld" && \
+ -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;lld;lldb" && \

This is to make sure we have lld-12, lldb-12, llvm-objcopy-12, llvm-objdump-12, llvm-ar-12, llvm-nm-12, llvm-ranlib-12, llvm-link-12, llvm-dwarfdump-12, and llvm-symbolizer-12 available in PATH. Please double-check that these prerequisites, which various runtime CI legs use, are available in the final image's host layer: find /usr -type f -path '*bin*' -name 'llvm*' (the crossrootfs only needs llvm-libs, which is handled by build-rootfs.sh).

@sbomer (Member, Author) replied:

Those tools (without the version suffix) all get installed with just the "clang;lld" projects:

# find /usr -type f -path '*bin*' -name 'llvm*'
/usr/local/bin/llvm-pdbutil
/usr/local/bin/llvm-diff
/usr/local/bin/llvm-as
/usr/local/bin/llvm-size
/usr/local/bin/llvm-bcanalyzer
/usr/local/bin/llvm-undname
/usr/local/bin/llvm-objdump
/usr/local/bin/llvm-exegesis
/usr/local/bin/llvm-mca
/usr/local/bin/llvm-dwarfdump
/usr/local/bin/llvm-ml
/usr/local/bin/llvm-cxxdump
/usr/local/bin/llvm-c-test
/usr/local/bin/llvm-reduce
/usr/local/bin/llvm-lipo
/usr/local/bin/llvm-extract
/usr/local/bin/llvm-objcopy
/usr/local/bin/llvm-split
/usr/local/bin/llvm-mt
/usr/local/bin/llvm-elfabi
/usr/local/bin/llvm-opt-report
/usr/local/bin/llvm-link
/usr/local/bin/llvm-cxxfilt
/usr/local/bin/llvm-modextract
/usr/local/bin/llvm-profdata
/usr/local/bin/llvm-rc
/usr/local/bin/llvm-jitlink
/usr/local/bin/llvm-rtdyld
/usr/local/bin/llvm-mc
/usr/local/bin/llvm-nm
/usr/local/bin/llvm-readobj
/usr/local/bin/llvm-cfi-verify
/usr/local/bin/llvm-libtool-darwin
/usr/local/bin/llvm-symbolizer
/usr/local/bin/llvm-xray
/usr/local/bin/llvm-cat
/usr/local/bin/llvm-lto
/usr/local/bin/llvm-cov
/usr/local/bin/llvm-profgen
/usr/local/bin/llvm-tblgen
/usr/local/bin/llvm-cxxmap
/usr/local/bin/llvm-config
/usr/local/bin/llvm-lto2
/usr/local/bin/llvm-cvtres
/usr/local/bin/llvm-stress
/usr/local/bin/llvm-ar
/usr/local/bin/llvm-gsymutil
/usr/local/bin/llvm-ifs
/usr/local/bin/llvm-strings
/usr/local/bin/llvm-dwp
/usr/local/bin/llvm-dis

Are clang-tools-extra and lldb additionally required on the host?

@am11 (Member) replied:

Thanks for checking.

lldb has been required in the diagnostics repo's CI since forever, and in the runtime repo (since dotnet/runtime#82867), to troubleshoot crashes in the very image that reports them.

clang-tools-extra is not required, but it would be nice to put it in the image while we are at it, so we don't have to rely on a random (unmaintained) blob for clang-format and clang-tidy: https://github.com/dotnet/jitutils/blob/main/bootstrap.sh#L90, which runtime CI ends up using in the jit-diff legs.

llvm-ranlib seems to be missing from the list. For comparison, from the ubuntu-22.04-cross-arm64 prereq image:

$ docker run -it mcr.microsoft.com/dotnet-buildtools/prereqs:ubuntu-22.04-cross-arm64 bash -c 'find /usr -name llvm* | grep ranlib'

/usr/lib/llvm-15/bin/llvm-ranlib
/usr/bin/llvm-ranlib-15

@sbomer (Member, Author) replied:

Ah, sorry - llvm-ranlib is a symlink:

# ls -al /usr/local/bin/llvm-ranlib 
lrwxrwxrwx 1 root root 7 Mar 29 17:36 /usr/local/bin/llvm-ranlib -> llvm-ar

I don't plan to update the JIT formatting jobs as part of this series of changes, so my preference would be to leave out clang-format and clang-tidy for now. We can add them later when we update to recent formatting tools.

For lldb, I'd like to focus on just the dotnet/runtime build dependencies in this change. I think you're right that we will need it for dotnet/diagnostics and the runtime helix images, but I'd like to do that as a separate change if that sounds ok to you.

@am11 (Member) replied:

Sounds good. 👍


Review thread on this Dockerfile excerpt:

COPY --from=builder $ROOTFS_DIR $ROOTFS_DIR

RUN LLVM_VERSION=12.0.1 LLVM_VERSION_MAJOR="${LLVM_VERSION%%.*}" && \
@am11 (Member) commented Mar 29, 2023:

I think we can directly use 15.0.7, which has been validated in industry and revised seven times since its initial release (it has tons of fixes and improvements). 16 is very new, and 12 is very old. We are already using 15 for the linux-musl official builds.

cc @jkotas, @hoyosjs

@sbomer (Member, Author) replied:

Our current plan is to ship .NET 8 building with LLVM 16, which will be supported by Mariner. Once they add support, we should be able to stop building it ourselves. 16 also won't be quite as new by the time we ship .NET 8.

I plan to make the update to 16 fairly soon after this goes in. The reason I'm not doing it here is that it's not quite as simple as updating the version number (there were some missing dependencies for the compiler-rt builds), and I wanted to prioritize getting to a point where I can run dotnet/runtime CI legs with the new images. I'm sure there will be problems I haven't caught with my local testing.

@am11 (Member) replied:

Curious: why v12.0.1 specifically, even as a temporary solution? If it were v9.0.x (our go-to LLVM version in official legs) or v15.0.x (the modern version used in the latest linux-musl legs), that would make more sense for a temporary solution. AFAIK, we aren't using 12.0.x explicitly for anything.

@sbomer (Member, Author) replied:

I chose it because it matches the version available in the CBL-Mariner repos, so we can just install it for the x64 and x86 builds (crossdeps-amd64 does this). That's also how I discovered that the current Mariner package doesn't support targeting arm/arm64.
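(Roughly, for the x64/x86 images; the exact package set in the Dockerfile is my assumption:)

# LLVM 12 toolchain straight from the CBL-Mariner 2.0 repos
RUN tdnf install -y llvm clang lld && \
    tdnf clean all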

@mthalman (Member) commented:

> @mthalman I believe the failure earlier was due to missing qemu-user-static on the host machine. It seems like the local-only images were working. Is it ok to change back to local-only images for the intermediate images that don't need to be consumed in dotnet/runtime?

Ok, I thought I was seeing something else. Go ahead and try switching back. Hopefully that works then.

@mthalman (Member) commented:

> I believe the failure earlier was due to missing qemu-user-static on the host machine. It seems like the local-only images were working. Is it ok to change back to local-only images for the intermediate images that don't need to be consumed in dotnet/runtime?

> Ok, I thought I was seeing something else. Go ahead and try switching back. Hopefully that works then.

The current build failure is this:

Unhandled exception: System.InvalidOperationException: Sequence contains no matching element
   at System.Linq.ThrowHelper.ThrowNoMatchException()
   at System.Linq.Enumerable.First[TSource](IEnumerable`1 source, Func`2 predicate)
   at Microsoft.DotNet.ImageBuilder.Commands.BuildCommand.PublishImageInfoAsync() in /image-builder/src/Commands/BuildCommand.cs:line 155
   at Microsoft.DotNet.ImageBuilder.Commands.BuildCommand.<ExecuteAsync>b__14_0() in /image-builder/src/Commands/BuildCommand.cs:line 77
   at Microsoft.DotNet.ImageBuilder.DockerHelper.ExecuteWithUserAsync(Func`1 action, String username, String password, String server, Boolean isDryRun) in /image-builder/src/DockerHelper.cs:line 49
   at Microsoft.DotNet.ImageBuilder.Commands.BuildCommand.ExecuteAsync() in /image-builder/src/Commands/BuildCommand.cs:line 66
   at System.CommandLine.Invocation.CommandHandler.GetResultCodeAsync(Object value, InvocationContext context)
   at System.CommandLine.Invocation.ModelBindingCommandHandler.InvokeAsync(InvocationContext context)
   at System.CommandLine.Invocation.InvocationPipeline.<>c__DisplayClass4_0.<<BuildInvocationChain>b__0>d.MoveNext()

This is what I saw earlier; it is caused by having local-only tags. So you'll need to add those non-local tags back.

@sbomer sbomer requested a review from janvorli March 30, 2023 00:29
@janvorli (Member) left a comment:

LGTM, thank you!

@sbomer (Member, Author) commented Mar 30, 2023

@mthalman please merge when you have a chance. Thanks for all the help getting this to green!
