Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Image build time improvements #10104

Merged

Conversation

saiarcot895
Copy link
Contributor

Why I did it

The last step of the SONiC build (building sonic-vs.bin or sonic-broadcom.bin or sonic-aboot-broadcom.bin) seemed to take a long time. I started looking at what it was doing and what could be done to reduce the time spent there.

How I did it

There are two major changes in this PR (see the individual commits for more details on what was done there):

  1. The first one is an issue being caused by debootstrap that would cause the /proc filesystem within the slave container to get umounted. This meant that /proc is now an empty folder. There were two visible side effects to this: each invocation of sudo added 0.8 seconds to the runtime (sudo itself would take 0.8 seconds more, then execute the actual command), and from the host machine, trying to docker exec into the container while the Debian image build was happening would fail, because it expects /proc/self/fd to be available. This is fixed in debootstrap 1.0.124, but since there's no backported version available for Bullseye, I am rebuilding debootstrap with the patch added in.
  2. The second one is to try to reduce the fsync calls done during deb package installations by dpkg. dpkg calls fsync many times, to make sure all new files are written to disk and that the filesystem is in a consistent state, so that if some system crash were to happen, it would be somewhat possible to recover the system. However, because of the nature of the build that we're doing, this is not really important for us. The eatmydata package can suppress these fsync calls and potentially improve application performance, at the risk of losing data or having the filesystem in an inconsistent state in the event of a system crash.

How to verify it

On my dev VM, before any of these changes, building the virtual switch image took 28 minutes (this is only counting the part of the build where build_debian.sh is run to build the base Debian image, load in all of the containers, and make image files). After the debootstrap patch, this decreased to 22 minutes. With the use of eatmydata, this decreased to 19-20 minutes. Actual numbers will vary depending on the system.

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

Currently, when the final image is being built (sonic-vs.img.gz,
sonic-broadcom.bin, or similar), each invocation of sudo in the
build_debian.sh script takes 0.8 seconds to run and execute the actual
command. This is because the /proc filesystem in the slave container has
been unmounted somehow. This is happening when debootstrap is running,
and it incorrectly unmounts the host's (in our case, the slave
container's) /proc filesystem because in the new image being built,
/proc is a symlink to the host's (the slave container's) /proc. Because
of that, /proc is gone, and each invocation of sudo adds 0.8 seconds
overhead. As a side effect, docker exec into the slave container during
this time will fail, because /proc/self/fd doesn't exist anymore, and
docker exec assumes that that exists.

Debootstrap has fixed this in 1.0.124 and newer, so backport the patch
that fixes this into the version that Bullseye has.

Signed-off-by: Saikrishna Arcot <[email protected]>
During package installations, dpkg calls fsync multiples times (for each
package) to ensure that tht efiles are written to disk, so that if
there's some system crash during package installation, then it is in at
least a somewhat recoverable state. For our use case though, we're
installing packages in a chroot in fsroot-* from a slave container and
then packaging it into an image. If there were a system crash (or even
if docker crashed), the fsroot-* directory would first be removed, and
the process would get restarted. This means that the fsync calls aren't
really needed for our use case.

The eatmydata package includes a library that will block/suppress the
use of fsync (and similar) system calls from applications and will
instead just return success, so that the application is not blocked on
disk writes, which can instead happen in the background instead as
necessary. If dpkg is run with this library, then the fsync calls that
it does will have no effect.

Therefore, install the eatmydata package at the beginning of
build_debian.sh and have dpkg be run under eatmydata for almost all
package installations/removals. At the end of the installation, remove
it, so that the final image uses dpkg as normal.

In my testing, this saves about 2-3 minutes from the image build time.

Signed-off-by: Saikrishna Arcot <[email protected]>
lguohan
lguohan previously approved these changes Mar 4, 2022
@lguohan
Copy link
Collaborator

lguohan commented Mar 4, 2022

/azp run

@azure-pipelines
Copy link

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

@lguohan
Copy link
Collaborator

lguohan commented Mar 4, 2022

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@saiarcot895
Copy link
Contributor Author

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@saiarcot895
Copy link
Contributor Author

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

build_debian.sh Outdated

echo '[INFO] Install and setup eatmydata'
sudo LANG=C chroot $FILESYSTEM_ROOT apt-get -y install eatmydata
sudo ln -s /usr/bin/eatmydata $FILESYSTEM_ROOT/usr/local/bin/dpkg
Copy link
Collaborator

@qiluo-msft qiluo-msft Mar 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eatmydata

Why create this symlink? If needed, install it in slave? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The symlink is being created within the chroot for use within the chroot (the link is being created at $FILESYSTEM_ROOT/usr/local/bin/dpkg and pointing to /usr/bin/eatmydata). When chrooted, this allows all calls to dpkg to actually call eatmydata, which will then look for the actual binary in PATH.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean something like below?

sudo  LANG=C chroot $FILESYSTEM_ROOT ln -s /usr/bin/eatmydata /usr/local/bin/dpkg

If yes, one minor comment: it is not obvious that the intention is ln inside chroot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I was trying to avoid the chroot here, but I'll change this to use chroot to make it clear.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line has now been changed to use chroot to make it clear that the symlink is meant for use inside the chroot.

@@ -568,6 +574,10 @@ scripts/collect_host_image_version_files.sh $TARGET_PATH $FILESYSTEM_ROOT
# Remove GCC
sudo LANG=C DEBIAN_FRONTEND=noninteractive chroot $FILESYSTEM_ROOT apt-get -y remove gcc

# Remove eatmydata
sudo rm $FILESYSTEM_ROOT/etc/apt/apt.conf.d/00image-install-eatmydata $FILESYSTEM_ROOT/usr/local/bin/dpkg
sudo LANG=C DEBIAN_FRONTEND=noninteractive chroot $FILESYSTEM_ROOT apt-get -y remove eatmydata
Copy link
Collaborator

@qiluo-msft qiluo-msft Mar 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eatmydata

After removing in chroot, you have a dead symlink outside. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The symlink is being created inside the chroot, and is being deleted on line 578, so there shouldn't be any dead symlink in the slave container.

Signed-off-by: Saikrishna Arcot <[email protected]>
@saiarcot895
Copy link
Contributor Author

@qiluo-msft can you re-review this?

@xumia looks like the change you did for making sure /proc is mounted would've fixed the first change in this PR. I can remove this part of the change if you want.

@xumia
Copy link
Collaborator

xumia commented Mar 29, 2022

@qiluo-msft can you re-review this?

@xumia looks like the change you did for making sure /proc is mounted would've fixed the first change in this PR. I can remove this part of the change if you want.

Yes, I have a PR to make sure the /proc is mounted in the slave container. It should not be a correct usage to umount /proc, has impact on multiple programs, including the debootstrap (there is a warning message, complain /proc not mounted).
If mount/umount the /proc from the fsroot filesystem, then it is good. It is my understanding, the command "umount_on_exit /proc" is running in the fsroot, right?

@saiarcot895
Copy link
Contributor Author

Correct, umount_on_exit works on filesystems within the fsroot.

@saiarcot895
Copy link
Contributor Author

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@yxieca
Copy link
Contributor

yxieca commented Apr 14, 2022

@qiluo-msft Are we ok to merge this PR?

@saiarcot895 saiarcot895 merged commit 330777e into sonic-net:master Apr 19, 2022
@saiarcot895 saiarcot895 deleted the image-build-time-improvements branch April 19, 2022 16:22
liushilongbuaa pushed a commit to liushilongbuaa/sonic-buildimage that referenced this pull request Jun 20, 2022
Related work items: #49, #58, #107, sonic-net#247, sonic-net#249, sonic-net#277, sonic-net#593, sonic-net#597, sonic-net#1035, sonic-net#2130, sonic-net#2150, sonic-net#2165, sonic-net#2169, sonic-net#2178, sonic-net#2179, sonic-net#2187, sonic-net#2188, sonic-net#2191, sonic-net#2195, sonic-net#2197, sonic-net#2198, sonic-net#2200, sonic-net#2202, sonic-net#2206, sonic-net#2209, sonic-net#2211, sonic-net#2216, sonic-net#7909, sonic-net#8927, sonic-net#9681, sonic-net#9733, sonic-net#9746, sonic-net#9850, sonic-net#9967, sonic-net#10104, sonic-net#10152, sonic-net#10168, sonic-net#10228, sonic-net#10266, sonic-net#10288, sonic-net#10294, sonic-net#10313, sonic-net#10394, sonic-net#10403, sonic-net#10404, sonic-net#10421, sonic-net#10431, sonic-net#10437, sonic-net#10445, sonic-net#10457, sonic-net#10458, sonic-net#10465, sonic-net#10467, sonic-net#10469, sonic-net#10470, sonic-net#10474, sonic-net#10477, sonic-net#10478, sonic-net#10482, sonic-net#10485, sonic-net#10488, sonic-net#10489, sonic-net#10492, sonic-net#10494, sonic-net#10498, sonic-net#10501, sonic-net#10509, sonic-net#10512, sonic-net#10514, sonic-net#10516, sonic-net#10517, sonic-net#10523, sonic-net#10525, sonic-net#10531, sonic-net#10532, sonic-net#10538, sonic-net#10555, sonic-net#10557, sonic-net#10559, sonic-net#10561, sonic-net#10565, sonic-net#10572, sonic-net#10574, sonic-net#10576, sonic-net#10578, sonic-net#10581, sonic-net#10585, sonic-net#10587, sonic-net#10599, sonic-net#10607, sonic-net#10611, sonic-net#10616, sonic-net#10618, sonic-net#10619, sonic-net#10623, sonic-net#10624, sonic-net#10633, sonic-net#10646, sonic-net#10655, sonic-net#10660, sonic-net#10664, sonic-net#10680, sonic-net#10683
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants