build_mapping_recurse is very slow, causes ostree container builds to take a long time #4880

Closed
AdamWill opened this issue Mar 21, 2024 · 6 comments · Fixed by #4768

Comments

@AdamWill

Describe the bug

This function seems to be a single-threaded, non-optimized recursive walk of the entire filesystem which checksums every single file it hits, and inexplicably checksums some of them twice. Is it really needed? If so, can it be optimized?

Reproduction steps

  1. Build an ostree container
  2. Stare at the screen, sadly, while it says "Building package mapping..."

Expected behavior

A much shorter period of sad screen staring

Actual behavior

Literal hours of it, in the case of Onyx (which seems to have an inordinately large number of icons that bog it down for ages)

System details

[adamw@xps13a rpm-ostree (rawhide *%)]$ rpm-ostree --version
rpm-ostree:
 Version: '2024.4'
 Git: df536081aea75340f74f4b9430f7f707654961b1
 Features:
  - rust
  - compose
  - container
  - fedora-integration

Additional information

@nirik and I both noticed that the slowest thing in Fedora composes currently is the ostree_container phase. In today's F40 compose it took 10704 seconds (just about exactly three hours). In today's Rawhide it took 11562 seconds. No other phase in Rawhide took more than 4098 seconds (image_build).

Poking into it a bit more, we noticed that the Onyx container seems to be an outlier; it takes around an hour and a half longer than the one that takes the second longest (Kinoite). Running it interactively, we found it spends a very long time at a point logged as "Building package mapping...". I dug into the code a bit and found that this is just running the build_mapping_recurse function (which seems to be copylib'ed from ostree-rs-ext, kinda, although the versions are somewhat different).

This function seems to be a single-threaded, non-optimized recursive walk of the entire filesystem which checksums every single file it hits, and inexplicably checksums some of them twice, here:

            let checksum = child.checksum().to_string();
            match state.content.entry(checksum) {
                Entry::Vacant(v) => {
                    v.insert(pkgid);
                }
                Entry::Occupied(_) => {
                    let checksum = child.checksum().to_string();
                    let v = state.duplicates.entry(checksum).or_default();
                    v.push(pkgid);
                }
            }

Why do we redo the checksum in the 'occupied' case there? We just did it!
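For what it's worth, `checksum` is moved into `entry()` on the first call, which is presumably why the code re-derives it in the `Occupied` arm. A minimal sketch of how the second call could be dropped (my own illustration, not the actual fix; it assumes the same `state.content` and `state.duplicates` hash maps and surrounding variables as the snippet above):

```
            let checksum = child.checksum().to_string();
            // Clone the key so the String is still usable in the Occupied arm.
            match state.content.entry(checksum.clone()) {
                Entry::Vacant(v) => {
                    v.insert(pkgid);
                }
                Entry::Occupied(_) => {
                    // Reuse the checksum computed above instead of calling
                    // child.checksum() a second time.
                    let v = state.duplicates.entry(checksum).or_default();
                    v.push(pkgid);
                }
            }
```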

Anyhow, this seems like a pretty slow thing to have holding up all our composes. Is there any way this function could be optimized, reduced, nuked from orbit, or all of the above? I can't quite figure out why we need a mapping of file checksums to owning packages, anyway, and the commit message from what seems to be the initial version of this thing, over in ostree-rs-ext, doesn't really enlighten me.

@AdamWill
Author

AdamWill commented Mar 21, 2024

Here's something fun:

papirus-icon-theme (used by onyx): 97113 files
breeze-icon-theme (used by kinoite): 35182 files
adwaita-icon-theme (used by silverblue): 802 files

2024-03-20 07:51:39 [INFO    ] [BEGIN] OSTree phase for variant Onyx, arch x86_64
2024-03-20 07:51:39 [INFO    ] [BEGIN] OSTree phase for variant Kinoite, arch x86_64
2024-03-20 07:51:39 [INFO    ] [BEGIN] OSTree phase for variant Kinoite, arch ppc64le
2024-03-20 07:51:39 [INFO    ] [BEGIN] OSTree phase for variant Silverblue, arch x86_64

2024-03-20 08:22:12 [INFO    ] [DONE ] OSTree phase for variant Silverblue, arch x86_64
2024-03-20 08:38:47 [INFO    ] [DONE ] OSTree phase for variant Kinoite, arch x86_64
2024-03-20 09:07:21 [INFO    ] [DONE ] OSTree phase for variant Kinoite, arch ppc64le
2024-03-20 10:49:57 [INFO    ] [DONE ] OSTree phase for variant Onyx, arch x86_64

I filed this bug for the "does Budgie really need all those icons?" side of this.

@cgwalters
Member

See #4768

@cgwalters
Member

> which checksums every single file it hits

No, .checksum() is just accessing a cached checksum. https://docs.rs/ostree/latest/src/ostree/auto/repo_file.rs.html#34-38 ➡️ https://github.com/ostreedev/ostree/blob/dc4d4b5592d68a59afb0e4406dd2faa0df1b096b/src/libostree/ostree-repo-file.c#L435

@AdamWill
Author

Ahh, I see.

@AdamWill
Author

A test run of an Onyx container build with #4768 applied completed in 13 minutes(!) in a mock chroot on my own system. I haven't yet run a build without the patch to completion locally, but it was definitely going to take a lot longer than that. I've started one running overnight and will see how long it took when I get up tomorrow, but that PR is definitely a huge improvement.

@AdamWill
Author

time to build an Onyx ostree container (in a mock chroot on my laptop) with #4768 :

real	13m31.828s
user	7m58.593s
sys	2m58.602s

without it:

real	88m42.291s
user	34m54.632s
sys	50m33.717s

cgwalters pushed a commit to cgwalters/rpm-ostree that referenced this issue Mar 26, 2024
While toying around with building my own custom FCOS builds, I noticed that
running `cosa build container` with a package set similar to Silverblue's
resulted in ~2hr builds, the vast majority of which was in the
"Building package mapping" task. After this change, the runtime on
my build shrank to ~15 mins.

`$ time cosa build container`
**Before**
```
real    10m47.769s
user    52m14.763s
sys     46m38.546s
```

**After**
```
real    15m37.333s
user    2m38.751s
sys     0m14.410s
```

The speedup is accomplished by avoiding the need to query the rpmdb for every file.
Instead, the rpmdb is walked once to build a cache mapping files to their providing
packages, so that when the ostree filesystem is walked later it can just check the cache.
The cache is structured similarly to rpm's internals, where paths are maintained
as separate basename and dirname entries. Additionally, like rpm, paths
are considered equivalent if their dirnames resolve to the same path
(rpm uses `stat` to compare inodes; this implementation resolves the
symlinks). This results in output that is effectively equivalent to the
previous implementation while being substantially faster.

To minimize memory overhead maintaining the file mapping, a simple string cache is also added.

Closes: coreos#4880
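Read naively, the caching scheme described in that commit message looks something like the following sketch (my own illustrative Rust, not the code from #4768; `FileToPackageCache` and `resolve_dirname` are made-up names, and the real change also adds a string cache to cut memory overhead, which this sketch skips):

```
use std::collections::HashMap;
use std::path::Path;

/// Hypothetical cache shaped the way the commit message describes:
/// basenames and dirnames are kept separately, and dirnames are
/// canonicalized so that equivalent directories share one entry.
#[derive(Default)]
struct FileToPackageCache {
    /// resolved dirname -> (basename -> package id)
    dirs: HashMap<String, HashMap<String, String>>,
}

impl FileToPackageCache {
    /// Populate the cache while walking the rpmdb file lists.
    fn insert(&mut self, dirname: &str, basename: &str, pkgid: String) {
        self.dirs
            .entry(resolve_dirname(dirname))
            .or_default()
            .insert(basename.to_string(), pkgid);
    }

    /// Look up the owning package for a file seen during the ostree walk.
    fn lookup(&self, path: &Path) -> Option<&String> {
        let dirname = path.parent()?.to_str()?;
        let basename = path.file_name()?.to_str()?;
        self.dirs.get(&resolve_dirname(dirname))?.get(basename)
    }
}

/// Resolve symlinks in the directory component so that equivalent
/// dirnames compare equal (rpm instead stats and compares inodes).
fn resolve_dirname(dirname: &str) -> String {
    std::fs::canonicalize(dirname)
        .map(|p| p.to_string_lossy().into_owned())
        .unwrap_or_else(|_| dirname.to_string())
}
```

Resolving the dirname is what makes aliased paths land on the same entry, e.g. `/bin/bash` and `/usr/bin/bash` on a UsrMove system.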
cgwalters pushed a commit that referenced this issue Mar 29, 2024