Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

internal/dnfjson: fix size calculation bug with multidir caches #246

Merged
merged 3 commits into from
Nov 24, 2023

Conversation

diaasami
Copy link
Contributor

@diaasami diaasami commented Nov 8, 2023

COMPOSER-2068

@achilleas-k achilleas-k self-requested a review November 8, 2023 17:57
@diaasami diaasami changed the title internal/dnfjson: fix caching bug with multidir caches internal/dnfjson: fix size calculation bug with multidir caches Nov 9, 2023
@diaasami diaasami force-pushed the multidir_cache_bug branch 2 times, most recently from bc27278 to 3220d73 Compare November 10, 2023 09:40
@diaasami diaasami marked this pull request as ready for review November 10, 2023 11:24
@ondrejbudai ondrejbudai self-requested a review November 10, 2023 11:40
@achilleas-k
Copy link
Member

Something to note here: I'm trying to test the difference between main and this PR with some extreme cases and while I know there's a difference, I'm not seeing an issue with the existing things in main. Try this, for example:

  • Create a directory for the rpmmd cache /var/cache/rpmmd
  • Watch the directory sizes (with sudo): sudo watch -n1 du -hd2 /var/cache/rpmmd
  • Chance the max cache size to 10 MB (change this to 1024*1024*10).
  • Run some depsolves
go run ./cmd/gen-manifests --workers=10 --cache=/var/cache/rpmmd --distros=rhel-92,rhel-93 --arches=x86_64,aarch64 --output=/media/scratch/osbuild/manifests/ --images=qcow2

You'll see that some distro repos are deleted completely (because the limit is so low) and others (like RHEL 9.3 x86) stay at 4.6 MiB.

It also gets a bit more interesting if you set the limit to 17 MiB. You'll see some distro caches reach 20-30 MiB and then get deleted to 14-15 MiB after the depsolve is done. Which to me means the cleanup is actually working (but with the per-distro limit), so I wonder what went wrong originally.

@achilleas-k
Copy link
Member

Actually, gen-manifests is a bad example because it uses a separate solver and subdirectory for each arch+distro. We should maybe change that.

@ondrejbudai
Copy link
Member

The issue is that the service workers set an empty distribution: https://github.com/osbuild/osbuild-composer/blob/a1e428fc539271333d71902672dff2407947ae62/cmd/osbuild-worker/jobimpl-depsolve.go#L23

So these caches are not stored in a distribution subdirectory, but in a cache root, which breaks the whole algorithm. Maybe the fix could be to disallow passing an empty distribution (or use an unknown subdir). Nevertheless, I think that we should one global cache size, because the number of distributions isn't bound.

@achilleas-k
Copy link
Member

achilleas-k commented Nov 14, 2023

If we use unknown in place of an empty distro name, then all repos will share the same cache root, which is probably fine. I think in that case, depsolving will only block on individual repo locks from dnf itself. When depsolving, we only take a non-exclusive read lock on the whole cache to prevent a cleanup from running while calling dnf-json. Write locking of individual repos for updating metadata is deferred to libdnf.

Given all this, it makes me wonder if the per-repo cache roots are even necessary in the end. Maybe it's fine to use a single cache for everything?

A solution that fixes the current issue without changing behaviour is to drop the distro arg from NewWithConfig() and just use modulePlatformID + releaseVer + arch, which would have the same effect and those strings are required by libdnf so they can't be empty.

@diaasami diaasami force-pushed the multidir_cache_bug branch 2 times, most recently from 803c885 to 9883885 Compare November 19, 2023 16:48
@diaasami diaasami requested review from ondrejbudai and removed request for ondrejbudai November 20, 2023 17:18
achilleas-k
achilleas-k previously approved these changes Nov 23, 2023
Copy link
Member

@achilleas-k achilleas-k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. A few non-functional nitpicks and questions :)

One more thing though:
I think we should remove the distro here:

func (bs *BaseSolver) NewWithConfig(modulePlatformID, releaseVer, arch, distro string) *Solver {

and instead use the other components to make a cache subdirectory here:
func (s *Solver) GetCacheDir() string {
b := filepath.Base(s.distro)
if b == "." || b == "/" {
return s.cache.root
}
return filepath.Join(s.cache.root, s.distro)
}

The other components (platform ID, os version, architecture) are mandatory for depsolving, so they can't be empty strings or the depsolver will fail. That way, we avoid running into any issues (like we already did) with the solver being initialised with an empty distro string, essentially pushing the distro caches one level up, and we still get distro-specific caches.

I don't know if there might be a conflict between RHEL 9 and CS9 (if osversion of rhel 9 is 9, then they both get el9 platform ID and os version 9).

Another alternative is to make the distro mandatory and fail if it's an empty string.

internal/dnfjson/cache.go Outdated Show resolved Hide resolved
internal/dnfjson/cache.go Outdated Show resolved Hide resolved
internal/dnfjson/cache_test.go Show resolved Hide resolved
@achilleas-k
Copy link
Member

I have some comments and a recommendation, but approving since they're not critical and can be done in a follow-up if we need this PR to be merged quickly.

update test data so that TestMultiDirCacheRead fails before the fix and succeeds after
Golang's os.ReadDir() returns entries in alphabetical order and since for our test data, the alphabetical order of the entries matches the mtime-based sorting, so tests pass even with no sorting at all.
This change breaks the alphabetical order of data.
instead of distro which is sometimes empty (on brew workers)
Copy link
Member

@achilleas-k achilleas-k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Great work thanks for fixing this :)

@achilleas-k achilleas-k added this pull request to the merge queue Nov 24, 2023
Merged via the queue into osbuild:main with commit 700c404 Nov 24, 2023
7 checks passed
@diaasami diaasami deleted the multidir_cache_bug branch November 26, 2023 12:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants