Skip to content

Commit

Permalink
[MGMT-14453](https://issues.redhat.com//browse/MGMT-14453): Fix bugs …
Browse files Browse the repository at this point in the history
…in the installer cache

This PR is for the purpose of resolving  multiple bugs within the installer cache, due to the poor condition of the current cache, it makes sense to fix this in a single PR.

* https://issues.redhat.com/browse/MGMT-14452
Installer cache removes in-used cached image when out of space
* https://issues.redhat.com/browse/MGMT-14453
INSTALLER_CACHE_CAPACITY small value cause to assisted-service crash
* https://issues.redhat.com/browse/MGMT-14457
Installer cache - fails to install when running parallel with same version
* Additionally, the cache did not respect limits, so this has been addressed here.

Fixes:

I have implemented fixes for each of the following issues.

* Mutex was ineffective as not instantiated corrctly, leading to [MGMT-14452](https://issues.redhat.com//browse/MGMT-14452), [MGMT-14453](https://issues.redhat.com//browse/MGMT-14453).
* Naming convention for hardlinks changed to be UUID based to resolve [MGMT-14457](https://issues.redhat.com//browse/MGMT-14457).
* Any time we either extract or use a release, the modified time must be updated, not only for cached releases. This was causing premature pruning of hardlinks.
* LRU cache order updated to be based on microseconds instead of seconds.
* Eviction checks updated to consider max release size and also cache threshold.
* We now check there is enough space before writing.
* During eviction - releases without hard links will be evicted before releases with hard links.
  • Loading branch information
paul-maidment committed Feb 4, 2025
1 parent ef9f0c8 commit 6938c2f
Show file tree
Hide file tree
Showing 12 changed files with 677 additions and 225 deletions.
11 changes: 9 additions & 2 deletions cmd/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,7 @@ import (
"github.com/openshift/assisted-service/internal/ignition"
"github.com/openshift/assisted-service/internal/infraenv"
installcfg "github.com/openshift/assisted-service/internal/installcfg/builder"
"github.com/openshift/assisted-service/internal/installercache"
internaljson "github.com/openshift/assisted-service/internal/json"
"github.com/openshift/assisted-service/internal/manifests"
"github.com/openshift/assisted-service/internal/metrics"
Expand Down Expand Up @@ -166,6 +167,7 @@ var Options struct {
PauseProvisionedBMHs bool `envconfig:"PAUSE_PROVISIONED_BMHS" default:"true"`
PreprovisioningImageControllerConfig controllers.PreprovisioningImageControllerConfig
BMACConfig controllers.BMACConfig
InstallerCacheConfig installercache.Config

// EnableSoftTimeouts is a boolean flag to enable Soft timeouts by assisted installer
EnableSoftTimeouts bool `envconfig:"ENABLE_SOFT_TIMEOUTS" default:"false"`
Expand Down Expand Up @@ -315,7 +317,8 @@ func main() {
metricsManagerConfig := &metrics.MetricsManagerConfig{
DirectoryUsageMonitorConfig: metrics.DirectoryUsageMonitorConfig{
Directories: []string{Options.WorkDir}}}
metricsManager := metrics.NewMetricsManager(prometheusRegistry, eventsHandler, metrics.NewOSDiskStatsHelper(), metricsManagerConfig, log)
diskStatsHelper := metrics.NewOSDiskStatsHelper(log)
metricsManager := metrics.NewMetricsManager(prometheusRegistry, eventsHandler, diskStatsHelper, metricsManagerConfig, log)
if ocmClient != nil {
//inject the metric server to the ocm client for purpose of
//performance monitoring the calls to ACM. This could not be done
Expand Down Expand Up @@ -488,7 +491,11 @@ func main() {
failOnError(err, "failed to create valid bm config S3 endpoint URL from %s", Options.BMConfig.S3EndpointURL)
Options.BMConfig.S3EndpointURL = newUrl

generator := generator.New(log, objectHandler, Options.GeneratorConfig, Options.WorkDir, providerRegistry, manifestsApi, eventsHandler)
Options.InstallerCacheConfig.CacheDir = filepath.Join(Options.GeneratorConfig.GetWorkingDirectory(), "installercache")
installerCache, err := installercache.New(Options.InstallerCacheConfig, eventsHandler, diskStatsHelper, log)
failOnError(err, "failed to instantiate installercache")

generator := generator.New(log, objectHandler, Options.GeneratorConfig, providerRegistry, manifestsApi, eventsHandler, installerCache)
var crdUtils bminventory.CRDUtils
if ctrlMgr != nil {
crdUtils = controllers.NewCRDUtils(ctrlMgr.GetClient(), hostApi)
Expand Down
56 changes: 56 additions & 0 deletions docs/dev/installercache.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
# Installer Cache for Openshift Releases
During the phase where a cluster is prepared for installation, the manifests for the cluster need to be generated.
In order to generate manifests, the installation binary for the desired version of Openshift is required.
It is the job of the Installer Cache to provide this binary.

The installer cache can be operated in a mode where it will function as a 'Least Recently Used' (LRU cache) or as a simple directory of releases.

## Settings
The installer cache has a number of settings, which can be overridden by providing them in as environment variables in the assisted service config map.

### INSTALLER_CACHE_CAPACITY

This defaults to `0`, which means that there is no limit to the size of the installer cache.
This should be kept at zero if the storage directory is not ephemeral.

Otherwise, it will accept a capacity, followed by either "GiB", "MiB", "KiB" or "B"

For example "32 GiB"

### INSTALLER_CACHE_MAX_RELEASE_SIZE

This is the estimated maximum size that we believe any Openshift release could ever be. The default at the time of writing is 2 GiB.
This setting is used by the installercache to decide if there is enough space to write a release to the cache or if some things will need to be evicted first.

This will accept a capacity, followed by either "GiB", "MiB", "KiB" or "B"
For example "1 GiB"

### INSTALLER_CACHE_RELEASE_FETCH_RETRY_INTERVAL

If the cache finds itself unable to write a release (as indicated by `INSTALLER_CACHE_MAX_RELEASE_SIZE`) to the cache without breaking the cache limit
then it is possible that a request to fetch a release may need to be retried.
`INSTALLER_CACHE_RELEASE_FETCH_RETRY_INTERVAL` is the interval at which this retry should be attempted.
This is expressed as a duration, for example "30s"

## Where the files are stored

The files will be stored on the volume that is mapped to the working directory of the pod, defined as `WORK_DIR` in environment variables.
There is one instance of the installer cache per node. This means that in SAAS for example, there are three independent caches, one for each node.
This is entirely expected and normal.

## Usage

Usage of the cache is quite straightforward...

The caller of the installer cache makes a call to

```
func (i *Installers) Get(ctx context.Context, releaseID, releaseIDMirror, pullSecret string, ocRelease oc.Release, ocpVersion string, clusterID strfmt.UUID) (*Release, error)
```

This call returns an `*installercache.Release` instance that the caller must retain until they are finished using the release. Once the caller is done, they call `release.Cleanup(...)` which advises the cache that the user is done with the release.

The cache guarantees to never breach `INSTALLER_CACHE_CAPACITY`.



1 change: 1 addition & 0 deletions internal/bminventory/inventory.go
Original file line number Diff line number Diff line change
Expand Up @@ -1416,6 +1416,7 @@ func (b *bareMetalInventory) InstallClusterInternal(ctx context.Context, params
}()

if err = b.generateClusterInstallConfig(asyncCtx, *cluster, clusterInfraenvs); err != nil {
log.WithError(err).Errorf("failed to generate cluster install config for cluster %s", cluster.ID.String())
return
}
log.Infof("generated ignition for cluster %s", cluster.ID.String())
Expand Down
16 changes: 10 additions & 6 deletions internal/ignition/installmanifests.go
Original file line number Diff line number Diff line change
Expand Up @@ -85,7 +85,6 @@ type installerGenerator struct {
cluster *common.Cluster
releaseImage string
releaseImageMirror string
installerDir string
serviceCACert string
encodedDhcpFileContents string
s3Client s3wrapper.API
Expand All @@ -110,24 +109,23 @@ var fileNames = [...]string{
}

// NewGenerator returns a generator that can generate ignition files
func NewGenerator(workDir string, installerDir string, cluster *common.Cluster, releaseImage string, releaseImageMirror string,
func NewGenerator(workDir string, cluster *common.Cluster, releaseImage string, releaseImageMirror string,
serviceCACert string, installInvoker string, s3Client s3wrapper.API, log logrus.FieldLogger, providerRegistry registry.ProviderRegistry,
installerReleaseImageOverride, clusterTLSCertOverrideDir string, storageCapacityLimit int64, manifestApi manifestsapi.ManifestsAPI, eventsHandler eventsapi.Handler) Generator {
installerReleaseImageOverride, clusterTLSCertOverrideDir string, manifestApi manifestsapi.ManifestsAPI, eventsHandler eventsapi.Handler, installerCache *installercache.Installers) Generator {
return &installerGenerator{
cluster: cluster,
log: log,
releaseImage: releaseImage,
releaseImageMirror: releaseImageMirror,
workDir: workDir,
installerDir: installerDir,
serviceCACert: serviceCACert,
s3Client: s3Client,
enableMetal3Provisioning: true,
installInvoker: installInvoker,
providerRegistry: providerRegistry,
installerReleaseImageOverride: installerReleaseImageOverride,
clusterTLSCertOverrideDir: clusterTLSCertOverrideDir,
installerCache: installercache.New(installerDir, storageCapacityLimit, eventsHandler, log),
installerCache: installerCache,
manifestApi: manifestApi,
}
}
Expand Down Expand Up @@ -179,8 +177,14 @@ func (g *installerGenerator) Generate(ctx context.Context, installConfig []byte)
if err != nil {
return errors.Wrap(err, "failed to get installer path")
}

//cleanup resources at the end
defer release.Cleanup(ctx)
defer func() {
e := release.Cleanup(ctx)
if e != nil {
err = errors.Wrapf(err, "unable to clean up release due to error: %s", e.Error())
}
}()

installerPath := release.Path
installConfigPath := filepath.Join(g.workDir, "install-config.yaml")
Expand Down
Loading

0 comments on commit 6938c2f

Please sign in to comment.