Release v0.1.17 #429

Merged
merged 25 commits into from
Dec 9, 2024

roehrich-hpe
Contributor

Release v0.1.17

bdevcich and others added 25 commits November 13, 2024 15:45
Create v1alpha4 APIs.

This used "kubebuilder create api --resource --controller=false"
for each API.

Signed-off-by: Blake Devcich <[email protected]>
Copy API content from v1alpha3 to v1alpha4.

Move the kubebuilder:storageversion marker from v1alpha3 to v1alpha4.

Set localSchemeBuilder var in api/v1alpha3/groupversion_info.go
to satisfy zz_generated.conversion.go.

Signed-off-by: Blake Devcich <[email protected]>
Move the existing webhooks from v1alpha3 to v1alpha4.

Signed-off-by: Blake Devcich <[email protected]>
Create conversion webhooks and hub routines for v1alpha4.

This may have used "kubebuilder create webhook --conversion" for any
API that did not already have a webhook.

Any newly-created api/v1alpha4/*_webhook_test.go is empty and
does not need content at this time. It has been updated with a comment
to explain where conversion tests are located.

ACTION: Any new tests added to
  github/cluster-api/util/conversion/conversion_test.go
  may need to be manually adjusted. Look for the "ACTION" comments
  in this file.

This may have added a new SetupWebhookWithManager() to suite_test.go,
though a later step will complete the changes to that file.

Signed-off-by: Blake Devcich <[email protected]>
Create conversion routines and tests for v1alpha3.

Switch api/v1alpha3/conversion.go content from hub to spoke.

These conversion.go ConvertTo()/ConvertFrom() routines are complete
and do not require manual adjustment at this time, because v1alpha3 is
currently identical to the new hub v1alpha4.

ACTION: The api/v1alpha3/conversion_test.go may need to be
  manually adjusted for your needs, especially if it has been manually
  adjusted in earlier spokes.

ACTION: Any new tests added to internal/controller/conversion_test.go
  may need to be manually adjusted.

This added api/v1alpha3/doc.go to hold the k8s:conversion-gen
marker that points to the new hub.

Signed-off-by: Blake Devcich <[email protected]>
Point controllers at new hub v1alpha4

Point conversion fuzz test at new hub. These routines are still
valid for the new hub because it is currently identical to the
previous hub.

ACTION: Some controllers may have been referencing one of these
  non-local APIs. Verify that these APIs are being referenced
  by their correct versions:
  DirectiveBreakdown, Workflow
Signed-off-by: Blake Devcich <[email protected]>
Point earlier spoke APIs at new hub v1alpha4.

The conversion_test.go and the ConvertTo()/ConvertFrom() routines in
conversion.go are still valid for the new hub because it is currently
identical to the previous hub.

Update the k8s:conversion-gen marker in doc.go to point to the new hub.

ACTION: Some API libraries may have been referencing one of these
  non-local APIs. Verify that these APIs are being referenced
  by their correct versions:
  DirectiveBreakdown, Workflow
Signed-off-by: Blake Devcich <[email protected]>
Make the auto-generated files.

Update the SRC_DIRS spoke list in the Makefile.

make manifests & make generate & make generate-go-conversions
make fmt

ACTION: If any of the code in this repo was referencing non-local
  APIs, the references to them may have been inadvertently
  modified. Verify that any non-local APIs are being referenced
  by their correct versions.

ACTION: Begin by running "make vet". Repair any issues that it finds.
  Then run "make test" and continue repairing issues until the tests
  pass.
Signed-off-by: Blake Devcich <[email protected]>
…activate for non-lustre

We currently do not have a way to perform actions (e.g. `lfs setstripe`)
on lustre filesystems from the client side. For situations like
`DataIn`, there is no way to prepare the lustre filesystem prior to data
movement.

This change adds PostMount and PreUnmount command lines to the
NnfStorageProfile to allow commands to be run in those situations.

- For lustre filesystems, PostMount and PreUnmount have been added in
  addition to the existing PostActivate/PreDeactivate commands
- PostActivate/PreDeactivate are performed server-side
- PostMount/PreUnmount are performed client-side
- For XFS/GFS2 filesystems, PostActivate/PreDeactivate have been
  renamed to PostMount/PreUnmount

For lustre, multiple NnfNodeStorages are created for each OST, MDT, and
MGT. PostMount should only happen once, so OST0 is what is used. Before
the filesystem can be mounted to run the commands, we need to ensure that
all other NnfNodeStorages are ready. Once that happens, OST0 can then be
created and then the NnfNodeStorage controller can run the PostMount
commands. The opposite logic applies in the PreUnmount case where OST0
is now deleted first and performs the PreUnmount commands.

For XFS/GFS2, there is no issue of ordering.
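
The ordering described above can be sketched roughly as follows. This is only an illustration of the gating logic, not the controller's code: the component names (`ost-0`, `mgt-0`, and so on) and the helper names are hypothetical, and readiness is modeled as a simple set.

```python
# Hypothetical name for the NnfNodeStorage that owns PostMount/PreUnmount.
POST_MOUNT_OWNER = "ost-0"


def can_create_post_mount_owner(components, ready):
    """Return True once every component other than OST0 reports ready."""
    others = [c for c in components if c != POST_MOUNT_OWNER]
    return all(c in ready for c in others)


def teardown_order(components):
    """Put OST0 first so PreUnmount runs before anything else is deleted."""
    rest = [c for c in components if c != POST_MOUNT_OWNER]
    if POST_MOUNT_OWNER in components:
        return [POST_MOUNT_OWNER] + rest
    return rest


components = ["mgt-0", "mdt-0", "ost-0", "ost-1"]

# OST0 is held back while ost-1 is still not ready.
print(can_create_post_mount_owner(components, {"mgt-0", "mdt-0"}))
# Once everything else is ready, OST0 can be created and PostMount can run.
print(can_create_post_mount_owner(components, {"mgt-0", "mdt-0", "ost-1"}))
print(teardown_order(components))
```

The same gate works in both directions: creation waits on everything except OST0, and deletion starts with OST0.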

Signed-off-by: Blake Devcich <[email protected]>
* Add shared option to NnfSystemStorage

The shared option determines whether to create a single allocation per Rabbit
that is shared between all the computes on the Rabbit, or one allocation per
compute node.
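
As a minimal sketch of what the option decides, the function below counts allocations per Rabbit for both settings. The function name and the input shape (a mapping from Rabbit name to its compute names) are assumptions for illustration and are not taken from the NnfSystemStorage API.

```python
def allocations_per_rabbit(computes_by_rabbit, shared):
    """Return how many allocations each Rabbit would get.

    shared=True: one allocation per Rabbit, used by all of its computes.
    shared=False: one allocation per compute node on that Rabbit.
    """
    return {
        rabbit: 1 if shared else len(computes)
        for rabbit, computes in computes_by_rabbit.items()
    }


# Hypothetical layout: two Rabbits with different numbers of computes.
layout = {
    "rabbit-node-1": ["compute-01", "compute-02", "compute-03"],
    "rabbit-node-2": ["compute-04"],
}

print(allocations_per_rabbit(layout, shared=True))
print(allocations_per_rabbit(layout, shared=False))
```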

Signed-off-by: Matt Richerson <[email protected]>

* default to shared=true for v1alpha2 and v1alpha3

Signed-off-by: Matt Richerson <[email protected]>

---------

Signed-off-by: Matt Richerson <[email protected]>
Signed-off-by: Dean Roehrich <[email protected]>
The default storage profile was only using the $VG_NAME, causing
lvRemove to remove all logical volumes in the volume group.
Additionally, volume groups were being removed without first checking to
see if logical volumes exist in the volume group.

While both of these issues have no real effect on the removal of the
LVs/VGs, this was causing issues with the new PreUnmount commands. The
first logical volume to run the lvRemove command was the only
filesystem to run the PreUnmount commands, since it destroys all the
other filesystems on the rabbit in that single volume group.

With this change, the PreUnmount command runs on each filesystem before
it is destroyed.
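
The difference can be modeled with a small simulation. It does not call LVM; it only mimics the behavior described above, where a remove target naming just the volume group takes out every logical volume in it. The `vg/lv` target form, the helper names, and the sample volume names are assumptions for illustration.

```python
def lv_remove(volume_groups, target):
    """Remove logical volumes named by target.

    "vg" alone removes every LV in the group; "vg/lv" removes only that LV.
    """
    vg, _, lv = target.partition("/")
    if lv:
        volume_groups[vg].discard(lv)
    else:
        volume_groups[vg].clear()


def vg_remove_if_empty(volume_groups, vg):
    """Remove the volume group only when no logical volumes remain in it."""
    if volume_groups[vg]:
        return False
    del volume_groups[vg]
    return True


# Old behavior: the target is only the VG name, so both LVs disappear at once.
old = {"vg0": {"lv-a", "lv-b"}}
lv_remove(old, "vg0")

# New behavior: each LV is removed individually and the VG is checked first.
new = {"vg0": {"lv-a", "lv-b"}}
lv_remove(new, "vg0/lv-a")
still_in_use = not vg_remove_if_empty(new, "vg0")
print(old, new, still_in_use)
```

In the second case `lv-b` still exists after the first removal, so its own PreUnmount step would have a filesystem to run against.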

Signed-off-by: Blake Devcich <[email protected]>
On the docker host, the /tmp/nnf dir is expected to be mounted into each
docker container as /mnt/nnf. This should be specified as an extraMounts in
the KIND config file.

The k8s pods will add volume mounts for /mnt/nnf.

When using KIND, new mock devices will be represented as directories in
/mnt/nnf. New filesystems will be represented as directories in /mnt/nnf,
containing per-rabbit symlinks back to the mock "device" directory.

For a user container to use these mock filesystems, their pod must have a
volume mount for the mock filesystem and for the mock device directory, where
the symlink is pointing.

Signed-off-by: Dean Roehrich <[email protected]>
)

This option is used to allow the NnfSystemStorage to succeed even when there are computes that
are offline. As computes come back online, they will be given access to the storage.

Signed-off-by: Matt Richerson <[email protected]>
The NnfNodeBlockStorage resource is not owned by the NnfAccess, so the NnfAccess won't
be requeued after the client cache updates.

Signed-off-by: Matt Richerson <[email protected]>
This was a workaround to account for creating multiple OSTs in an
unorthodox manner in the Servers resource. Flux won't create multiple
allocation sets on a single rabbit, but rather use the count when there
are multiple allocations on a single rabbit.

This workaround causes a bug and is not needed.

Signed-off-by: Blake Devcich <[email protected]>
There is no way to get the number of OSTs, etc. when using `PostMount`
commands. That count is needed, for instance, when setting the striping.

This change adds the following environment variables for use in the
`NnfStorageProfiles` when using `*CmdLines`:

- NUM_MDTS
- NUM_MGTS
- NUM_MGTMDTS
- NUM_OSTS
- NUM_NNFNODES

To support this, a list of nodes for each component type has been added
to the status of `NnfStorage`, which in turn is copied to the
`NnfNodeStorage` resource's spec. This info can then be turned into the
environment variables for use when running the commands.

The `.nnf-servers.json` file created to store this information on the
compute node has been updated to this new structure.

An example with 8 OSTs and 2 MGTMDTs on 1 rabbit:

```json
{
  "mdt": [],
  "mgt": [],
  "mgtmdt": [
    "rabbit-node-1",
    "rabbit-node-1"
  ],
  "nnfNode": [
    "rabbit-node-1"
  ],
  "ost": [
    "rabbit-node-1",
    "rabbit-node-1",
    "rabbit-node-1",
    "rabbit-node-1",
    "rabbit-node-1",
    "rabbit-node-1",
    "rabbit-node-1",
    "rabbit-node-1"
  ]
}
```
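
To illustrate how the counts could be derived from this structure, the sketch below parses the example and maps the length of each list to one of the variables listed above. The key-to-variable mapping and the helper name `counts_from_servers` are assumptions for illustration; the controller's actual code is not shown here.

```python
import json

# The example from above: 8 OSTs and 2 MGTMDTs on 1 rabbit.
SERVERS_JSON = """
{
  "mdt": [],
  "mgt": [],
  "mgtmdt": ["rabbit-node-1", "rabbit-node-1"],
  "nnfNode": ["rabbit-node-1"],
  "ost": [
    "rabbit-node-1", "rabbit-node-1", "rabbit-node-1", "rabbit-node-1",
    "rabbit-node-1", "rabbit-node-1", "rabbit-node-1", "rabbit-node-1"
  ]
}
"""

# Assumed mapping from JSON key to environment variable name.
KEY_TO_ENV = {
    "mdt": "NUM_MDTS",
    "mgt": "NUM_MGTS",
    "mgtmdt": "NUM_MGTMDTS",
    "ost": "NUM_OSTS",
    "nnfNode": "NUM_NNFNODES",
}


def counts_from_servers(text):
    """Turn the per-component node lists into string-valued variables."""
    servers = json.loads(text)
    return {env: str(len(servers.get(key, []))) for key, env in KEY_TO_ENV.items()}


env = counts_from_servers(SERVERS_JSON)
for name in sorted(env):
    print(f"{name}={env[name]}")
```

Each entry in a list stands for one component instance, so a node that hosts several OSTs appears several times and is counted each time, while `nnfNode` lists each rabbit once.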

Signed-off-by: Blake Devcich <[email protected]>
Signed-off-by: Dean Roehrich <[email protected]>
@roehrich-hpe roehrich-hpe merged commit 95022f7 into releases/v0 Dec 9, 2024
3 checks passed
@roehrich-hpe roehrich-hpe deleted the release-v0.1.17 branch December 9, 2024 21:54