
feat: configure flavors in Nova automatically #499

Open · skrobul wants to merge 19 commits into base: main from nova-flavor-monitor

Conversation

@skrobul (Collaborator) commented Nov 19, 2024

This PR adds a small operator-style application that monitors a directory of YAML machine-flavor specifications. Whenever any file in the linked repository changes, the application parses all the specs in the directory, then compares them against the flavors configured in Nova and reconciles any differences.

Please note that this will NOT delete (prune) flavors that have been removed from the repository.

Closes https://rackspace.atlassian.net/browse/PUC-598
Supersedes #454

Merge with https://github.com/RSS-Engineering/undercloud-deploy/pull/224
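
For illustration only, the monitor-and-reconcile loop described above could look roughly like the sketch below (all names are assumptions, not the actual code in this PR; requires PyYAML):

    import time
    from pathlib import Path
    from typing import Callable

    import yaml

    def load_specs(spec_dir: Path) -> dict:
        """Parse every YAML flavor spec in the directory."""
        specs = {}
        for path in sorted(spec_dir.glob("*.yaml")):
            spec = yaml.safe_load(path.read_text())
            specs[spec["name"]] = spec  # 'name' key is an assumed field
        return specs

    def watch(spec_dir: Path, reconcile: Callable[[dict], None],
              interval: float = 30.0) -> None:
        """Poll the spec directory; on any change, parse all specs and reconcile."""
        last_seen = None
        while True:
            snapshot = {p: p.stat().st_mtime for p in spec_dir.glob("*.yaml")}
            if snapshot != last_seen:
                reconcile(load_specs(spec_dir))  # removed flavors are not pruned
                last_seen = snapshot
            time.sleep(interval)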

@skrobul force-pushed the nova-flavor-monitor branch 9 times, most recently from 72c0159 to 87d585e on November 20, 2024 16:15
@skrobul changed the title from "WIP: Nova flavor monitor" to "feat: configure flavors in Nova automatically" on Nov 20, 2024
@skrobul marked this pull request as ready for review on November 20, 2024 16:17
@skrobul requested a review from a team on November 20, 2024 16:26
@cardoe (Contributor) left a comment

I'm still wondering: why not just use Ansible, with a playbook that manages a flavor and takes the flavor spec as input?

Comment on lines 22 to 25
# create 'flavorsync' user to allow synchronization of the flavors to nova
openstack user create --or-show --domain service --password abcd1234 flavorsync
openstack role create --or-show --domain service flavorsync
openstack role add --user-domain service --user flavorsync flavorsync
@cardoe (Contributor):

We really shouldn't be creating random users with random passwords. We should be creating application credentials for our service account.
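
For reference, the application-credential approach could look roughly like this (a hedged sketch; the credential and role names are illustrative, and an application credential is created by the service user itself while authenticated as that user):

    # while authenticated as the flavorsync service user, create an
    # application credential carrying the flavorsync role instead of
    # relying on a static password
    openstack application credential create --role flavorsync flavorsync-cred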

@skrobul (Collaborator, Author) replied:

Agreed. This is just an example that is supposed to be overridden in the deploy repository (just as it is for the argoworkflow and monitoring users above and below).

Comment on lines +2 to +10

# This section needs to be repeated in child images
ARG APP_PATH=/app
ARG APP_USER=appuser
ARG APP_GROUP=appgroup
ARG APP_USER_UID=1000
ARG APP_GROUP_GID=1000


@cardoe (Contributor):

This just feels really heavy in every container. Can we not just do:

        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsNonRoot: true

for the pod?

If we really wanted to be slim, we could use distroless: https://github.com/GoogleContainerTools/distroless/blob/main/examples/python3/Dockerfile

@skrobul (Collaborator, Author) commented Nov 21, 2024:

> This just feels really heavy in every container.

If the verbosity is the problem, we can remove some of these by hardcoding the values in the commands. However, keeping them as arguments means no "shotgun surgery" is needed when the UID has to be adjusted or a path changed.

> Can we not just do:
>
>         securityContext:
>           allowPrivilegeEscalation: false
>           capabilities:
>             drop:
>             - ALL
>           readOnlyRootFilesystem: true
>           runAsNonRoot: true
>
> for the pod?

Configuring securityContext in Kubernetes and setting user/group IDs in a Dockerfile are two complementary, but not identical things.

If we don't declare USER in a Dockerfile, the container defaults to uid=0 (root). If you then configure a Pod or container securityContext with runAsNonRoot: true and use such an image, the kubelet will simply refuse to start it.

In theory, you can extend your securityContext with runAsUser: 1001 and the container will start, but that brings another set of problems (see the sketch after this list):

  • Application files inside the container are still owned by root, because you have not configured the USER during the build. This can be especially problematic for files that need to be writable (pids, logs, caches, etc.).
  • Processes running inside the container run with the EUID of a user that does not exist in the system from their point of view.
  • Files owned by the specified UID will not have a corresponding username displayed, causing confusion during debugging or file inspection.
  • Commands like whoami and tools relying on user information might fail or return errors, leading to broken workflows or logging issues.
  • Logs may show only numeric UIDs instead of meaningful usernames, complicating troubleshooting and security audits.
  • If the same image is started in development on a developer's laptop, it will start as root, which could lead to a false sense of things working until they are pushed to Kubernetes.
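
To make the build-time side concrete, here is a minimal Dockerfile sketch of the pattern under discussion, reusing the ARG names from the snippet above (illustrative only; the actual Dockerfiles in this PR may differ):

    FROM python:3.12-alpine

    # This section needs to be repeated in child images
    ARG APP_PATH=/app
    ARG APP_USER=appuser
    ARG APP_GROUP=appgroup
    ARG APP_USER_UID=1000
    ARG APP_GROUP_GID=1000

    # Create the group and user at build time so files get an owner with a
    # real name rather than a bare numeric UID.
    RUN addgroup -g "${APP_GROUP_GID}" "${APP_GROUP}" \
     && adduser -D -u "${APP_USER_UID}" -G "${APP_GROUP}" -h "${APP_PATH}" "${APP_USER}"

    WORKDIR ${APP_PATH}
    COPY --chown=${APP_USER}:${APP_GROUP} . ${APP_PATH}

    # With USER set here, runAsNonRoot: true is satisfied, and the image is
    # also non-root outside Kubernetes (e.g. on a developer's laptop).
    USER ${APP_USER}

The pod-level securityContext then complements this rather than replacing it.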

> If we really wanted to be slim, we could use distroless: https://github.com/GoogleContainerTools/distroless/blob/main/examples/python3/Dockerfile

The distroless images are actually slightly larger than the Alpine ones: python:3.12.2-alpine3.19 is 51.8 MB and gcr.io/distroless/python3 is 52.8 MB. The Alpine image also includes at least some debugging tools and has a shell. Having said that, we can certainly look into this further and maybe use the distroless ones for production, as in theory they provide better security.

@cardoe (Contributor) replied:

So I think I'd be more okay with it if we pinned ourselves to something rather than making it variable and having to carry that around; then we'd do the security piece and runAsUser: 1001.

@skrobul (Collaborator, Author) replied:

I'm happy to work on that change, but it will not be part of this PR, as it requires changes to multiple Dockerfiles that are unrelated to this feature.

@skrobul (Collaborator, Author) replied:

Opened #515 - if it gets merged before this one, I'll adjust the code as necessary

@skrobul (Collaborator, Author) commented Nov 21, 2024

> I'm still wondering: why not just use Ansible, with a playbook that manages a flavor and takes the flavor spec as input?

OK, that's a fair question - let me break it down. At the moment the flavor specifications are presented as a collection of YAML files in a specific structured format.
For each of those files, we have to execute some business logic to make them usable (a sketch follows the list below). Some of those tasks are:

  • parsing the input and validating that a given flavor is defined correctly
  • filtering out the records that are not pertinent to a given environment
  • performing various unit conversions on the data itself. For example, the specs define the memory size in gigabytes, but Nova expects it in mebibytes and the BMC/Redfish on the baremetal node reports it in megabytes.
  • deriving context-dependent names for each flavor. For example, the specification will say nonprod.gp2.small, but in Nova this needs a resource class of resources:CUSTOM_BAREMETAL_GP2SMALL and in Ironic it has to be baremetal.gp2small.
  • deleting and recreating flavors on update, since the Nova API does not support an 'update' operation (unless you are only updating the description). It's best to do that only when actually needed.
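
To make that concrete, here is a rough Python sketch of the conversions and the delete-and-recreate step using openstacksdk (all field names and the name mappings are assumptions for illustration, not the actual implementation in this PR):

    import openstack

    def strip_env(spec_name: str) -> str:
        # "nonprod.gp2.small" -> "gp2.small" (environment prefix assumed)
        return spec_name.split(".", 1)[1]

    def nova_resource_class(spec_name: str) -> str:
        # "nonprod.gp2.small" -> "resources:CUSTOM_BAREMETAL_GP2SMALL"
        return "resources:CUSTOM_BAREMETAL_" + strip_env(spec_name).replace(".", "").upper()

    def gb_to_mib(gigabytes: float) -> int:
        # specs use decimal gigabytes; Nova wants mebibytes
        return int(gigabytes * 1000**3 // 1024**2)

    def flavor_matches(flavor, spec) -> bool:
        # placeholder comparison; the real logic would compare every relevant field
        return (flavor.ram == gb_to_mib(spec["memory_gb"])
                and flavor.vcpus == spec["vcpus"]
                and flavor.disk == spec["disk_gb"])

    def reconcile(conn: openstack.connection.Connection, desired: dict) -> None:
        existing = {f.name: f for f in conn.compute.flavors()}
        for name, spec in desired.items():
            current = existing.get(name)
            if current is not None and flavor_matches(current, spec):
                continue  # already up to date, leave it alone
            if current is not None:
                conn.compute.delete_flavor(current)  # Nova has no real update
            flavor = conn.compute.create_flavor(
                name=name, ram=gb_to_mib(spec["memory_gb"]),
                vcpus=spec["vcpus"], disk=spec["disk_gb"])
            # extra specs (incl. the resource class) go in a separate call
            conn.compute.create_flavor_extra_specs(
                flavor, {nova_resource_class(name): "1"})
        # flavors removed from the repository are intentionally not pruned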

Now, you could of course implement all this logic inside Ansible (which really is just a YAML wrapper around Python), but I am pretty sure it would get messy and untestable pretty quickly. You could write a custom module for Ansible, but that would again be just another Python script.

Alternatively, you could manually write out playbooks with some of this data precomputed. However, this approach would result in duplicated data in various formats, which would make it more error-prone and require more work to maintain.

For the sake of argument, let's say we decided to use Ansible for all of this and you have a role that sets up flavors in Nova. Now you need to make sure that you run this playbook every time someone adds or removes a new flavor. As far as I know, we do not have the machinery to do that at the moment, so it would probably require additional dependencies like Argo to detect repository changes and yet another container image to execute the Ansible.

On top of all that, making such a solution testable would be difficult, and the result quite convoluted.

Based on these considerations, I believe implementing this in a programming language like Python is a more suitable approach than using a markup language.

This reverts commit 5ac057b, which was added in order to resolve the mismatch between the Nova and Ironic flavors that had punctuation in the name. The reverted commit incorrectly changed the resource class name on the Ironic side, while it was the Nova side that was missing the underscores. This was resolved in the commit before this one.
skrobul added a commit that referenced this pull request Nov 25, 2024
Based on the discussion in #499 (comment), this commit removes the repetitive directives that set the application UIDs and GIDs, as well as the user name and home directory. The new, standardised way is to use:
- a user named appuser with UID 1000
- a group named appgroup with GID 1000
- /app as the location of the code