
Add Topology CRD #325

Open
fmount wants to merge 1 commit into main from topology-1

Conversation


@fmount fmount commented Dec 5, 2024

Topology CRD is made of two different structs imported and abstracted from PodSpec [1]:

  1. TopologySpreadConstraint
  2. Affinity/AntiAffinity

The above seems enough to draft a dedicated CR instead of exposing those parameters through the service operators' API. In addition, Affinity/AntiAffinity is wrapped in lib-common rather than imported as-is from PodSpec.
NodeSelector can be imported here in the future.

Depends-On: openstack-k8s-operators/lib-common#587

[1] https://pkg.go.dev/k8s.io/api/core/v1#PodSpec
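
For readers skimming the patch, a minimal sketch of what such a CR's Go types could look like (illustrative only: the names below are assumptions, and the actual patch wraps Affinity/AntiAffinity in a lib-common type rather than using corev1.Affinity directly):

```go
// Illustrative sketch only; type and field names are assumptions, not
// necessarily what this patch defines.
package v1beta1

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// TopologySpec mirrors the scheduling-related parts of PodSpec:
// a list of TopologySpreadConstraints plus an (anti)affinity override.
type TopologySpec struct {
	// TopologySpreadConstraints is imported from PodSpec.
	// +optional
	TopologySpreadConstraints []corev1.TopologySpreadConstraint `json:"topologySpreadConstraints,omitempty"`

	// Affinity overrides the default affinity/anti-affinity policy that
	// lib-common applies to service pods (placeholder type here; the
	// patch wraps this in a lib-common struct).
	// +optional
	Affinity *corev1.Affinity `json:"affinity,omitempty"`
}

// Topology is the Schema for the topologies API.
type Topology struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec TopologySpec `json:"spec,omitempty"`
}
```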

@fmount fmount requested review from olliewalsh, abays and stuggi December 5, 2024 08:32
@openshift-ci openshift-ci bot requested a review from viroel December 5, 2024 08:32
@fmount fmount marked this pull request as draft December 5, 2024 08:32

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/733699b4849940629a1ec927e6f8dbc0

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 43m 19s
podified-multinode-edpm-deployment-crc RETRY_LIMIT in 12m 00s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 32m 01s

@fultonj fultonj self-requested a review December 5, 2024 13:14

@gibizer gibizer left a comment


In general I understand that we cannot add this input to all of our service CRDs as then the OpenStackOperator CRD will be over the size limit. So we need to start adding new input to the control plane by reference instead of by containment. If we go with this route then I suggest taking a step back and thinking about what we really want to add as a ref and what we want to add directly to our CRD as a field. Right now I think we are using the ref approach here just because we happened to run out of CRD space around here/now. I would like to see a more methodical approach where we decide what type of information should be contained and what can be referred based on the usage / lifecycle / meaning of the fields, not just based on luck.

An alternative cut could be to stop including service CRDs in the OpenStackControlPlane CRD, just reference them by name, and ask the human deployer to create each service CRD one by one. I think the OpenStackControlPlane and Version CRDs can still orchestrate the minor update via references. This structure would feel more natural to me as each service CRD has high cohesion while the coupling across service CRDs is low, so they don't need to be tied together in the OpenStackControlPlane struct.
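
For illustration, a rough Go sketch of the two shapes contrasted above (containment versus reference); the TopologyRef and ServiceSpec names are hypothetical, not taken from this patch:

```go
// Hypothetical sketch of "by containment" vs "by reference"; the names
// below are illustrative only.
package v1beta1

import corev1 "k8s.io/api/core/v1"

// By containment: the full struct is inlined into every service CRD,
// which multiplies its size into the OpenStackControlPlane schema.
type ServiceSpecContained struct {
	TopologySpreadConstraints []corev1.TopologySpreadConstraint `json:"topologySpreadConstraints,omitempty"`
	Affinity                  *corev1.Affinity                  `json:"affinity,omitempty"`
}

// By reference: only a small name/namespace pointer is embedded; the
// service controller resolves the referenced Topology CR at reconcile time.
type TopologyRef struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace,omitempty"`
}

type ServiceSpecByRef struct {
	// TopologyRef points to an existing Topology CR.
	// +optional
	TopologyRef *TopologyRef `json:"topologyRef,omitempty"`
}
```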

Review comments (now outdated and resolved) on:
controllers/topology/topology_controller.go
apis/topology/v1beta1/topology_types.go

fmount commented Dec 5, 2024

In general I understand that we cannot add this input to all of our service CRDs as then the OpenStackOperator CRD will be over the size limit. So we need to start adding new input to the control plane by reference instead of by containment. If we go with this route then I suggest taking a step back and thinking about what we really want to add as a ref and what we want to add directly to our CRD as a field. Right now I think we are using the ref approach here just because we happened to run out of CRD space around here/now. I would like to see a more methodical approach where we decide what type of information should be contained and what can be referred based on the usage / lifecycle / meaning of the fields, not just based on luck.

An alternative cut could be to stop including service CRDs in the OpenStackControlPlane CRD, just reference them by name, and ask the human deployer to create each service CRD one by one. I think the OpenStackControlPlane and Version CRDs can still orchestrate the minor update via references. This structure would feel more natural to me as each service CRD has high cohesion while the coupling across service CRDs is low, so they don't need to be tied together in the OpenStackControlPlane struct.

That is an approach we could take in the long run, but I think today we're not there yet (maybe it will take ~6 months to align what we have without breaking existing deployments).
Finding a solution for this problem is a matter of time, while what you're proposing would be a ctlplane v2 with a revisited design we should agree on.
I don't have a methodical approach to decide if we need a dedicated API or not, but nesting this struct (or part of it, because you might not need affinity for all the services!) would result in a sizing problem for the openstack-operator.
That being said, I'm ok to revisit this approach if there's no full agreement on it, or we can draft some rules in dev-docs (/cc @dprince). This doesn't change the fact that if we want this feature, a dedicated CR seems the only way.


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/d6ff072498354c4aa25f05fc905ea216

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 34m 58s
podified-multinode-edpm-deployment-crc POST_FAILURE in 1h 11m 57s
cifmw-crc-podified-edpm-baremetal RETRY_LIMIT in 11m 12s

@fmount fmount force-pushed the topology-1 branch 3 times, most recently from 352fd95 to 7e07d12 on December 6, 2024 09:57

gibizer commented Dec 6, 2024

In general I understand that we cannot add this input to all of our service CRDs as then the OpenStackOperator CRD will be over the size limit. So we need to start adding new input to the control plane by reference instead of by containment. If we go with this route then I suggest taking a step back and thinking about what we really want to add as a ref and what we want to add directly to our CRD as a field. Right now I think we are using the ref approach here just because we happened to run out of CRD space around here/now. I would like to see a more methodical approach where we decide what type of information should be contained and what can be referred based on the usage / lifecycle / meaning of the fields, not just based on luck.
An alternative cut could be to stop including service CRDs in the OpenStackControlPlane CRD, just reference them by name, and ask the human deployer to create each service CRD one by one. I think the OpenStackControlPlane and Version CRDs can still orchestrate the minor update via references. This structure would feel more natural to me as each service CRD has high cohesion while the coupling across service CRDs is low, so they don't need to be tied together in the OpenStackControlPlane struct.

That is an approach we could take in the long run, but I think today we're not there yet (maybe it will take ~6 months to align what we have without breaking existing deployments). Finding a solution for this problem is a matter of time, while what you're proposing would be a ctlplane v2 with a revisited design we should agree on. I don't have a methodical approach to decide if we need a dedicated API or not, but nesting this struct (or part of it, because you might not need affinity for all the services!) would result in a sizing problem for the openstack-operator. That being said, I'm ok to revisit this approach if there's no full agreement on it, or we can draft some rules in dev-docs (/cc @dprince). This doesn't change the fact that if we want this feature, a dedicated CR seems the only way.

"Find a solution for this problem is a matter of time," I would like to challenge that. The whole 18.0 GA was a rush and we are keep rushing forward event though we already see the limits of our current approach. Adding the topology as a separate CRD now just because we have no time to fix our strategy will build up more technical debt for the future, creating even more necessary change in the future when we want to correct our structure and therefore making it less likely that we will be able to actually do a proper shift of strategy later.

I know that it is not for you and me to decide this alone and I'm happy to argue this in bigger forums as well.

I will be similarly against adding the Watcher CRD as a ref while keeping all other service CRDs as containment just because Watcher is created now, when there is no more space to contain it.


fmount commented Dec 6, 2024

In general I understand that we cannot add this input to all of our service CRDs as then the OpenStackOperator CRD will be over the size limit. So we need to start adding new input to the control plane by reference instead of by containment. If we go with this route then I suggest taking a step back and thinking about what we really want to add as a ref and what we want to add directly to our CRD as a field. Right now I think we are using the ref approach here just because we happened to run out of CRD space around here/now. I would like to see a more methodical approach where we decide what type of information should be contained and what can be referred based on the usage / lifecycle / meaning of the fields, not just based on luck.
An alternative cut could be to stop including service CRDs in the OpenStackControlPlane CRD, just reference them by name, and ask the human deployer to create each service CRD one by one. I think the OpenStackControlPlane and Version CRDs can still orchestrate the minor update via references. This structure would feel more natural to me as each service CRD has high cohesion while the coupling across service CRDs is low, so they don't need to be tied together in the OpenStackControlPlane struct.

That is an approach we could take in the long run, but I think today we're not there yet (maybe it will take ~6 months to align what we have without breaking existing deployments). Finding a solution for this problem is a matter of time, while what you're proposing would be a ctlplane v2 with a revisited design we should agree on. I don't have a methodical approach to decide if we need a dedicated API or not, but nesting this struct (or part of it, because you might not need affinity for all the services!) would result in a sizing problem for the openstack-operator. That being said, I'm ok to revisit this approach if there's no full agreement on it, or we can draft some rules in dev-docs (/cc @dprince). This doesn't change the fact that if we want this feature, a dedicated CR seems the only way.

"Find a solution for this problem is a matter of time," I would like to challenge that. The whole 18.0 GA was a rush and we are keep rushing forward event though we already see the limits of our current approach. Adding the topology as a separate CRD now just because we have no time to fix our strategy will build up more technical debt for the future, creating even more necessary change in the future when we want to correct our structure and therefore making it less likely that we will be able to actually do a proper shift of strategy later.

I know that it is not for you and me to decide this alone and I'm happy to argue this in bigger forums as well.

I will be similarly against adding the Watcher CRD as a ref while keeping all other service CRDs as containment just because Watcher is created now, when there is no more space to contain it.

Regardless of the rush for GA (which btw still continues for FR2), I'm wondering if this approach prevents or conflicts with an eventual review of the ctlplane based on referencing CRs. In this proposal I tried to put the openstack-operator and the CRD size problem aside, and see if this solution (as an alternative to always inlining structs in the API) can work and is something we can adopt for more advanced configurations.
I can put everything on hold, I have no problem with that, but I don't see a ctlplane v2 as something that will be here soon. The new approach would also require attention to avoid breaking what has already been GAed and it would also require testing and agreement from all the parties involved.
I agree it's not me or you setting priorities and I understand if you're -1 on this proposal.


gibizer commented Dec 6, 2024

"Find a solution for this problem is a matter of time," I would like to challenge that. The whole 18.0 GA was a rush and we are keep rushing forward event though we already see the limits of our current approach. Adding the topology as a separate CRD now just because we have no time to fix our strategy will build up more technical debt for the future, creating even more necessary change in the future when we want to correct our structure and therefore making it less likely that we will be able to actually do a proper shift of strategy later.
I know that it is not for you and me to decide this alone and I'm happy to argue this in bigger forums as well.
I will be similarly against adding the Watcher CRD as a ref while keeping all other service CRDs as containment just because Watcher is created now, when there is no more space to contain it.

Regardless of the rush for GA (which btw still continues for FR2), I'm wondering if this approach prevents or conflicts with an eventual review of the ctlplane based on referencing CRs. In this proposal I tried to put the openstack-operator and the CRD size problem aside, and see if this solution (as an alternative to always inlining structs in the API) can work and is something we can adopt for more advanced configurations. I can put everything on hold, I have no problem with that, but I don't see a ctlplane v2 as something that will be here soon. The new approach would also require attention to avoid breaking what has already been GAed and it would also require testing and agreement from all the parties involved. I agree it's not me or you setting priorities and I understand if you're -1 on this proposal.

I cannot decide if having a topology CRD will conflict with a control plane v2 as I don't have a control plane v2 plan in my head, so I cannot judge this move based on that.

What I feel is that if we need to cut then I would cut across clear functional boundaries. And for me the clearest boundary is between the ControlPlane CRD and the Service CRDs today, not between the ControlPlane CRD and the Topology CRD. And if we go with the Service CRDs cutout then having a Topology CRD does not make much sense to me. The topology of an openstack service is highly coupled to the CRD representing the given openstack service. So if we need further cuts after the Service CRDs cutout then maybe the next step would be cutting out per-openstack-service CRDs like NovaAPI from the Nova CRD, or maybe decomposing a NovaAPI CRD into a StatefulSet CRD and a NovaAPI helper CRD.

Another reason I suggest doing the top-level Service CRD cutout from the ControlPlane CRD is that the pieces we would cut out already exist and are well defined (i.e. the Nova CRD). This means we do not need to design and evolve new structs. And upgrading an existing deployment to v2 has a limited impact on the control plane as the existing Service CRs do not need to change, just the ownership of them would change.

In contrast, dissolving the Topology CRD in v2 and having the info contained in the Service CRDs seems like a more complicated upgrade path, as the Service CRs need to get a new schema and the filling of the new fields needs to come from the Topology CRs.

"I don't see a ctlplane v2 as something that will be here soon."
I'm afraid of exactly this. If we don't create a priority for it by saying we hit a dead end with the current structure, and instead keep patching the system to make the next feature fit somehow, then v2 will not happen.

"I can put everything on hold,"
Please don't, we need to roll forward with the discussion on how to get out of the state where adding any kind of CRD feature is heavily complicated by the current CRD structure we have.

So yeah, I'm -1 on this approach, at least until it is discussed further and a reasonable commitment / preliminary plan for v2 is in place. But I guess my -1 can be overridden by the core team.


fmount commented Dec 6, 2024

"Find a solution for this problem is a matter of time," I would like to challenge that. The whole 18.0 GA was a rush and we are keep rushing forward event though we already see the limits of our current approach. Adding the topology as a separate CRD now just because we have no time to fix our strategy will build up more technical debt for the future, creating even more necessary change in the future when we want to correct our structure and therefore making it less likely that we will be able to actually do a proper shift of strategy later.
I know that it is not for you and me to decide this alone and I'm happy to argue this in bigger forums as well.
I will be similarly against adding the Watcher CRD as a ref while keeping all other service CRDs as containment just because Watcher is created now, when there is no more space to contain it.

Regardless of the rush for GA (which btw still continues for FR2), I'm wondering if this approach prevents or conflicts with an eventual review of the ctlplane based on referencing CRs. In this proposal I tried to put the openstack-operator and the CRD size problem aside, and see if this solution (as an alternative to always inlining structs in the API) can work and is something we can adopt for more advanced configurations. I can put everything on hold, I have no problem with that, but I don't see a ctlplane v2 as something that will be here soon. The new approach would also require attention to avoid breaking what has already been GAed and it would also require testing and agreement from all the parties involved. I agree it's not me or you setting priorities and I understand if you're -1 on this proposal.

I cannot decide if having a topology CRD will conflict with a control plane v2 as I don't have a control plane v2 plan in my head, so I cannot judge this move based on that.

What I feel is that if we need to cut then I would cut across clear functional boundaries. And for me the clearest boundary is between the ControlPlane CRD and the Service CRDs today, not between the ControlPlane CRD and the Topology CRD. And if we go with the Service CRDs cutout then having a Topology CRD does not make much sense to me. The topology of an openstack service is highly coupled to the CRD representing the given openstack service. So if we need further cuts after the Service CRDs cutout then maybe the next step would be cutting out per-openstack-service CRDs like NovaAPI from the Nova CRD, or maybe decomposing a NovaAPI CRD into a StatefulSet CRD and a NovaAPI helper CRD.

Are you suggesting to split the CRD and include only one of them (e.g. the helper CRD) in the openstack-operator? Or to apply the byRef pattern entirely to the top-level Nova CR? I think I see your point, but we should clarify how the user experience works in this case. If I understand you correctly, the top-level CRD has stable structs (especially for core fields); they are pretty well defined and won't change the size of the openstack control plane over time. What I'm not sure about is how the helper CRD would work in that case (I assume you want the user to patch the Nova CR for any change applied to Nova components, right?)

Another reason I suggest doing the top-level Service CRD cutout from the ControlPlane CRD is that the pieces we would cut out already exist and are well defined (i.e. the Nova CRD). This means we do not need to design and evolve new structs. And upgrading an existing deployment to v2 has a limited impact on the control plane as the existing Service CRs do not need to change, just the ownership of them would change.

In contrast, dissolving the Topology CRD in v2 and having the info contained in the Service CRDs seems like a more complicated upgrade path, as the Service CRs need to get a new schema and the filling of the new fields needs to come from the Topology CRs.

I'm not saying this: even in a v2 world nothing prevents us from still using the byRef approach against a Topology CR, or from providing conversion webhooks if the direction is to inline the struct in each service operator. As you can see from this patch, the code involved in this API is not much, and it can be easily translated without too much logic.

"I don't see a ctlplane v2 as something that will be here soon." I'm afraid of exactly this. I we don't create a priority for it by saying we hit a dead end with the current structure and instead keep patching the system to make the next feature fit somehow then v2 will not happen.

"I can put everything on hold," Please don't, we need to roll forward with the discussion on how to get out of the state when adding any kind of CRD feature is heavily complicated by the current CRD structure we have.

I think @dprince (who is going to hate me because of this long thread :D) already has plans for a v2: even if the implementation would require time (fwiw, it is something we can help with to speed up the process) - a hypothetical FR3 milestone - with the Topology conversation we already started the process of discussing a problem we would like to solve soon, otherwise it will follow us for each milestone that requires new features.

So yeah, I'm -1 on this approach, at least until it is discussed further and a reasonable commitment / preliminary plan for v2 is in place. But I guess my -1 can be overridden by the core team.

No need to override the -1, I think this conversation represents the most important part of the patch as we are explicitly raising the problem, writing down ideas (either in English or via POCs) and looking for a solution that won't present code refactoring challenges in the future. I think >1y ago (extraMounts time), with the focus on "TripleO feature parity", I didn't imagine how hard it would be to think about a v2 because of the limits we're going to hit. Something (at least for me, with my lack of knowledge of the k8s world) to retrospect on and learn from for the future.


stuggi commented Dec 9, 2024

iirc in our last Thu meeting we discussed talking about a possible v2 this week. It might help to wait for this meeting to see how this PR could fit?


fmount commented Dec 9, 2024

iirc in our last Thu meeting we discussed talking about a possible v2 this week. It might help to wait for this meeting to see how this PR could fit?

I'm ok with that. A potential problem with this patch could be that we start splitting the API into dedicated CRs, but then we find ourselves in a situation where we want to add them back (inline) to the service operators. For this reason, I think we can simply wait and define the direction.
I still think a dedicated API is not such a big problem, and we can use conversion webhooks to consolidate the service operators' CRs, but that represents more work on our plate.
We might end up doing that anyway, but let's have a formal answer before moving forward.
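
To make the conversion-webhook idea mentioned above concrete: the mapping such a webhook would perform is essentially a copy between a by-reference spec and an inlined spec. A minimal sketch, assuming hypothetical v1/v2 shapes (all names below are assumptions, not part of this patch; in a real operator this logic would sit in controller-runtime's ConvertTo/ConvertFrom hooks):

```go
// Sketch only; hypothetical v1 (by-reference) and v2 (inlined) shapes.
package conversionsketch

import corev1 "k8s.io/api/core/v1"

// Hypothetical "v1" shape: the service spec only references a Topology CR.
type TopologyRef struct {
	Name      string
	Namespace string
}

type ServiceSpecV1 struct {
	TopologyRef *TopologyRef
}

// Hypothetical "v2" shape: the topology fields are inlined in the service spec.
type ServiceSpecV2 struct {
	TopologySpreadConstraints []corev1.TopologySpreadConstraint
	Affinity                  *corev1.Affinity
}

// convertV1ToV2 shows the shape of the mapping: resolve stands in for the
// lookup of the referenced Topology CR, which the controller (or webhook)
// would perform through the API client.
func convertV1ToV2(in ServiceSpecV1, resolve func(TopologyRef) ServiceSpecV2) ServiceSpecV2 {
	if in.TopologyRef == nil {
		// No reference set: nothing to inline.
		return ServiceSpecV2{}
	}
	return resolve(*in.TopologyRef)
}
```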


gibizer commented Dec 9, 2024

I cannot decide if having a topology CRD will conflict with a control plane v2 as I don't have a control plane v2 plan in my head, so I cannot judge this move based on that.
What I feel is that if we need to cut then I would cut across clear functional boundaries. And for me the clearest boundary is between the ControlPlane CRD and the Service CRDs today, not between the ControlPlane CRD and the Topology CRD. And if we go with the Service CRDs cutout then having a Topology CRD does not make much sense to me. The topology of an openstack service is highly coupled to the CRD representing the given openstack service. So if we need further cuts after the Service CRDs cutout then maybe the next step would be cutting out per-openstack-service CRDs like NovaAPI from the Nova CRD, or maybe decomposing a NovaAPI CRD into a StatefulSet CRD and a NovaAPI helper CRD.

Are you suggesting to split the CRD and include only one of them (e.g. the helper CRD) in the openstack-operator? Or to apply the byRef pattern entirely to the top-level Nova CR? I think I see your point, but we should clarify how the user experience works in this case. If I understand you correctly, the top-level CRD has stable structs (especially for core fields); they are pretty well defined and won't change the size of the openstack control plane over time. What I'm not sure about is how the helper CRD would work in that case (I assume you want the user to patch the Nova CR for any change applied to Nova components, right?)

I don't have a fully detailed plan in my head, more like a direction to plan towards. I think the main configuration interface after the cut would be the Nova CR, not the OpenStackControlPlane CR. But I acknowledge that there are inputs to coordinate across multiple top-level service CRs, so we need a solution for that coordination. Either it is top down, which means we still need a service Core spec included in the OpenStackControlPlane, or it is bottom up (like in the case of your proposal), where the service CR controller needs to look up some other CRs to read the common configuration out. I feel (but without more details I cannot be certain) that the bottom-up approach actually leads to a more scalable design due to less centralization.
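
To make the bottom-up lookup concrete, a hedged sketch of a service controller resolving a referenced Topology CR at reconcile time; the import path and the resolveTopology helper are assumptions based on this PR's file layout, not code from it:

```go
// Hedged sketch: a service controller doing a bottom-up lookup of a
// referenced Topology CR. Names and import path are assumptions.
package controllers

import (
	"context"

	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	topologyv1 "github.com/openstack-k8s-operators/infra-operator/apis/topology/v1beta1"
)

type ServiceReconciler struct {
	client.Client
}

// resolveTopology fetches the Topology CR referenced by a service CR so the
// controller can apply its spread constraints / affinity to the pod spec.
func (r *ServiceReconciler) resolveTopology(
	ctx context.Context, name, namespace string,
) (*topologyv1.Topology, error) {
	topology := &topologyv1.Topology{}
	key := types.NamespacedName{Name: name, Namespace: namespace}
	if err := r.Get(ctx, key, topology); err != nil {
		// The caller decides whether a missing Topology is fatal or optional.
		return nil, err
	}
	return topology, nil
}

func (r *ServiceReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// ... load the service CR, then, if it carries a topologyRef:
	// topo, err := r.resolveTopology(ctx, ref.Name, req.Namespace)
	// and merge topo.Spec into the generated StatefulSet/Deployment pod spec.
	return ctrl.Result{}, nil
}
```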

Another reason I suggest doing the top-level Service CRD cutout from the ControlPlane CRD is that the pieces we would cut out already exist and are well defined (i.e. the Nova CRD). This means we do not need to design and evolve new structs. And upgrading an existing deployment to v2 has a limited impact on the control plane as the existing Service CRs do not need to change, just the ownership of them would change.
In contrast, dissolving the Topology CRD in v2 and having the info contained in the Service CRDs seems like a more complicated upgrade path, as the Service CRs need to get a new schema and the filling of the new fields needs to come from the Topology CRs.

I'm not saying this: even in a v2 world nothing prevents us from still using the byRef approach against a Topology CR, or from providing conversion webhooks if the direction is to inline the struct in each service operator. As you can see from this patch, the code involved in this API is not much, and it can be easily translated without too much logic.

My points here are twofold: i) I feel that the Topology CRD will not be a main cut point in v2. ii) I think we need to be intentional about the cost of inlining it later. I haven't written any conversion webhook so it is hard for me to judge the cost of it, and I feel this is a risky move. If others have done such a conversion before I would be happy to see their judgement about the cost of it.

"I don't see a ctlplane v2 as something that will be here soon." I'm afraid of exactly this. I we don't create a priority for it by saying we hit a dead end with the current structure and instead keep patching the system to make the next feature fit somehow then v2 will not happen.
"I can put everything on hold," Please don't, we need to roll forward with the discussion on how to get out of the state when adding any kind of CRD feature is heavily complicated by the current CRD structure we have.

I think @dprince (who is going to hate me because of this long thread :D) already has plans for a v2: even if the implementation would require time (fwiw, it is something we can help with to speed up the process) - a hypothetical FR3 milestone - with the Topology conversation we already started the process of discussing a problem we would like to solve soon, otherwise it will follow us for each milestone that requires new features.

In my eyes it is a long thread, but a valid discussion to have. Maybe we can have the discussion in another format, but personally I like async, written discussion (and I know others might not like this form). I'm happy to hear that the Topology issue triggered the overall v2 discussion. I'm looking forward to knowing more about the planned v2 changes to see the big picture.

So yeah, I'm -1 on this approach, at least until it is discussed further and a reasonable commitment / preliminary plan for v2 is in place. But I guess my -1 can be overridden by the core team.

No need to override the -1, I think this conversation represents the most important part of the patch as we are explicitly raising the problem, writing down ideas (either in English or via POCs) and looking for a solution that won't present code refactoring challenges in the future. I think >1y ago (extraMounts time), with the focus on "TripleO feature parity", I didn't imagine how hard it would be to think about a v2 because of the limits we're going to hit. Something (at least for me, with my lack of knowledge of the k8s world) to retrospect on and learn from for the future.

For me the k8s stack was new too. But to be clear, we already saw this CRD limit a year ago, and so far we have only worked around the problem instead of trying to solve it. I'm happy to see that this time we are not just trying to work around the problem but actually discussing a different structure that potentially plays nicer with the constraints of the tech stack we chose.

tl;dr: My goal here is not just to solve the immediate problem, but to look at the big picture and create a solution that works with the given CRD size limits long term. For that, this discussion is helpful, as is looking at the v2 proposal from Dan this week.


This change depends on a change that failed to merge.

Change openstack-k8s-operators/lib-common#587 is needed.


openshift-ci bot commented Dec 13, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: fmount
Once this PR has been reviewed and has the lgtm label, please assign stuggi for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/infra-operator for 325,43a3074562126ba601922b919b6e55020280b229


Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/infra-operator for 325,7dc80086f57fd387f811cd0cd20752a26bedb42f


Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/infra-operator for 325,6526a78aea8f896eb4565fa1005c5d17240805fe


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/4bc86cfea9964df1b240db2ef5631623

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 00m 47s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 17m 15s
cifmw-crc-podified-edpm-baremetal FAILURE in 1h 39m 43s

Topology CRD is made of two different structs imported and abstracted
from PodSpec [1]:

1. TopologySpreadConstraint
2. Affinity/AntiAffinity

The above seems enough to draft a dedicated CR instead of exposing
those parameters through the service operators' API.
In addition, Affinity/AntiAffinity is imported from PodSpec [1] and
can override the default affinity policy defined in lib-common.

[1] https://pkg.go.dev/k8s.io/api/core/v1#PodSpec

Signed-off-by: Francesco Pantano <[email protected]>
@fmount fmount marked this pull request as ready for review January 9, 2025 12:40
@bogdando bogdando self-requested a review January 9, 2025 14:21

@bogdando bogdando left a comment


I agree with Gibi on "clearest boundary is between the ControlPlane CRD and the Service CRDs today, not between the ControlPlane CRD and the Topology CRD". We should postpone this work until we decide on the CRDs v2 architecture.


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/e1cb043367d9440eafe4f115d323d59a

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 39m 40s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 20m 42s
cifmw-crc-podified-edpm-baremetal RETRY_LIMIT in 13m 24s


fmount commented Jan 10, 2025

recheck


fmount commented Jan 10, 2025

I agree with Gibi on "clearest boundary is between the ControlPlane CRD and the Service CRDs today, not between the ControlPlane CRD and the Topology CRD". We should postpone this work until we decide on the CRDs v2 architecture.

The conversation related to ctlplane v2, the timeline, any potential change in the way services are grouped in the ctlplane CR, etc. is not part of this patch, which is intended to close the gap for a feature request that must be delivered by FR2.
That conversation has already started, and we are collecting "ideas" in a doc. This patch doesn't conflict with that conversation regardless of the direction that will be taken.

@fmount fmount requested a review from bogdando January 10, 2025 12:13
@fmount fmount assigned stuggi and unassigned bogdando Jan 10, 2025
@fmount fmount requested a review from gibizer January 10, 2025 12:13

This change depends on a change that failed to merge.

Change openstack-k8s-operators/lib-common#587 is needed.


fmount commented Jan 10, 2025

recheck


bogdando commented Jan 10, 2025

The conversation related to ctlplane v2, the timeline, any potential change in the way services are grouped in the ctlplane CR, etc. is not part of this patch, which is intended to close the gap for a feature request that must be delivered by FR2. That conversation has already started, and we are collecting "ideas" in a doc. This patch doesn't conflict with that conversation regardless of the direction that will be taken.

I am still concerned by what Gibi said in that regard:

Adding the topology as a separate CRD now just because we have no time to fix our strategy will build up more technical debt for the future, creating even more necessary change in the future when we want to correct our structure and therefore making it less likely that we will be able to actually do a proper shift of strategy later.


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/aa719bfaf9d94371a52bc6547adee839

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 39m 09s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 21m 40s
cifmw-crc-podified-edpm-baremetal RETRY_LIMIT in 13m 42s
