-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Topology CRD #325
base: main
Are you sure you want to change the base?
Add Topology CRD #325
Conversation
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/733699b4849940629a1ec927e6f8dbc0 ✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 43m 19s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I understand that we cannot add this input to all of our service CRDs as then OpenStackOperator CRD will be over the size limit. So we need to start adding new input to the control plane by reference instead of by containment. If we go with this route then I suggest to take a step back and think about what we really want to add as a ref and what we want to add directly to our CRD as a field. Right now I think using the ref approach here just because we happened to run out of CRD space around here/now. I would like to see a more methodical approach where we decide what type of information should be contained and what can be referred based on the usage / lifecycle / meaning of the fields not just based on luck.
An alternative cut up could be to stop including service CRDs into the OpenStackControlPlane CRD just reference them by name and ask the human deployer to create each service CRD one by one. I think the OpenStackControlPlane and Version CRD still can orchestrate the minor update via references. This structure would feel more natural to me as each service CRD has high cohesion while the coupling across service CRDs is low so they don't need to be tight together into the OpenStackControlPlane struct.
That is an approach we could take in a long term run, but I think today we're not there yet (maybe it will take ~6months to align what we have and not break existing deployments). |
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/d6ff072498354c4aa25f05fc905ea216 ✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 34m 58s |
352fd95
to
7e07d12
Compare
"Find a solution for this problem is a matter of time," I would like to challenge that. The whole 18.0 GA was a rush and we are keep rushing forward event though we already see the limits of our current approach. Adding the topology as a separate CRD now just because we have no time to fix our strategy will build up more technical debt for the future, creating even more necessary change in the future when we want to correct our structure and therefore making it less likely that we will be able to actually do a proper shift of strategy later. I know that it is not you and me to decide this alone and I happy to argue this on bigger forums as well. I will be similarly against adding the Watcher CRD as a ref while keeping all other service CRDs as a containment just because Watcher is created now where there is no more space to contain it. |
Regardless of the rush for GA (that btw still continue for FR2), I'm wondering if this approach prevents or is in conflict with an eventual review of the ctlplane based on referencing CRs. In this proposal I tried to put |
I cannot decide if having a topology CRD will conflict with a control plane v2 as I don't have a control plane v2 plan in my head so I cannot judge this move based on that. What I feel is that if we need to cut then I would cut across clear functional boundaries. And for me the clearest boundary is between the ControlPlane CRD and the Service CRDs today, not between the ControPlane CRD and the Topology CRD. And if we go with the Sevice CRDs cutout then having a Topology CRD does not make much sense to me. The topology of an openstack service is highly coupled to the CRD representing the given openstack service. So If we need further cuts after the Service CRDs cut out then maybe the next step would be cutting out per openstack service CRDs like NovaAPI from the Nova CRD or maybe decomposing a NovaAPI CRD into a StatefulSet CRD and NovaAPI helper CRD. Another reason I suggest to do the top level Service CRD cutout from the ControPlance CRD is that the pieces we cut out are already exists and well defined (i.e. Nova CRD). This means we does not need to design and evolve new structs. And upgrading an existing deployment to v2 has a limited impact on the control plane as the existing Service CRs does not need to change, just the ownership of them would change. In contrast dissolving the Topology CRD in v2 and having contained the info in the Service CRDs seems like a more complicated upgrade path as the Service CRs need to get a new schema and the filling of the new fields needs to come from the Topology CRs. "I don't see a ctlplane v2 as something that will be here soon." "I can put everything on hold," So yeah I'm -1 on this approach at least until it is discussed further and a reasonable commitment / preliminary plan for v2 is in place. But I guess my -1 can be overwritten by the core team. |
Are you suggesting to split the CRD and include only one of them (e.g. the helper CRD) in the openstack-operator? Or apply the byRef patter entirely on the top-level Nova CR? I think I see your point but we should clarify how the user experience works in this case. If I understand you correctly the top-level CRD has stable structs (especially for core fields) and they are pretty defined and won't change the size of the openstack control plane during the time. What I'm not sure is how the helper CRD would work in that case (I assume you want the user to patch the Nova CR for any change applied to Nova components, right?)
I'm not saying this: even in a v2 world no one prevent us to still use the
I think @dprince (who is going to hate me because of this long thread :D) has already plans for a v2: even if the implementation would require time (fwiw, is something we can help with to speedup the process) - an hypothetical FR3 milestone - with the
No need to override the -1, I think this conversation represents the most important part of the patch as we are explicitly raising the problem, writing ideas (either in English or by POCs) and look for a solution that won't present code refactoring challenges in the future. I think >1y ago ( |
iirc in our last thu meeting we discussed to talk about a possible v2 this week. it might help to wait for this meeting to see how this PR could fit? |
I'm ok with that. A potential problem with this patch could be that we start splitting the |
I don't have a fully detailed out plan in my head, more like a direction to plan towards. I think the main configuration interface after the cut would be the Nova CR not the OpenStackControlPlane CR. but I acknowledge that there are inputs to coordinate across multiple top level service CRs so we need a solution for that coordination. Either it is top down, and then it means we still need a service Core spec included in OpenStackControlPlane or it is bottom up (like in the case of you proposal) when the service CR controller needs to look up some other CRs to read the common configuration out. I feel (but without more details I cannot be certain) that the bottom up approach is actually leads to a more scaleable design due to less centralization.
My point here are twofold. i) I feel that Topology CRD will not be a main cut point in v2. ii) I think we need to be intentional about the cost of inlineing it later. I haven't written any conversion webhook so for me it is hard to judge the cost of it, so I feel this as a risky move. If other did such conversion before then I would be happy see their judgement about the cost of it.
In my eyes it is a long thread, but a valid discussion to have. Maybe we can do the discussion in other format. But personally I like async, written discussion (and I know other might not like this form). I'm happy to hear that the Toplogy issue triggered the overall v2 discussion. I'm looking forward to know more about the planned v2 changes to see the big picture.
For me k8s stack was new too. But to be clear we saw this CRD limit already a year ago, and so far we only worked around the problem instead of trying to solve it. I'm happy to see that this time we will not just trying to workaround the problem but actually discuss a different structure that potentially playing nicer with the constraints of the tech stack we choose. tl;dr: My goal here is not just to solve the immediate problem, but look at the big picture and create a solution that works with the given CRD size limits long term. And for that this discussion is helpful as well as looking at the v2 proposal from Dan thihs week. |
This change depends on a change that failed to merge. Change openstack-k8s-operators/lib-common#587 is needed. |
Merge Failed. This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset. |
ded3572
to
eea0aa5
Compare
490529c
to
0c21352
Compare
@bogdando Please don't resurrect the general API discussions here, it's really not the right place for it (also from 1 month ago) |
This is the right place as this change introduces another "cutting point" additional to Service operators CRDs, as @gibizer explained, that brings more technical dept and complications to upgrading of "today" API to the "future" reworked API. You can always overcome the raised concern and do what you think is right, folks. |
@olliewalsh @stuggi I think we're ready here. We might need to follow up in case we require additional shared code, but overall I think we can move forward with this and focus on the services side (glance-operator patch atm), to converge to the wanted behavior. |
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/77bd63682db24527b6b16954fbb93fc5 ❌ openstack-k8s-operators-content-provider FAILURE in 7m 36s |
recheck |
ecb0e4a
to
900dd87
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have one comment regarding the Status
Topology CRD is made by two different structs imported and abstracted from PodSpec [1]: 1. TopologySpreadConstraint 2. Affinity The above seems enough to draft a dedicated CR instead of exposing those parameters through the service operators' API. In addition, Affinity/AntiAffinity is imported from PodSpec [1] and can override the default affinity policy defined in lib-common through DistributePods function. [1] https://pkg.go.dev/k8s.io/api/core/v1#PodSpec Signed-off-by: Francesco Pantano <fpantano@redhat.com>
/test functional |
Topology CRD is made by two different structs imported and abstracted from PodSpec [1]:
The above seems enough to draft a dedicated CR instead of exposing (or inline) those parameters in the service operators' API.
NodeSelector
can be imported here in future as a follow up to avoid refactoring any existing code.[1] https://pkg.go.dev/k8s.io/api/core/v1#PodSpec