hub online synchronization #82
base: master
Conversation
@rjmateus Would it make sense to also add all the other ISSv2 features to this RFC and plan possible replacements? This would give a full picture, but we could still do the implementation in steps. It would not prevent us from starting with channels first.
@mcalmer Good point Michael.
This is lovely, and I appreciate it. I only have a few questions, but aside from that, everything appears satisfactory from my perspective.
We can follow a similar approach to what exists in ISSv1. On the hub side we can define multiple peripheral servers to connect to by providing the FQDN and an authentication token.
On the peripheral side we also need to define the Hub server FQDN and an authentication token.
The Hub will have the peripheral registered and generate the associated auth token, right?
Why do we need to provide the peripheral FQDN? Wouldn't a generic peripheral name (maybe the FQDN) and a generated token be enough? Or do we approach this as a username/password scenario?
I am assuming that the connection will always be from the peripheral to the hub, right?
It should be used for authentication from the Hub to the peripheral.
Communication will be bi-directional. In some cases the peripheral calls the Hub (like synchronizing software channels and calling SCC endpoints); in other cases the Hub calls the peripheral API (like creating channels, pushing configuration channels, etc.).
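A minimal sketch of how such token-based, bi-directional authentication could be wired up; the registration helper and token names are hypothetical, not part of the RFC:

```python
import secrets

# Hypothetical registry of peripherals known to the Hub. Each entry stores
# the peripheral FQDN plus one token per call direction, matching the
# bi-directional communication described above.
peripherals = {}

def register_peripheral(fqdn: str) -> dict:
    """Register a peripheral on the Hub and issue tokens for both directions."""
    entry = {
        "fqdn": fqdn,
        # Presented by the Hub when calling the peripheral API.
        "hub_to_peripheral_token": secrets.token_urlsafe(32),
        # Presented by the peripheral when calling the Hub (repo-sync, SCC endpoints).
        "peripheral_to_hub_token": secrets.token_urlsafe(32),
    }
    peripherals[fqdn] = entry
    return entry

if __name__ == "__main__":
    print(register_peripheral("peripheral1.example.com"))
```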
## Peripheral software channels creation

We need a mechanism to create the channels on the peripheral servers (vendor and CLM's) in the desired organization. Peripheral channel creation must be done automatically from the Hub server through an API. Since we are introducing a special kind of channel creation (defined next), those API methods should be available for server-to-server communication only.
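To make the server-to-server restriction concrete, a hedged sketch of what a Hub-driven channel creation call could look like over XML-RPC; the `sync.hub.createChannel` method name and token argument are assumptions, since the RFC leaves the exact namespace open:

```python
from xmlrpc.client import ServerProxy

# Hypothetical server-to-server call: the Hub asks a peripheral to create a
# software channel in a given organization. The method name and token-based
# authentication are illustrative only.
client = ServerProxy("https://peripheral1.example.com/rpc/api")
hub_token = "hub-to-peripheral-token"  # issued when the peripheral was registered

client.sync.hub.createChannel(hub_token, {
    "label": "sles15-sp6-pool-x86_64",  # placeholder channel label
    "org_id": 1,
    "parent_label": "",
})
```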
I think it would be good to define in a little more detail how this server-to-server API should look.
This should also say something about the existing API namespaces `sync.master` and `sync.slave`.
- What namespace should be used for it?
- one namespace or multiple?
- design it with an outlook to the future and what else needs to be added to this API later, e.g. activation keys, config channels, images, formulas, etc.
- how should the authentication work?
Hey Michael. I added some clarification about the API namespace and use cases. I didn't add details about the exact API methods to develop, because that looks like an implementation detail to me.
Could you have a look and see if it's clearer now? Thank you
I am missing one important section and that is failure scenarios:
- what happens when a peripheral is due to be synced but is unavailable?
This seems particularly likely in channel creation and CLM, where the sync should happen automatically.
Since in this case the connection direction is expected to be:
"other cases will be the Hub calling the peripheral API, like creating channels, pushing configuration channels, etc."
So if the peripheral is unavailable, do we keep track of what was updated and what was not? And how? (a possible approach is sketched after this list)
- what happens when the peripheral or hub crashes during the sync?
With ISSv2 everything was just one transaction, so inconsistencies should not happen.
We should however check the ACIDity of our APIs, and not only individual API calls but also the sequences of them which we will use for the sync, and define the expected failure modes.
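One possible answer to the tracking question is a per-peripheral sync journal on the Hub; a rough sketch under that assumption (the schema is invented for illustration):

```python
import sqlite3

# Illustrative only: record each sync action per peripheral so that actions
# against an unavailable peripheral can be replayed later.
db = sqlite3.connect("hub_sync_state.db")
db.execute("""
    CREATE TABLE IF NOT EXISTS sync_log (
        peripheral TEXT,
        action     TEXT,  -- e.g. 'create_channel', 'push_config'
        payload    TEXT,
        status     TEXT   -- 'pending', 'done', 'failed'
    )
""")

def record_pending(peripheral: str, action: str, payload: str) -> None:
    db.execute("INSERT INTO sync_log VALUES (?, ?, ?, 'pending')",
               (peripheral, action, payload))
    db.commit()

def pending_for(peripheral: str) -> list:
    """Actions still to replay once the peripheral is reachable again."""
    return db.execute("SELECT action, payload FROM sync_log "
                      "WHERE peripheral = ? AND status = 'pending'",
                      (peripheral,)).fetchall()
```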
An implementation example is available from the community [link](https://github.com/uyuni-project/contrib/blob/main/os-image-tools/osimage-import-export.py).

Communication can be done from the Hub to the peripheral server to synchronize all the necessary data.
I would like to point out that not everyone would like to sync from the Hub to the peripheral.
A case in point is one of our users who has different SUMAs for different environments: prod, qa, dev.
They build their images using the SUMA in the dev environment; once an image passes basic validation it is exported and imported into the qa SUMA using the above-mentioned script. There the image goes through more thorough testing, and once it passes and a maintenance window opens it is moved on to prod. This process ensures no further changes are made to the image, as the import/export does not modify the image in any way.
A centrally managed hub network would help them ensure the same configuration of those SUMAs, however they would certainly need the ability to either:
- sync images from one peripheral to another peripheral and to the Hub (this can be done outside the Hub architecture with existing APIs)
- prevent auto-syncing images from peripheral to Hub and/or overwriting peripheral images from the Hub
Good point here Ondrej.
Let's go through this in parts; I will start from the end.
The idea is for users to be able to define on the Hub whether to synchronize all data or only a selected set. This way they could control when an image lands on each peripheral server.
The Hub server could also have the ability to build images. It will have all the channels, so users can create a build host assigned to the Hub server.
Considering those two assumptions, it would make sense to build the image on the Hub server, transfer it to the dev peripheral server, and run all the necessary tests. After all dev tests pass, we transfer the new image version to the other environments (qa, prod) and make it available to all.
If this is not the case, then we can always use the script you mentioned, or ISSv2, since it will stay around.
The goal of this solution is scalability only; other use cases will stay around and may need different implementations and components.
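As an illustration of the "all data or only a selected set" idea, a per-peripheral sync policy could look like this; the structure and field names are invented for this example:

```python
from dataclasses import dataclass, field

# Invented for illustration: per-peripheral selection of what the Hub
# synchronizes, so an image can land on dev before qa and prod.
@dataclass
class PeripheralSyncPolicy:
    fqdn: str
    sync_channels: bool = True
    sync_config_channels: bool = True
    # Empty list = sync no images; explicit labels = only those images.
    image_labels: list = field(default_factory=list)

policies = [
    PeripheralSyncPolicy("dev.example.com", image_labels=["myimage-candidate"]),
    PeripheralSyncPolicy("qa.example.com", image_labels=["myimage-tested"]),
    PeripheralSyncPolicy("prod.example.com", image_labels=["myimage-released"]),
]
```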
@aaannz I added a section about failure scenarios. Do you think it is clear enough, or should it be more complete?
Remember that a lazy repo-sync has been started. This may have an impact on the design.
The design of this RFC should not be impacted. I would say it is the other way around: the existence of this RFC, and the possibility of having a chained SUSE Manager server that may not have the packages synchronized, is something that will impact the new reposync.
[alternatives]: #alternatives

- Create a new UI and use ISSv2 to synchronize data
- Solve the existing and known problems of ISSv2
I'd not discard this point, in addition to implementing ISSv3.
As ISSv2 is going to be used for disconnected environments, we can still bring a better user experience to that use case. For example, could we consider some improvement around performing parallel SQL queries?
In terms of performance, the main issues are in the import process, and there we cannot change it to run in parallel.
We can however change the way we do the transaction and have one transaction per channel, instead of one transaction per export as we have now.
Another possibility is to not have a transaction at all, but that can be risky if users start to use the channel during the import process, or if an error occurs during the import.
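The one-transaction-per-channel idea, sketched in Python with SQLite standing in for the real database (the schema and importer are illustrative, not the actual import code):

```python
import sqlite3

def import_channels(db: sqlite3.Connection, channels: list) -> None:
    """Commit per channel instead of once per export: a failure loses only
    the channel being imported, at the cost of a partially imported set."""
    for channel in channels:
        try:
            with db:  # one transaction per channel; rolls back on error
                for pkg in channel["packages"]:
                    db.execute("INSERT INTO packages (channel, nevra) VALUES (?, ?)",
                               (channel["label"], pkg))
        except sqlite3.Error as err:
            # This channel stays untouched; the remaining channels still import.
            print(f"import of {channel['label']} failed: {err}")

if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE packages (channel TEXT, nevra TEXT)")
    import_channels(db, [{"label": "demo", "packages": ["pkg-1.0-1.x86_64"]}])
```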
- All peripherals can start synchronizing at the same time
- This can be problematic if we have several peripherals performing a full synchronization at the same time. However, we can configure the peripherals to run repo-sync at different hours and spread the load (see the sketch after this list)
- This is only a problem on the first sync, since subsequent syncs only transfer differences
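A hedged sketch of the load-spreading idea from the list above; real scheduling would go through each peripheral's taskomatic configuration, this only computes staggered start times:

```python
# Illustrative only: assign each peripheral a repo-sync start time in a fixed
# nightly window so full syncs do not all hit the Hub at the same time.
def staggered_schedule(peripherals, window_start_hour=22, slot_minutes=30):
    schedule = {}
    for i, fqdn in enumerate(sorted(peripherals)):
        total = window_start_hour * 60 + i * slot_minutes
        schedule[fqdn] = f"{(total // 60) % 24:02d}:{total % 60:02d}"
    return schedule

print(staggered_schedule(["p1.example.com", "p2.example.com", "p3.example.com"]))
# {'p1.example.com': '22:00', 'p2.example.com': '22:30', 'p3.example.com': '23:00'}
```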
When synchronizing the channels from the SCC service, in theory we rely on a service with HA.
In this proposal the peripherals will rely on a single Hub instance, with custom and vendor channels pointing to that machine. What happens if it goes down, or the Hub disk burns? I would also consider how we can recover in these cases.
We rely on the Hub, but the repo-sync tool already has a retry mechanism. It will try to download/sync the content in the next iteration of the mgr-sync taskomatic task.
@srbarrios is this still a question for you?
First review iteration of alternative 2
On the peripheral side we call the API on the Hub and fill out the required configuration data needed locally on the peripheral server (e.g. an auto-generated mirror credential password).
We will also change the `scc_url` configuration on the peripheral to point to the Hub server.
We need to establish a secure connection between the Hub and the peripheral server.
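A hedged sketch of that peripheral-side bootstrap; the `sync.hub.registerPeripheral` method and the exact `rhn.conf` key are assumptions for illustration:

```python
from xmlrpc.client import ServerProxy

HUB_FQDN = "hub.example.com"

# Hypothetical call: ask the Hub for the configuration this peripheral needs
# locally, e.g. the auto-generated mirror credential password.
hub = ServerProxy(f"https://{HUB_FQDN}/rpc/api")
config = hub.sync.hub.registerPeripheral("peripheral-token", "peripheral1.example.com")

# Point the peripheral's SCC lookups at the Hub instead of scc.suse.com.
# The configuration key name is an assumption here.
with open("/etc/rhn/rhn.conf", "a") as conf:
    conf.write(f"server.susemanager.scc_url = https://{HUB_FQDN}\n")
```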
hub_xmlrpc_api is for ISSv2. In the worst case the user can stay on it.
I cannot say if this is compatible with ISSv3; I do not really know what it is doing.
@rjmateus might be able to answer this
hub_xmlrpc_api will stay as is, since it's not dependent on the ISS version in use. This means it can work with any version of ISS, even v3.
However, we may need to make some changes to one API that provides the hub server FQDNs to the XML-RPC API, so that instead of looking at the system entitlement it returns the list of configured peripherals.
And what about all the different authentication methods (https://documentation.suse.com/suma/5.0/en/suse-manager/specialized-guides/large-deployments/hub-auth.html)?
To support them, will we leave it up to the user to create the appropriate API users, ensuring that those users can be used to correctly make the API calls later on?
That is correct. Users must exist on the peripheral servers to be able to call the API.
In 4.3 this configuration was possible with a salt formula, which in 5.0 was transformed into an API call: https://github.com/SUSE/spacewalk/issues/22498
However, I was trying to find the documentation for this and was unable to. I pinged Cedric and Vladimir to check if it's missing or if I'm blind.
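For example, creating such an API user on a peripheral can go through the standard XML-RPC API; `auth.login`, `user.create`, and `auth.logout` are existing Uyuni/SUSE Manager methods, while the credentials below are placeholders:

```python
from xmlrpc.client import ServerProxy

# Create a dedicated API user on a peripheral so the Hub can authenticate
# its server-to-server calls with it.
client = ServerProxy("https://peripheral1.example.com/rpc/api")
session = client.auth.login("admin", "admin-password")
client.user.create(session, "hub-sync", "generated-password",
                   "Hub", "Sync", "hub-sync@example.com")
client.auth.logout(session)
```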
This is an RFC to improve content synchronization for Hub scenarios. It also relates to Inter-Server Synchronization.
Rendered version