[DPE-5249] Add integration support for hacluster #177
Conversation
Tested in a locally built charm. LGTM.
Please do not forget to create a documentation page. Thank you!
```yaml
options:
  vip:
    description: |
      Virtual IP to use to front mysql router units. Used only in case of external node connection.
    type: string
```
cc: @delgod and @7annaba3l, we are syncing the option with pgbouncer (which got it from legacy pgbouncer). Are you OK with a common UX here?
@taurus-forever : Yes. Generally speaking we should always favor common UX between PostgreSQL and MySQL.
Left comments, mostly cosmetics. Solid tests.
src/relations/database_provides.py
Outdated
```python
logger.debug(
    f"Set databag {self._id=} {self._database=}, {username=}, {router_read_write_endpoint=}, {router_read_only_endpoint=}"
)
if username and password:
```
nit: I understand this to allow partial updates. why not just pass all values?
when we update the endpoints (after the first time, when the user/pass + endpoints are initially set), the only values changing are the endpoints (we do not have access to the username or password). updating the username/password with None would wipe the existing values in the databag here
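As an illustration of the partial-update pattern described above, here is a minimal, hypothetical sketch (a plain dict rather than the charm's actual databag API) of why the `username and password` guard matters:

```python
from typing import Optional

def update_databag(
    databag: dict,
    *,
    username: Optional[str] = None,
    password: Optional[str] = None,
    endpoints: Optional[str] = None,
) -> None:
    """Only write keys whose new values are actually known."""
    if username and password:
        # On endpoint-only updates the credentials are None; writing them
        # unconditionally would erase the values already in the databag.
        databag["username"] = username
        databag["password"] = password
    if endpoints:
        databag["endpoints"] = endpoints
```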
```python
    return ops.BlockedStatus("vip configuration without data-integrator")

if self.charm.is_workload_authenticated and self.charm.unit.is_leader() and vip:
    return ops.ActiveStatus(f"VIP: {vip}")
```
How is status messaging being handled overall on router? Should it just be blank when active, given that the vip can be queried from config?
+1 to blank when active
we are setting the VIP in the leader's status message to stay consistent with the pgbouncer implementation
I thought we are generally supposed to not display a status message if everything is healthy / no action is needed from the user
will follow up with @dragomirp about the error message, but still leaning towards keeping as is for the moment, as pgbouncer has been released to prod with this behavior and we don't want a different experience
should we require integration with hacluster if external connectivity is requested?
if not, should the requesting unit specify whether hacluster virtual ip is desired?
wondering if we can simplify code by aligning with the k8s approach where when we create the database user we know what endpoint to expose
src/relations/hacluster.py
Outdated
```python
def _is_clustered(self) -> bool:
    for key, value in self.relation.data.items():
        if isinstance(key, ops.Unit) and key != self.charm.unit:
            if value.get("clustered") in ("yes", "true"):
                return True
            break
    return False
```
what is this checking?
the underlying hacluster application needs at least 3 nodes/units to be active. this helper is checking if hacluster is clustered:
```yaml
- relation-id: 12
  endpoint: ha
  related-endpoint: ha
  application-data: {}
  related-units:
    hacluster/0:
      in-scope: true
      data:
        clustered: "yes"
        egress-subnets: 10.205.193.104/32
        ingress-address: 10.205.193.104
        private-address: 10.205.193.104
    hacluster/1:
      in-scope: true
      data:
        clustered: "yes"
        egress-subnets: 10.205.193.134/32
        ingress-address: 10.205.193.134
        private-address: 10.205.193.134
    hacluster/3:
      in-scope: true
      data:
        clustered: "yes"
        egress-subnets: 10.205.193.179/32
        ingress-address: 10.205.193.179
        private-address: 10.205.193.179
    hacluster/4:
      in-scope: true
      data:
        clustered: "yes"
        egress-subnets: 10.205.193.206/32
        ingress-address: 10.205.193.206
        private-address: 10.205.193.206
```
> this helper is checking if hacluster is clustered
and what does it mean for hacluster to be clustered or not clustered?
also, when would one unit be clustered and other units not be clustered? if we scale up hacluster/the principal and hacluster's new unit databag is not immediately populated should we stop using hacluster?
hacluster is clustered when there are 3 or more nodes in pacemaker/corosync. there cannot be a cluster with one unit, and if this is the case, we are not asking hacluster to set up the VIP in any of the charm's nodes. hacluster also does not clean up resources when the relation between the principal and subordinate charm is removed
also, we are not really "using" hacluster, just requesting that hacluster create and manage some resources (VIP) for us
so is checking "clustered" == "yes" a roundabout way for checking for 3+ units?
my concern with how we're currently checking this (checking all unit's databags) is that we could incorrectly not use the virtual ip when scaling up hacluster/the principal
oh oops, sorry, misread: we're just checking that at least one unit set this. Actually, it looks like we're only checking one unit (the loop breaks after the first non-self unit)
in reality it is indeed checking if there are 3+ units, but it is more abstracted than that: if tomorrow the hacluster charm redefines how it is clustered, it can freely do so; all it needs to do is write clustered: yes/true to the databag of one unit.
in 9649f9e, updated the implementation to accept clustered: yes from any related unit
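A sketch of that fix (assuming 9649f9e simply drops the early `break`, per the description above; written as a standalone function for illustration):

```python
import ops

def is_clustered(relation: ops.Relation, own_unit: ops.Unit) -> bool:
    """Return True if any related hacluster unit reports itself as clustered."""
    for key, value in relation.data.items():
        if isinstance(key, ops.Unit) and key != own_unit:
            # No early break: keep scanning until some unit reports clustered.
            if value.get("clustered") in ("yes", "true"):
                return True
    return False
```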
```python
    return ops.ActiveStatus(f"VIP: {vip}")

def set_vip(self, vip: Optional[str]) -> None:
    """Adds the requested virtual IP to the integration."""
```
is the relation interface or databag structure documented anywhere? if so, would suggest dropping a link in a comment
no documentation of the databag structure. from what I gather, the databag structure was ascertained by closely inspecting/tracing the hacluster charm code (reactive framework)
```python
if vip:
    # TODO Add nic support
    ipaddr = ip_address(vip)
```
do we need to add any validation on the config input & show status if malformed?
also, are there security issues with string injection?
the validation story is even more complex: currently we require that the provided IP address is in the same subnet as the machines of the model (otherwise pacemaker is unable to create/manage the VIP, and it borks the integration with hacluster)
I understand that the concern is security related; the library that converts the VIP into an ip address already has basic validation:
```python
>>> from ipaddress import ip_address
>>> ip_address('a.b.c.d')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.10/ipaddress.py", line 54, in ip_address
    raise ValueError(f'{address!r} does not appear to be an IPv4 or IPv6 address')
ValueError: 'a.b.c.d' does not appear to be an IPv4 or IPv6 address
```
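Building on that, a minimal sketch (a hypothetical helper, not the charm's actual validation) of the stricter same-subnet requirement mentioned above:

```python
from ipaddress import ip_address, ip_network

def vip_in_subnet(vip: str, machine_cidr: str) -> bool:
    """Return True if vip parses and falls inside machine_cidr, e.g. "10.205.193.0/24"."""
    try:
        addr = ip_address(vip)  # raises ValueError on malformed input
    except ValueError:
        return False
    return addr in ip_network(machine_cidr, strict=False)
```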
src/relations/hacluster.py
Outdated
```python
    return

if vip:
    # TODO Add nic support
```
question: what does nic support refer to?
not entirely confident, as this was inherited from a library in the pgbouncer charm. I would guess that it refers to adding support for a VIP being attached to a certain NIC on the machine where router is deployed. hoping that @dragomirp can confirm
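For context, a hedged sketch of what that might mean: the ocf:heartbeat:IPaddr2 resource agent accepts a nic parameter, so NIC support would presumably amount to plumbing a configured interface name into the resource params (the nic value below is a hypothetical config option, not something this charm exposes):

```python
vip = "10.205.193.200"  # example VIP
nic = "eth0"  # hypothetical config value; not an existing option of this charm
vip_params = f' params ip="{vip}" nic="{nic}"'
```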
```diff
@@ -46,6 +46,10 @@ requires:
     interface: tracing
     optional: true
     limit: 1
+  ha:
```
should we consider renaming the endpoint to be a bit more descriptive? worried that users might think this relation is necessary for any HA, which is not accurate
We are mimicking pgbouncer naming: https://charmhub.io/pgbouncer/integrations#ha
Not perfect, but at least identical. CC: @7annaba3l please confirm/propose a better name.
If we can change the name in both PostgreSQL and MySQL without breaking anyone's code (basically, if the above change in PostgreSQL was not released to stable), then we can rename it to "external_clustering".
pgbouncer was already released to stable with this relation name. for now, will keep this consistent with pgbouncer
src/machine_charm.py
Outdated
```python
self.framework.observe(
    self.on[relations.hacluster.HACLUSTER_RELATION_NAME].relation_changed, self.reconcile
)
```
nit: for consistency with other relations, would suggest moving this to relations/hacluster.py
(e.g. see RelationEndpoint example in database_provides or database_requires)
addressed in 9649f9e
src/relations/hacluster.py
Outdated
```python
def get_unit_juju_status(self) -> ops.StatusBase:
    """Returns the status of the hacluster if relation exists."""
    if self.relation and not self.charm.is_externally_accessible(event=None):
        return ops.BlockedStatus("ha integration used without data-integrator")

    vip = self.charm.config.get("vip")
    if self.relation and not vip:
        return ops.BlockedStatus("ha integration used without vip configuration")

    if vip and not self.charm.is_externally_accessible(event=None):
        return ops.BlockedStatus("vip configuration without data-integrator")

    if (
        isinstance(self.charm.get_workload(event=None), workload.AuthenticatedWorkload)
        and self.charm.unit.is_leader()
        and vip
    ):
        return ops.ActiveStatus(f"VIP: {vip}")

def set_vip(self, vip: Optional[str]) -> None:
    """Adds the requested virtual IP to the integration."""
    if not self.relation:
        return

    if not self._is_clustered():
        logger.debug("early exit set_vip: ha relation not yet clustered")
        return

    if vip:
        ipaddr = ip_address(vip)
        vip_key = f"res_{self.charm.app.name}_{shake_128(vip.encode()).hexdigest(7)}_vip"
        vip_params = " params"
        if isinstance(ipaddr, IPv4Address):
            vip_resources = "ocf:heartbeat:IPaddr2"
            vip_params += f' ip="{vip}"'
        else:
            vip_resources = "ocf:heartbeat:IPv6addr"
            vip_params += f' ipv6addr="{vip}"'

        # Monitor the VIP
        vip_params += ' meta migration-threshold="INFINITY" failure-timeout="5s"'
        vip_params += ' op monitor timeout="20s" interval="10s" depth="0"'
        json_resources = json.dumps({vip_key: vip_resources})
        json_resource_params = json.dumps({vip_key: vip_params})

    else:
        json_resources = "{}"
        json_resource_params = "{}"

    self.relation.data[self.charm.unit].update(
        {
            "json_resources": json_resources,
            "json_resource_params": json_resource_params,
        }
    )
```
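For illustration, the databag payload produced by the code above for an IPv4 VIP (the app name and VIP are example inputs; the key layout follows the vip_key construction shown):

```python
import json
from hashlib import shake_128

app_name, vip = "mysqlrouter", "10.205.193.200"  # example inputs
vip_key = f"res_{app_name}_{shake_128(vip.encode()).hexdigest(7)}_vip"
print(json.dumps({vip_key: "ocf:heartbeat:IPaddr2"}))
# -> {"res_mysqlrouter_<14 hex chars>_vip": "ocf:heartbeat:IPaddr2"}
```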
consider merging the logic in these two functions together, similar to ConnectionInformation in database_requires, and exposing as little as needed (instead of exposing .relation and the config value).
I think this will help with the current issue of not checking is_clustered when determining the endpoint
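A hypothetical sketch of that suggestion (the class name only loosely mirrors ConnectionInformation; this is not the charm's actual code): derive the VIP state once and expose only the result:

```python
from typing import Optional

class HaClusterInformation:
    """Computes the active VIP once, instead of exposing .relation and raw config."""

    def __init__(self, *, relation_exists: bool, clustered: bool, vip: Optional[str]):
        # The VIP is only meaningful when the relation exists and hacluster
        # has actually formed a cluster.
        self.active_vip: Optional[str] = (
            vip if (relation_exists and clustered and vip) else None
        )
```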
src/machine_charm.py
Outdated
```python
if self._ha_cluster.relation and self.config.get("vip"):
    return self.config["vip"]
```
what happens if is_clustered is false but the relation and config value exist?
great catch! added a check for is_clustered() in 9649f9e.
however, will sync with @dragomirp about a more unified approach to reporting the VIP to the client
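A sketch of the guarded endpoint lookup (assuming 9649f9e adds the clustered check here; is_clustered() as a public accessor is an assumption):

```python
# Only report the VIP once hacluster is actually clustered; otherwise
# pacemaker is not yet managing the address.
if (
    self._ha_cluster.relation
    and self._ha_cluster.is_clustered()  # hypothetical public accessor
    and self.config.get("vip")
):
    return self.config["vip"]
```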
Merging to move the initiative forward. Will sync with Dragomir to see what may/can be done to improve UX.
Issue
We need to add integration support for the hacluster charm. This will allow the mysqlrouter charm to be exposed via data-integrator through a provided virtual IP.
Solution
Add integration support for the hacluster charm.