
[DPE-5249] Add integration support for hacluster #177

Merged
13 commits merged into main on Sep 23, 2024

Conversation

shayancanonical (Contributor):
Issue

We need to add integration support for the hacluster charm. This will allow the mysqlrouter charm to be exposed via data-integrator through a provided virtual IP.

Solution

Add integration support for the hacluster charm.

metadata.yaml (outdated review thread, resolved)
taurus-forever (Contributor) left a comment:

Tested in a locally built charm. LGTM.
Please do not forget to create a documentation page. Thank you!

Comment on lines +4 to +9
```yaml
options:
  vip:
    description: |
      Virtual IP to use to front mysql router units. Used only in case of external node connection.
    type: string
```
Contributor:
cc: @delgod and @7annaba3l, we are syncing the option with pgbouncer (which got it from legacy pgbouncer). Are you OK with a common UX here?


@taurus-forever : Yes. Generally speaking we should always favor common UX between PostgreSQL and MySQL.

paulomach (Contributor) left a comment:

Left comments, mostly cosmetics. Solid tests

```python
logger.debug(
    f"Set databag {self._id=} {self._database=}, {username=}, {router_read_write_endpoint=}, {router_read_only_endpoint=}"
)
if username and password:
```
Contributor:

Nit: I understand this is to allow partial updates. Why not just pass all values?

shayancanonical (author):

When we update the endpoints (after the first time, when the user/pass and endpoints are initially set), the only values changing are the endpoints; we do not have access to the username or password at that point. Updating the username/password with None would wipe the existing values in the databag here.
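The guarded-update pattern described above can be sketched as follows. This is a minimal illustration with plain dicts; `update_databag` and its parameters are hypothetical names for this sketch, not the charm's actual API.

```python
# Hypothetical sketch of the partial-update pattern: credentials are
# only written when both are present, so an endpoints-only update does
# not wipe the username/password already stored in the databag.
def update_databag(databag: dict, *, endpoints: str, username=None, password=None) -> dict:
    databag["endpoints"] = endpoints
    if username and password:  # skip when credentials are unavailable
        databag["username"] = username
        databag["password"] = password
    return databag
```

Passing None unconditionally would instead require the caller to re-read the current credentials on every endpoint change.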

src/relations/hacluster.py (outdated review thread, resolved)
src/relations/hacluster.py (outdated review thread, resolved)
```python
    return ops.BlockedStatus("vip configuration without data-integrator")

if self.charm.is_workload_authenticated and self.charm.unit.is_leader() and vip:
    return ops.ActiveStatus(f"VIP: {vip}")
```
Contributor:

How is status messaging being handled overall on router? Should it just be blank when active, given the vip can be queried from config?

Contributor:

+1 to blank when active

shayancanonical (author):

We are setting the VIP in the leader's status message to stay consistent with the pgbouncer implementation.

Contributor:

I thought we are generally supposed to not display a status message if everything is healthy and no action is needed from the user.

shayancanonical (author):

Will follow up with @dragomirp about the status message, but still leaning towards keeping this as is for the moment, as pgbouncer has been released to prod with this behavior and we don't want a different experience.

carlcsaposs-canonical (Contributor) left a comment:

Should we require integration with hacluster if external connectivity is requested?

If not, should the requesting unit specify whether a hacluster virtual IP is desired?

Wondering if we can simplify the code by aligning with the k8s approach, where when we create the database user we already know what endpoint to expose.

src/relations/hacluster.py (outdated review thread, resolved)
Comment on lines 36 to 42
```python
def _is_clustered(self) -> bool:
    for key, value in self.relation.data.items():
        if isinstance(key, ops.Unit) and key != self.charm.unit:
            if value.get("clustered") in ("yes", "true"):
                return True
            break
    return False
```
Contributor:

what is this checking?

shayancanonical (author):

The underlying hacluster application needs at least 3 nodes/units to be active. This helper checks whether hacluster is clustered:

```yaml
- relation-id: 12
  endpoint: ha
  related-endpoint: ha
  application-data: {}
  related-units:
    hacluster/0:
      in-scope: true
      data:
        clustered: "yes"
        egress-subnets: 10.205.193.104/32
        ingress-address: 10.205.193.104
        private-address: 10.205.193.104
    hacluster/1:
      in-scope: true
      data:
        clustered: "yes"
        egress-subnets: 10.205.193.134/32
        ingress-address: 10.205.193.134
        private-address: 10.205.193.134
    hacluster/3:
      in-scope: true
      data:
        clustered: "yes"
        egress-subnets: 10.205.193.179/32
        ingress-address: 10.205.193.179
        private-address: 10.205.193.179
    hacluster/4:
      in-scope: true
      data:
        clustered: "yes"
        egress-subnets: 10.205.193.206/32
        ingress-address: 10.205.193.206
        private-address: 10.205.193.206
```

Contributor:

"this helper is checking if hacluster is clustered"

And what does it mean for hacluster to be clustered or not clustered?

Also, when would one unit be clustered and other units not be clustered? If we scale up hacluster/the principal and hacluster's new unit databag is not immediately populated, should we stop using hacluster?

shayancanonical (author):

hacluster is clustered when there are 3 or more nodes in pacemaker/corosync. There cannot be a cluster with one unit, and if that is the case, we do not ask hacluster to set up the VIP on any of the charm's nodes. hacluster also does not clean up resources when the relation between the principal and subordinate charm is removed.

Also, we are not really "using" hacluster; we are just requesting that hacluster create and manage some resources (the VIP) for us.

Contributor:

So is checking "clustered" == "yes" a roundabout way of checking for 3+ units?

My concern with how we're currently checking this (checking all units' databags) is that we could incorrectly not use the virtual IP when scaling up hacluster/the principal.

carlcsaposs-canonical (Contributor), Sep 23, 2024:

Oh oops, sorry, misread. (I initially read this as checking that at least one unit set this.) Looks like we're only checking one unit.

shayancanonical (author):

In reality it is indeed checking whether there are 3+ units, but it is more abstracted than that: if tomorrow the hacluster charm redefines what it means to be clustered, it can freely do so. All it needs to do is write clustered: yes/true to the databag of one unit.

In 9649f9e, updated the implementation to accept clustered: yes from any related unit.
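The described change can be sketched as follows. This is a simplified illustration using plain dicts for unit databags; the actual implementation iterates ops relation data, and `is_clustered`/`unit_databags` are illustrative names only.

```python
# Hypothetical sketch of the revised check from 9649f9e: hacluster is
# considered clustered if ANY related unit reports clustered: yes/true,
# rather than only the first unit encountered in the iteration.
def is_clustered(unit_databags: dict) -> bool:
    return any(
        bag.get("clustered") in ("yes", "true")
        for bag in unit_databags.values()
    )
```

This avoids the earlier `break`-after-first-unit behavior, so a newly added unit with an empty databag no longer masks an already-clustered peer.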

```python
    return ops.ActiveStatus(f"VIP: {vip}")

def set_vip(self, vip: Optional[str]) -> None:
    """Adds the requested virtual IP to the integration."""
```
Contributor:

is the relation interface or databag structure documented anywhere? if so, would suggest dropping a link in a comment

shayancanonical (author):

No documentation of the databag structure exists. From what I gather, the structure was ascertained by closely inspecting/tracing the hacluster charm code (reactive framework).

src/relations/database_provides.py (outdated review thread, resolved)

```python
if vip:
    # TODO Add nic support
    ipaddr = ip_address(vip)
```
Contributor:

Do we need to add any validation on the config input and show a status if it is malformed? Also, are there security issues with string injection?

shayancanonical (author):

The validation story is even more complex: currently we require that the provided IP address is in the same subnet as the machines of the model (otherwise pacemaker is unable to create/manage the VIP, and it borks the integration with hacluster).

I understand that the concern is security related; the library that converts the VIP into an IP address already performs basic validation:

```python
>>> from ipaddress import ip_address
>>> ip_address('a.b.c.d')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.10/ipaddress.py", line 54, in ip_address
    raise ValueError(f'{address!r} does not appear to be an IPv4 or IPv6 address')
ValueError: 'a.b.c.d' does not appear to be an IPv4 or IPv6 address
```
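The subnet requirement mentioned above can be checked with the same stdlib module. This is a hedged sketch; `validate_vip` and the example subnet are illustrative, not the charm's actual code.

```python
from ipaddress import ip_address, ip_network

# Hypothetical sketch of VIP validation: ip_address() rejects malformed
# strings, and a membership test confirms the VIP falls inside the
# model's subnet (otherwise pacemaker cannot manage it).
def validate_vip(vip: str, model_subnet: str) -> bool:
    try:
        addr = ip_address(vip)
    except ValueError:
        return False  # malformed address, not IPv4 or IPv6
    return addr in ip_network(model_subnet)
```

String-injection risk is limited because any value that survives `ip_address()` is a well-formed address, not arbitrary text.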

```python
    return

if vip:
    # TODO Add nic support
```
Contributor:

question: what does nic support refer to?

shayancanonical (author):

Not entirely confident, as this was inherited from a library on the pgbouncer charm. I would guess it refers to adding support for attaching the VIP to a specific NIC on the machine where router is deployed. Hoping that @dragomirp can confirm.

src/abstract_charm.py (outdated review thread, resolved)
```diff
@@ -46,6 +46,10 @@ requires:
     interface: tracing
     optional: true
     limit: 1
   ha:
```
Contributor:

Should we consider renaming the endpoint to be a bit more descriptive? Worried that users might think this relation is necessary for any HA, which is not accurate.

cc @taurus-forever

Contributor:

We are mimicking pgbouncer naming: https://charmhub.io/pgbouncer/integrations#ha
Not perfect, but at least identical. CC: @7annaba3l, please confirm/propose a better name.


If we can change the name in both PostgreSQL and MySQL without breaking anyone's code (basically, if the above change in PostgreSQL was not released to stable), then we can rename it to "external_clustering".

shayancanonical (author):

pgbouncer was already released to stable with this relation name. For now, we will keep this consistent with pgbouncer.

Comment on lines 62 to 64
```python
self.framework.observe(
    self.on[relations.hacluster.HACLUSTER_RELATION_NAME].relation_changed, self.reconcile
)
```
Contributor:

Nit: for consistency with the other relations, would suggest moving this to relations/hacluster.py (e.g. see the RelationEndpoint example in database_provides or database_requires).

shayancanonical (author):

Addressed in 9649f9e.

src/relations/database_provides.py (review thread, resolved)
src/relations/database_provides.py (outdated review thread, resolved)
src/relations/database_provides.py (outdated review thread, resolved)
Comment on lines 42 to 96
```python
def get_unit_juju_status(self) -> ops.StatusBase:
    """Returns the status of the hacluster if relation exists."""
    if self.relation and not self.charm.is_externally_accessible(event=None):
        return ops.BlockedStatus("ha integration used without data-integrator")

    vip = self.charm.config.get("vip")
    if self.relation and not vip:
        return ops.BlockedStatus("ha integration used without vip configuration")

    if vip and not self.charm.is_externally_accessible(event=None):
        return ops.BlockedStatus("vip configuration without data-integrator")

    if (
        isinstance(self.charm.get_workload(event=None), workload.AuthenticatedWorkload)
        and self.charm.unit.is_leader()
        and vip
    ):
        return ops.ActiveStatus(f"VIP: {vip}")

def set_vip(self, vip: Optional[str]) -> None:
    """Adds the requested virtual IP to the integration."""
    if not self.relation:
        return

    if not self._is_clustered():
        logger.debug("early exit set_vip: ha relation not yet clustered")
        return

    if vip:
        ipaddr = ip_address(vip)
        vip_key = f"res_{self.charm.app.name}_{shake_128(vip.encode()).hexdigest(7)}_vip"
        vip_params = " params"
        if isinstance(ipaddr, IPv4Address):
            vip_resources = "ocf:heartbeat:IPaddr2"
            vip_params += f' ip="{vip}"'
        else:
            vip_resources = "ocf:heartbeat:IPv6addr"
            vip_params += f' ipv6addr="{vip}"'

        # Monitor the VIP
        vip_params += ' meta migration-threshold="INFINITY" failure-timeout="5s"'
        vip_params += ' op monitor timeout="20s" interval="10s" depth="0"'
        json_resources = json.dumps({vip_key: vip_resources})
        json_resource_params = json.dumps({vip_key: vip_params})

    else:
        json_resources = "{}"
        json_resource_params = "{}"

    self.relation.data[self.charm.unit].update(
        {
            "json_resources": json_resources,
            "json_resource_params": json_resource_params,
        }
    )
```
Contributor:

Consider merging the logic in these two functions together, similar to ConnectionInformation in database_requires, and exposing as little as needed (instead of exposing .relation and the config value).

I think this will help with the current issue of not checking is_clustered when determining the endpoint.

Comment on lines 92 to 93
```python
if self._ha_cluster.relation and self.config.get("vip"):
    return self.config["vip"]
```
Contributor:

What happens if is_clustered is false but the relation and config value exist?

shayancanonical (author):

Great catch! Added a check for is_clustered() in 9649f9e.

However, I will sync with @dragomirp about a more unified approach to reporting the VIP to the client.
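A unified decision gate could look roughly like this. This is a hypothetical sketch of the check discussed in this thread, not the merged implementation; `external_endpoint` and its parameters are illustrative names.

```python
# Hypothetical sketch: report the VIP to clients only when the
# hacluster relation exists, the vip config is set, AND hacluster
# reports clustered; otherwise fall back to the node endpoint.
def external_endpoint(relation_exists: bool, vip, clustered: bool, fallback: str) -> str:
    if relation_exists and vip and clustered:
        return vip
    return fallback
```

Centralizing the three conditions in one place avoids the gap where the endpoint was derived from the relation and config alone.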


shayancanonical (author):

Merging to move the initiative forward. Will sync with Dragomir to see what may/can be done to improve the UX.

shayancanonical merged commit f9ea24c into main on Sep 23, 2024 (58 checks passed).
shayancanonical deleted the feature/hacluster branch on September 23, 2024 at 20:01.