Set up mirroring for multiple cephblockpools #1829

Open

Nikhil-Ladha wants to merge 1 commit into main from fix-extra-pool-mirroring

Conversation

Nikhil-Ladha

fix rbd mirroring and peer secrets addition, when more than one cephblockpool is configured in ramen tests

@Nikhil-Ladha Nikhil-Ladha force-pushed the fix-extra-pool-mirroring branch from 48871ee to 28732b9 Compare February 13, 2025 14:03
@Nikhil-Ladha
Author

I am not sure how well aligned this change is with the drenv tool, but as a user I found this inconvenient while using the tool for my testing and thought of sending a fix for it :)

@Nikhil-Ladha
Author

/cc @nirs

@Nikhil-Ladha
Author

Test results:

Disable rbd-mirror debug logs in cluster 'dr1'
Disable rbd-mirror debug logs in cluster 'dr2'
Getting mirroring info site name for cluster 'dr1'
'cephblockpools.ceph.rook.io/replicapool' output='jsonpath={.status.mirroringInfo.site_name}' found in 0.14 seconds
Getting rbd mirror boostrap peer secret name for cluster 'dr1'
Getting secret pool-peer-token-replicapool token for cluster 'dr1'
'cephblockpools.ceph.rook.io/replicapool-2' output='jsonpath={.status.mirroringInfo.site_name}' found in 0.17 seconds
Getting rbd mirror boostrap peer secret name for cluster 'dr1'
Getting secret pool-peer-token-replicapool-2 token for cluster 'dr1'
Getting mirroring info site name for cluster 'dr2'
'cephblockpools.ceph.rook.io/replicapool' output='jsonpath={.status.mirroringInfo.site_name}' found in 0.16 seconds
Getting rbd mirror boostrap peer secret name for cluster 'dr2'
Getting secret pool-peer-token-replicapool token for cluster 'dr2'
'cephblockpools.ceph.rook.io/replicapool-2' output='jsonpath={.status.mirroringInfo.site_name}' found in 0.16 seconds
Getting rbd mirror boostrap peer secret name for cluster 'dr2'
Getting secret pool-peer-token-replicapool-2 token for cluster 'dr2'
Setting up mirroring from 'dr2' to 'dr1'
Applying rbd mirror secret in cluster 'dr1'
secret/a1846069-412a-48d6-9427-bb7d03c3668b configured
Configure peers for cluster 'dr1'
cephblockpool.ceph.rook.io/replicapool patched
secret/a1846069-412a-48d6-9427-bb7d03c3668b configured
Configure peers for cluster 'dr1'
cephblockpool.ceph.rook.io/replicapool-2 patched
Creating VolumeReplicationClass
volumereplicationclass.replication.storage.openshift.io/vrc-sample unchanged
volumegroupreplicationclass.replication.storage.openshift.io/vgrc-sample unchanged
Apply rbd mirror to cluster 'dr1'
cephrbdmirror.ceph.rook.io/my-rbd-mirror unchanged
Setting up mirroring from 'dr1' to 'dr2'
Applying rbd mirror secret in cluster 'dr2'
secret/af74e308-8008-4c4f-a18f-53c7d228239c configured
Configure peers for cluster 'dr2'
cephblockpool.ceph.rook.io/replicapool patched
secret/af74e308-8008-4c4f-a18f-53c7d228239c configured
Configure peers for cluster 'dr2'
cephblockpool.ceph.rook.io/replicapool-2 patched
Creating VolumeReplicationClass
volumereplicationclass.replication.storage.openshift.io/vrc-sample unchanged
volumegroupreplicationclass.replication.storage.openshift.io/vgrc-sample unchanged
Apply rbd mirror to cluster 'dr2'
cephrbdmirror.ceph.rook.io/my-rbd-mirror unchanged
Waiting until rbd mirror is ready in cluster 'dr1'
'cephrbdmirror/my-rbd-mirror' output='jsonpath={.status.phase}' found in 0.14 seconds
cephrbdmirror.ceph.rook.io/my-rbd-mirror condition met
Cluster 'dr1' rbd mirror status:
  observedGeneration: 1
  phase: Ready

Waiting until rbd mirror is ready in cluster 'dr2'
'cephrbdmirror/my-rbd-mirror' output='jsonpath={.status.phase}' found in 0.12 seconds
cephrbdmirror.ceph.rook.io/my-rbd-mirror condition met
Cluster 'dr2' rbd mirror status:
  observedGeneration: 1
  phase: Ready

Waiting for mirroring health in cluster 'dr1' (1/3)
Cluster 'dr1' mirroring status': {'daemon_health': 'WARNING', 'health': 'WARNING', 'image_health': 'OK', 'states': {}}
Cluster 'dr1' mirroring status': {'daemon_health': 'WARNING', 'health': 'WARNING', 'image_health': 'OK', 'states': {}}
Cluster 'dr1' mirroring status': {'daemon_health': 'WARNING', 'health': 'WARNING', 'image_health': 'OK', 'states': {}}
Cluster 'dr1' mirroring status': {'daemon_health': 'OK', 'health': 'OK', 'image_health': 'OK', 'states': {}}
Cluster 'dr1' mirroring healthy in 68.54 seconds
Cluster 'dr1' ceph block pool status:
  info:
    failureDomain: host
    rbdMirrorBootstrapPeerSecretName: pool-peer-token-replicapool
    type: Replicated
  mirroringInfo:
    lastChanged: '2025-02-13T13:56:24Z'
    lastChecked: '2025-02-13T13:57:24Z'
    mode: image
    peers:
    - client_name: client.rbd-mirror-peer
      direction: rx-tx
      mirror_uuid: 92828599-7e64-47a1-9a81-81803aba320c
      site_name: a1846069-412a-48d6-9427-bb7d03c3668b
      uuid: 987c1ff7-18ee-4dc2-b8c4-3eddf4d150f3
    site_name: af74e308-8008-4c4f-a18f-53c7d228239c
  mirroringStatus:
    lastChecked: '2025-02-13T13:57:24Z'
    summary:
      daemon_health: OK
      health: OK
      image_health: OK
      states: {}
  observedGeneration: 3
  phase: Ready
  poolID: 10
  snapshotScheduleStatus:
    lastChanged: '2025-02-13T13:56:24Z'
    lastChecked: '2025-02-13T13:57:24Z'
    snapshotSchedules:
    - image: '-'
      items:
      - interval: 2m
        start_time: 14:00:00-05:00
      namespace: '-'
      pool: replicapool

Cluster 'dr1' ceph block pool status:
  info:
    failureDomain: host
    rbdMirrorBootstrapPeerSecretName: pool-peer-token-replicapool-2
    type: Replicated
  mirroringInfo:
    lastChanged: '2025-02-13T13:55:34Z'
    lastChecked: '2025-02-13T13:56:35Z'
    mode: image
    peers:
    - client_name: client.rbd-mirror-peer
      direction: rx-tx
      site_name: a1846069-412a-48d6-9427-bb7d03c3668b
      uuid: 7fd8acae-a6f4-45c1-96f2-e354ab731554
    site_name: af74e308-8008-4c4f-a18f-53c7d228239c
  mirroringStatus:
    lastChecked: '2025-02-13T13:56:35Z'
    summary:
      daemon_health: WARNING
      health: WARNING
      image_health: OK
      states: {}
  observedGeneration: 3
  phase: Ready
  poolID: 11
  snapshotScheduleStatus:
    lastChanged: '2025-02-13T13:55:34Z'
    lastChecked: '2025-02-13T13:56:35Z'
    snapshotSchedules:
    - image: '-'
      items:
      - interval: 2m
        start_time: 14:00:00-05:00
      namespace: '-'
      pool: replicapool-2

Waiting for mirroring health in cluster 'dr2' (1/3)
Cluster 'dr2' mirroring status': {'daemon_health': 'OK', 'health': 'OK', 'image_health': 'OK', 'states': {}}
Cluster 'dr2' mirroring healthy in 0.13 seconds
Cluster 'dr2' ceph block pool status:
  info:
    failureDomain: host
    rbdMirrorBootstrapPeerSecretName: pool-peer-token-replicapool
    type: Replicated
  mirroringInfo:
    lastChanged: '2025-02-13T13:56:04Z'
    lastChecked: '2025-02-13T13:57:05Z'
    mode: image
    peers:
    - client_name: client.rbd-mirror-peer
      direction: rx-tx
      mirror_uuid: b073fffc-d0a8-4a5e-9d9e-1b32e32c92f0
      site_name: af74e308-8008-4c4f-a18f-53c7d228239c
      uuid: 9ac1b4cf-3544-4590-abdb-75d29a154ed5
    site_name: a1846069-412a-48d6-9427-bb7d03c3668b
  mirroringStatus:
    lastChecked: '2025-02-13T13:57:05Z'
    summary:
      daemon_health: OK
      health: OK
      image_health: OK
      states: {}
  observedGeneration: 3
  phase: Ready
  poolID: 12
  snapshotScheduleStatus:
    lastChanged: '2025-02-13T13:56:04Z'
    lastChecked: '2025-02-13T13:57:05Z'
    snapshotSchedules:
    - image: '-'
      items:
      - interval: 2m
        start_time: 14:00:00-05:00
      namespace: '-'
      pool: replicapool

Cluster 'dr2' ceph block pool status:
  info:
    failureDomain: host
    rbdMirrorBootstrapPeerSecretName: pool-peer-token-replicapool-2
    type: Replicated
  mirroringInfo:
    lastChanged: '2025-02-13T13:56:15Z'
    lastChecked: '2025-02-13T13:57:16Z'
    mode: image
    peers:
    - client_name: client.rbd-mirror-peer
      direction: rx-tx
      mirror_uuid: 803bd7fa-3616-46d4-9b46-cf2693c278c2
      site_name: af74e308-8008-4c4f-a18f-53c7d228239c
      uuid: 557d1654-fe3e-468b-8c05-44f73d78339a
    site_name: a1846069-412a-48d6-9427-bb7d03c3668b
  mirroringStatus:
    lastChecked: '2025-02-13T13:57:16Z'
    summary:
      daemon_health: OK
      health: OK
      image_health: OK
      states: {}
  observedGeneration: 3
  phase: Ready
  poolID: 13
  snapshotScheduleStatus:
    lastChanged: '2025-02-13T13:56:15Z'
    lastChecked: '2025-02-13T13:57:16Z'
    snapshotSchedules:
    - image: '-'
      items:
      - interval: 2m
        start_time: 14:00:00-05:00
      namespace: '-'
      pool: replicapool-2

Mirroring was setup successfully

Member

@nirs nirs left a comment

Thanks for the PR!

One problem with this approach is that we have to change a lot of code to loop over pools, and it may not be efficient since we need to wait for one pool and then wait for the other.

In test/addons/volsync/test we use another approach - we run the same test twice using concurrent.futures.ThreadPoolExecutor. This keeps the code simple and runs everything in parallel without modifying the code. I think this is the right approach for new code since it is much easier to work with the code this way, and it is likely more efficient.

Regardless, we need to include the pool names in the logs so we can understand what's going on and debug this.
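
For illustration, a minimal sketch of that pattern, assuming a per-pool setup function and pool list (the names setup_mirroring_for_pool and POOL_NAMES are assumptions, not the addon's actual API):

import concurrent.futures

POOL_NAMES = ["replicapool", "replicapool-2"]

def setup_mirroring_for_pool(pool):
    # Hypothetical per-pool entry point; runs the existing single-pool setup.
    ...

# Run the same setup once per pool, in parallel, without looping inside the
# existing functions.
with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(setup_mirroring_for_pool, pool) for pool in POOL_NAMES]
    for future in concurrent.futures.as_completed(futures):
        future.result()  # re-raise any failure from a worker thread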

@@ -32,33 +32,35 @@ def fetch_secret_info(cluster):
    info = {}

    print(f"Getting mirroring info site name for cluster '{cluster}'")
Member

It would help to move the log into the loop, so it also includes the pool name.

Same for other logs in this function.
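
For example, a minimal sketch of what that could look like (assuming the patched function loops over a POOL_NAMES list, which is an assumption about the new code):

for pool in POOL_NAMES:
    # Include the pool name so each log line is unique per pool.
    print(f"Getting mirroring info site name for pool '{pool}' in cluster '{cluster}'")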

Author

Ack, will update this.

@@ -86,33 +88,37 @@ def disable_rbd_mirror_debug_logs(cluster):
def configure_rbd_mirroring(cluster, peer_info):
    print(f"Applying rbd mirror secret in cluster '{cluster}'")
Member

Same for this log, we want to have one log per pool.


print("Creating VolumeReplicationClass")
template = drenv.template("start-data/vrc-sample.yaml")
yaml = template.substitute(cluster=cluster)
kubectl.apply("--filename=-", input=yaml, context=cluster)

template = drenv.template("start-data/vgrc-sample.yaml")
yaml = template.substitute(cluster=cluster, pool=POOL_NAME)
yaml = template.substitute(cluster=cluster, pool="replicapool")
Member

If we configure mirroring on multiple pools, why not create multiple vrc/vgrc?

If we don't need it now, please add a comment explaining why we create vrc/vgrc only for replicapool.

But in any case, use POOL_NAMES[0] so if we change the name of the pools this code will remain correct.
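
A minimal sketch of that suggestion (assuming the addon defines a POOL_NAMES list; the exact variable name is an assumption):

# Bind the vgrc to the first configured pool instead of a hard-coded name.
yaml = template.substitute(cluster=cluster, pool=POOL_NAMES[0])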

Author

I was not sure if we need multiple vrc/vgrc, hence I omitted their creation.
They might just sit idle and not be required at all, so I believe it might not be worth creating them in the first place; if anybody needs them, they can create them manually.
A vrc/vgrc is easier to create than a blockpool with mirroring enabled ;)

But in any case, use POOL_NAMES[0] so if we change the name of the pools this code will remain correct.

Sure, will update that.

Member

If we call this in parallel, the vrc/vgrc will be created automatically. I don't see a reason not to create them if we create a pool and set up mirroring; this is the easy part. It will enable interesting tests that we don't do now.

Author

I was wondering, if we go ahead with creating multiple vrc/vgrc, how are we planning to name them? If we just take the number suffix from replicapool-2, are we sure that is how the pools will always be named?

"--namespace=rook-ceph",
context=cluster,
)
info = {f"Cluster '{cluster}' ceph block pool status": json.loads(status)}
Member

We need the name of the pool to make the log lines unique.

            print(
                f"Cluster '{cluster}' mirroring healthy in {elapsed:.2f} seconds"
            )
            return
Member

This will wait for one pool's status, and then wait for the other. It would be better to wait for both in parallel, but it should be good enough. When the first pool is ready, the wait for the second should be very short or zero.

Author

Indeed, from my testing, by the time the loop reaches the second pool it hardly takes a couple of seconds to complete, as the mirroring was already set up while iterating over the first pool. So we can probably skip the trouble of parallelizing here.

Member

Parallelizing is easy if we do this at the top level, see test/addons/volsync/test.

@nirs nirs requested review from rakeshgm and parikshithb February 13, 2025 15:06
@nirs
Member

nirs commented Feb 13, 2025

fix rbd mirroring and peer secrets addition, when more than one cephblockpool is configured in ramen tests

Can you open an issue explaining the problem you experienced? Our e2e tests work fine with the current code since we don't use the second replica pool.

@Nikhil-Ladha
Author

fix rbd mirroring and peer secrets addition, when more than one cephblockpool is configured in ramen tests

Can you open an issue explaining the problem you experienced? Our e2e tests work fine with the current code since we don't use the second replica pool.

I wouldn't describe it as an issue exactly, but more of an inconvenience.
I have been using the tool for some time now for VGR testing and hadn't pulled the latest code for a long time. A couple of days back I did so and recreated my env. That's when I saw that we now create multiple pools, so I thought of using the other pool for another test and noticed that, although mirroring is enabled on the pool, the setup is not done. I had to tweak the code to get it working. That's why I thought of sending this PR; if someone else comes across this scenario it could be helpful for them :)

@nirs
Member

nirs commented Feb 13, 2025

I wouldn't describe it as an issue exactly, but more of an inconvenience. I have been using the tool for some time now for VGR testing and hadn't pulled the latest code for a long time. A couple of days back I did so and recreated my env. That's when I saw that we now create multiple pools, so I thought of using the other pool for another test and noticed that, although mirroring is enabled on the pool, the setup is not done. I had to tweak the code to get it working. That's why I thought of sending this PR; if someone else comes across this scenario it could be helpful for them :)

Ok, then no issue is needed.

@nirs nirs changed the title fix mirroring for multiple cephblockpools Set up mirroring for multiple cephblockpools Feb 13, 2025
@nirs nirs added the labels enhancement (New feature or request) and test (Testing related issue) Feb 13, 2025
@nirs
Member

nirs commented Feb 13, 2025

@Nikhil-Ladha can you update the commit message and PR message to reflect the actual change? (We have multiple pools but configure mirroring on only one, why you want to configure mirroring on the second, etc.)

@Nikhil-Ladha Nikhil-Ladha force-pushed the fix-extra-pool-mirroring branch from 28732b9 to 91fd243 Compare February 14, 2025 10:19
@Nikhil-Ladha
Author

Thanks for the PR!

One problem with this approach is that we have to change a lot of code to loop over pools, and it may not be efficient since we need to wait for one pool and then wait for the other.

In test/addons/volsync/test we use another approach - we run the same test twice using concurrent.futures.ThreadPoolExecutor. This keeps the code simple and runs everything in parallel without modifying the code. I think this is the right approach for new code since it is much easier to work with the code this way, and it is likely more efficient.

Regardless, we need to include the pool names in the logs so we can understand what's going on and debug this.

Updated the code to parallelize the setup of mirroring for the 2 pools. Also added pool names in the logs wherever applicable.
Please check the code now and let me know if something could be improved.

update setup of rbd mirroring and peer secrets addition, when more than
one cephblockpool is configured in ramen tests

Signed-off-by: Nikhil-Ladha <[email protected]>
@Nikhil-Ladha Nikhil-Ladha force-pushed the fix-extra-pool-mirroring branch from 91fd243 to 1cb97a4 Compare February 14, 2025 10:25
Member

@parikshithb parikshithb left a comment

Minor comments wrt logging.

        context=cluster,
    )

    print("Waiting for replica pool peer token")
Member

Pool name needs to be logged here

kubectl.apply("--filename=-", input=yaml, context=cluster)

print("Creating VolumeGroupReplicationClass")
Member

Since we are planning to have multiple vrc and vgrc, logging their names would make it clearer and easier to debug.

cluster1_info = fetch_secret_info(cluster1, pool)
cluster2_info = fetch_secret_info(cluster2, pool)

print(f"Setting up mirroring from '{cluster2}' to '{cluster1}'")
Member

log "Setting up mirroring for '{pool}' from...."

print(f"Setting up mirroring from '{cluster2}' to '{cluster1}'")
configure_rbd_mirroring(cluster1, pool, cluster2_info)

print(f"Setting up mirroring from '{cluster1}' to '{cluster2}'")
Member

Same here

"--namespace=rook-ceph",
context=cluster,
)
info = {"ceph pool status": json.loads(out)}
Member

Log the ceph pool status with the pool name.

@nirs
Member

nirs commented Feb 18, 2025

Seems that this change overlaps with a lot of the changes already in #1823. Let's wait until that PR is finished.

Comment on lines +112 to +118
vrc_name = "vrc-sample"
vgrc_name = "vgrc-sample"
if pool != 'replicapool':
    num = pool.split("-")[1]
    vrc_name = f"{vrc_name}-{num}"
    vgrc_name = f"{vgrc_name}-{num}"

Member

Though the logic looks nice, I would not check whether the name is replicapool and then proceed to create the secondary vrc and vgrc. Also, the storageIDs would be the same for both VRC and VGRC in this case; the replicationID should also change once the storageClass is different. I have a PR open with similar changes. Once that PR gets merged, you will probably need to rebase and work on top of it.
