Set up mirroring for multiple cephblockpools #1829

Open

Nikhil-Ladha wants to merge 1 commit into main from fix-extra-pool-mirroring

Conversation

Nikhil-Ladha

fix rbd mirroring and peer secrets addition, when more than one cephblockpool is configured in ramen tests

@Nikhil-Ladha Nikhil-Ladha force-pushed the fix-extra-pool-mirroring branch from 48871ee to 28732b9 Compare February 13, 2025 14:03
@Nikhil-Ladha
Author

I am not sure how well aligned this change is with the drenv tool, but as a user I found this inconvenient while using the tool for my testing and thought of sending a fix for it :)

@Nikhil-Ladha
Author

/cc @nirs

@Nikhil-Ladha
Author

Test results:

Disable rbd-mirror debug logs in cluster 'dr1'
Disable rbd-mirror debug logs in cluster 'dr2'
Getting mirroring info site name for cluster 'dr1'
'cephblockpools.ceph.rook.io/replicapool' output='jsonpath={.status.mirroringInfo.site_name}' found in 0.14 seconds
Getting rbd mirror boostrap peer secret name for cluster 'dr1'
Getting secret pool-peer-token-replicapool token for cluster 'dr1'
'cephblockpools.ceph.rook.io/replicapool-2' output='jsonpath={.status.mirroringInfo.site_name}' found in 0.17 seconds
Getting rbd mirror boostrap peer secret name for cluster 'dr1'
Getting secret pool-peer-token-replicapool-2 token for cluster 'dr1'
Getting mirroring info site name for cluster 'dr2'
'cephblockpools.ceph.rook.io/replicapool' output='jsonpath={.status.mirroringInfo.site_name}' found in 0.16 seconds
Getting rbd mirror boostrap peer secret name for cluster 'dr2'
Getting secret pool-peer-token-replicapool token for cluster 'dr2'
'cephblockpools.ceph.rook.io/replicapool-2' output='jsonpath={.status.mirroringInfo.site_name}' found in 0.16 seconds
Getting rbd mirror boostrap peer secret name for cluster 'dr2'
Getting secret pool-peer-token-replicapool-2 token for cluster 'dr2'
Setting up mirroring from 'dr2' to 'dr1'
Applying rbd mirror secret in cluster 'dr1'
secret/a1846069-412a-48d6-9427-bb7d03c3668b configured
Configure peers for cluster 'dr1'
cephblockpool.ceph.rook.io/replicapool patched
secret/a1846069-412a-48d6-9427-bb7d03c3668b configured
Configure peers for cluster 'dr1'
cephblockpool.ceph.rook.io/replicapool-2 patched
Creating VolumeReplicationClass
volumereplicationclass.replication.storage.openshift.io/vrc-sample unchanged
volumegroupreplicationclass.replication.storage.openshift.io/vgrc-sample unchanged
Apply rbd mirror to cluster 'dr1'
cephrbdmirror.ceph.rook.io/my-rbd-mirror unchanged
Setting up mirroring from 'dr1' to 'dr2'
Applying rbd mirror secret in cluster 'dr2'
secret/af74e308-8008-4c4f-a18f-53c7d228239c configured
Configure peers for cluster 'dr2'
cephblockpool.ceph.rook.io/replicapool patched
secret/af74e308-8008-4c4f-a18f-53c7d228239c configured
Configure peers for cluster 'dr2'
cephblockpool.ceph.rook.io/replicapool-2 patched
Creating VolumeReplicationClass
volumereplicationclass.replication.storage.openshift.io/vrc-sample unchanged
volumegroupreplicationclass.replication.storage.openshift.io/vgrc-sample unchanged
Apply rbd mirror to cluster 'dr2'
cephrbdmirror.ceph.rook.io/my-rbd-mirror unchanged
Waiting until rbd mirror is ready in cluster 'dr1'
'cephrbdmirror/my-rbd-mirror' output='jsonpath={.status.phase}' found in 0.14 seconds
cephrbdmirror.ceph.rook.io/my-rbd-mirror condition met
Cluster 'dr1' rbd mirror status:
  observedGeneration: 1
  phase: Ready

Waiting until rbd mirror is ready in cluster 'dr2'
'cephrbdmirror/my-rbd-mirror' output='jsonpath={.status.phase}' found in 0.12 seconds
cephrbdmirror.ceph.rook.io/my-rbd-mirror condition met
Cluster 'dr2' rbd mirror status:
  observedGeneration: 1
  phase: Ready

Waiting for mirroring health in cluster 'dr1' (1/3)
Cluster 'dr1' mirroring status': {'daemon_health': 'WARNING', 'health': 'WARNING', 'image_health': 'OK', 'states': {}}
Cluster 'dr1' mirroring status': {'daemon_health': 'WARNING', 'health': 'WARNING', 'image_health': 'OK', 'states': {}}
Cluster 'dr1' mirroring status': {'daemon_health': 'WARNING', 'health': 'WARNING', 'image_health': 'OK', 'states': {}}
Cluster 'dr1' mirroring status': {'daemon_health': 'OK', 'health': 'OK', 'image_health': 'OK', 'states': {}}
Cluster 'dr1' mirroring healthy in 68.54 seconds
Cluster 'dr1' ceph block pool status:
  info:
    failureDomain: host
    rbdMirrorBootstrapPeerSecretName: pool-peer-token-replicapool
    type: Replicated
  mirroringInfo:
    lastChanged: '2025-02-13T13:56:24Z'
    lastChecked: '2025-02-13T13:57:24Z'
    mode: image
    peers:
    - client_name: client.rbd-mirror-peer
      direction: rx-tx
      mirror_uuid: 92828599-7e64-47a1-9a81-81803aba320c
      site_name: a1846069-412a-48d6-9427-bb7d03c3668b
      uuid: 987c1ff7-18ee-4dc2-b8c4-3eddf4d150f3
    site_name: af74e308-8008-4c4f-a18f-53c7d228239c
  mirroringStatus:
    lastChecked: '2025-02-13T13:57:24Z'
    summary:
      daemon_health: OK
      health: OK
      image_health: OK
      states: {}
  observedGeneration: 3
  phase: Ready
  poolID: 10
  snapshotScheduleStatus:
    lastChanged: '2025-02-13T13:56:24Z'
    lastChecked: '2025-02-13T13:57:24Z'
    snapshotSchedules:
    - image: '-'
      items:
      - interval: 2m
        start_time: 14:00:00-05:00
      namespace: '-'
      pool: replicapool

Cluster 'dr1' ceph block pool status:
  info:
    failureDomain: host
    rbdMirrorBootstrapPeerSecretName: pool-peer-token-replicapool-2
    type: Replicated
  mirroringInfo:
    lastChanged: '2025-02-13T13:55:34Z'
    lastChecked: '2025-02-13T13:56:35Z'
    mode: image
    peers:
    - client_name: client.rbd-mirror-peer
      direction: rx-tx
      site_name: a1846069-412a-48d6-9427-bb7d03c3668b
      uuid: 7fd8acae-a6f4-45c1-96f2-e354ab731554
    site_name: af74e308-8008-4c4f-a18f-53c7d228239c
  mirroringStatus:
    lastChecked: '2025-02-13T13:56:35Z'
    summary:
      daemon_health: WARNING
      health: WARNING
      image_health: OK
      states: {}
  observedGeneration: 3
  phase: Ready
  poolID: 11
  snapshotScheduleStatus:
    lastChanged: '2025-02-13T13:55:34Z'
    lastChecked: '2025-02-13T13:56:35Z'
    snapshotSchedules:
    - image: '-'
      items:
      - interval: 2m
        start_time: 14:00:00-05:00
      namespace: '-'
      pool: replicapool-2

Waiting for mirroring health in cluster 'dr2' (1/3)
Cluster 'dr2' mirroring status': {'daemon_health': 'OK', 'health': 'OK', 'image_health': 'OK', 'states': {}}
Cluster 'dr2' mirroring healthy in 0.13 seconds
Cluster 'dr2' ceph block pool status:
  info:
    failureDomain: host
    rbdMirrorBootstrapPeerSecretName: pool-peer-token-replicapool
    type: Replicated
  mirroringInfo:
    lastChanged: '2025-02-13T13:56:04Z'
    lastChecked: '2025-02-13T13:57:05Z'
    mode: image
    peers:
    - client_name: client.rbd-mirror-peer
      direction: rx-tx
      mirror_uuid: b073fffc-d0a8-4a5e-9d9e-1b32e32c92f0
      site_name: af74e308-8008-4c4f-a18f-53c7d228239c
      uuid: 9ac1b4cf-3544-4590-abdb-75d29a154ed5
    site_name: a1846069-412a-48d6-9427-bb7d03c3668b
  mirroringStatus:
    lastChecked: '2025-02-13T13:57:05Z'
    summary:
      daemon_health: OK
      health: OK
      image_health: OK
      states: {}
  observedGeneration: 3
  phase: Ready
  poolID: 12
  snapshotScheduleStatus:
    lastChanged: '2025-02-13T13:56:04Z'
    lastChecked: '2025-02-13T13:57:05Z'
    snapshotSchedules:
    - image: '-'
      items:
      - interval: 2m
        start_time: 14:00:00-05:00
      namespace: '-'
      pool: replicapool

Cluster 'dr2' ceph block pool status:
  info:
    failureDomain: host
    rbdMirrorBootstrapPeerSecretName: pool-peer-token-replicapool-2
    type: Replicated
  mirroringInfo:
    lastChanged: '2025-02-13T13:56:15Z'
    lastChecked: '2025-02-13T13:57:16Z'
    mode: image
    peers:
    - client_name: client.rbd-mirror-peer
      direction: rx-tx
      mirror_uuid: 803bd7fa-3616-46d4-9b46-cf2693c278c2
      site_name: af74e308-8008-4c4f-a18f-53c7d228239c
      uuid: 557d1654-fe3e-468b-8c05-44f73d78339a
    site_name: a1846069-412a-48d6-9427-bb7d03c3668b
  mirroringStatus:
    lastChecked: '2025-02-13T13:57:16Z'
    summary:
      daemon_health: OK
      health: OK
      image_health: OK
      states: {}
  observedGeneration: 3
  phase: Ready
  poolID: 13
  snapshotScheduleStatus:
    lastChanged: '2025-02-13T13:56:15Z'
    lastChecked: '2025-02-13T13:57:16Z'
    snapshotSchedules:
    - image: '-'
      items:
      - interval: 2m
        start_time: 14:00:00-05:00
      namespace: '-'
      pool: replicapool-2

Mirroring was setup successfully

Member

@nirs nirs left a comment

Thanks for the PR!

One problem with this approach is that we have to change a lot of code to loop over pools, and it may not be efficient since we need to wait for one pool and then wait for the other.

In test/addons/volsync/test we use another approach - we run the same test twice using concurrent.futures.ThreadPoolExecutor. This keeps the code simple and runs everything in parallel without modifying the code. I think this is the right approach for new code since it is much easier to work with the code this way, and it is likely more efficient.

Regardless, we need to include the pool names in the logs so we can understand what's going on and debug this.
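
For illustration, a minimal sketch of that pattern, assuming a per-pool setup function and pool list (the names setup_mirroring_for_pool and POOL_NAMES are assumptions, not the addon's actual API):

import concurrent.futures

POOL_NAMES = ["replicapool", "replicapool-2"]

def setup_mirroring_for_pool(pool):
    # Hypothetical per-pool entry point; runs the existing single-pool setup.
    ...

# Run the same setup once per pool, in parallel, without looping inside the
# existing functions.
with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(setup_mirroring_for_pool, pool) for pool in POOL_NAMES]
    for future in concurrent.futures.as_completed(futures):
        future.result()  # re-raise any failure from a worker thread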

@@ -32,33 +32,35 @@ def fetch_secret_info(cluster):
    info = {}

    print(f"Getting mirroring info site name for cluster '{cluster}'")
Member

It would help to move the log into the loop, so it also includes the pool name.

Same for other logs in this function.
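
For example, a minimal sketch of what that could look like (assuming the patched function loops over a POOL_NAMES list, which is an assumption about the new code):

for pool in POOL_NAMES:
    # Include the pool name so each log line is unique per pool.
    print(f"Getting mirroring info site name for pool '{pool}' in cluster '{cluster}'")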

Author

Ack, will update this.

@@ -86,33 +88,37 @@ def disable_rbd_mirror_debug_logs(cluster):
def configure_rbd_mirroring(cluster, peer_info):
    print(f"Applying rbd mirror secret in cluster '{cluster}'")
Member

Same for this log, we want to have one log per pool.


print("Creating VolumeReplicationClass")
template = drenv.template("start-data/vrc-sample.yaml")
yaml = template.substitute(cluster=cluster)
kubectl.apply("--filename=-", input=yaml, context=cluster)

template = drenv.template("start-data/vgrc-sample.yaml")
yaml = template.substitute(cluster=cluster, pool=POOL_NAME)
yaml = template.substitute(cluster=cluster, pool="replicapool")
Member

If we configure mirroring on multiple pools, why not create multiple vrc/vgrc?

If we don't need it now, please add a comment explaining why we create vrc/vgrc only for replicapool.

But in any case, use POOL_NAMES[0] so if we change the name of the pools this code will remain correct.
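
A minimal sketch of that suggestion (assuming the addon defines a POOL_NAMES list; the exact variable name is an assumption):

# Bind the vgrc to the first configured pool instead of a hard-coded name.
yaml = template.substitute(cluster=cluster, pool=POOL_NAMES[0])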

Author

I was not sure if we need multiple vrc/vgrc, hence I omitted their creation.
They might just sit idle and not be required at all, so I believe it might not be worth creating them in the first place; if anybody needs them, they can create them manually.
A vrc/vgrc is easier to create than a blockpool with mirroring enabled ;)

But in any case, use POOL_NAMES[0] so if we change the name of the pools this code will remain correct.

Sure, will update that.

Member

If we call this in parallel, the vrc/vgrc will be created automatically. I don't see a reason not to create them if we create a pool and set up mirroring; this is the easy part. It will enable interesting tests that we don't do now.

Author

I was wondering, if we go ahead with creating multiple vrc/vgrc, how are we planning to name them? If we just take the number suffix from replicapool-2, are we sure that is how the pools will always be named?

"--namespace=rook-ceph",
context=cluster,
)
info = {f"Cluster '{cluster}' ceph block pool status": json.loads(status)}
Member

We need the name of the pool to make the log lines unique.

            print(
                f"Cluster '{cluster}' mirroring healthy in {elapsed:.2f} seconds"
            )
            return
Member

This will wait for one pool's status, and then wait for the other. It would be better to wait for both in parallel, but it should be good enough. When the first pool is ready, the wait for the second should be very short or zero.

Author

Indeed, from my testing, by the time the loop reaches the second pool it hardly takes a couple of seconds to complete, as the mirroring was already set up while iterating over the first pool. So we can probably skip the trouble of parallelizing here.

Member

Parallelizing is easy if we do this at the top level, see test/addons/volsync/test.

@nirs nirs requested review from rakeshgm and parikshithb February 13, 2025 15:06
@nirs
Member

nirs commented Feb 13, 2025

fix rbd mirroring and peer secrets addition, when more than one cephblockpool is configured in ramen tests

Can you open an issue explaining the problem you experienced? Our e2e tests work fine with the current code since we don't use the second replica pool.

@Nikhil-Ladha
Author

fix rbd mirroring and peer secrets addition, when more than one cephblockpool is configured in ramen tests

Can you open an issue explaining the problem you experienced? Our e2e tests work fine with the current code since we don't use the second replica pool.

I wouldn't describe it as an issue exactly, but more of an inconvenience.
I have been using the tool for some time now for VGR testing and hadn't pulled the latest code for a long time. A couple of days back I did so and recreated my env. That's when I saw that we now create multiple pools, so I thought of using the other pool for another test and noticed that, although mirroring is enabled on the pool, the setup is not done. I had to tweak the code to get it working. That's why I thought of sending this PR; if someone else comes across this scenario it could be helpful for them :)

@nirs
Member

nirs commented Feb 13, 2025

I wouldn't describe it as an issue exactly, but more of an inconvenience. I have been using the tool for some time now for VGR testing and hadn't pulled the latest code for a long time. A couple of days back I did so and recreated my env. That's when I saw that we now create multiple pools, so I thought of using the other pool for another test and noticed that, although mirroring is enabled on the pool, the setup is not done. I had to tweak the code to get it working. That's why I thought of sending this PR; if someone else comes across this scenario it could be helpful for them :)

Ok, then no issue is needed.

@nirs nirs changed the title fix mirroring for multiple cephblockpools Set up mirroring for multiple cephblockpools Feb 13, 2025
@nirs nirs added the labels enhancement (New feature or request) and test (Testing related issue) Feb 13, 2025
@nirs
Member

nirs commented Feb 13, 2025

@Nikhil-Ladha can you update the commit message and PR message to reflect the actual change? (We have multiple pools but configure mirroring on only one, why you want to configure mirroring on the second, etc.)

@Nikhil-Ladha Nikhil-Ladha force-pushed the fix-extra-pool-mirroring branch from 28732b9 to 91fd243 Compare February 14, 2025 10:19
@Nikhil-Ladha
Author

Thanks for the PR!

One problem with this approach is that we have to change a lot of code to loop over pools, and it may not be efficient since we need to wait for one pool and then wait for the other.

In test/addons/volsync/test we use another approach - we run the same test twice using concurrent.futures.ThreadPoolExecutor. This keeps the code simple and runs everything in parallel without modifying the code. I think this is the right approach for new code since it is much easier to work with the code this way, and it is likely more efficient.

Regardless, we need to include the pool names in the logs so we can understand what's going on and debug this.

Updated the code to parallelize the setup of mirroring for the 2 pools. Also added pool names in the logs wherever applicable.
Please check the code now and let me know if something could be improved.

update setup of rbd mirroring and peer secrets addition, when more than
one cephblockpool is configured in ramen tests

Signed-off-by: Nikhil-Ladha <[email protected]>
@Nikhil-Ladha Nikhil-Ladha force-pushed the fix-extra-pool-mirroring branch from 91fd243 to 1cb97a4 Compare February 14, 2025 10:25
Member

@parikshithb parikshithb left a comment

Minor comments wrt logging.

        context=cluster,
    )

    print("Waiting for replica pool peer token")
Member

Pool name needs to be logged here

kubectl.apply("--filename=-", input=yaml, context=cluster)

print("Creating VolumeGroupReplicationClass")
Member

Since we are planning to have multiple vrc and vgrc, logging their names would make it clearer and easier to debug.

cluster1_info = fetch_secret_info(cluster1, pool)
cluster2_info = fetch_secret_info(cluster2, pool)

print(f"Setting up mirroring from '{cluster2}' to '{cluster1}'")
Member

log "Setting up mirroring for '{pool}' from...."

print(f"Setting up mirroring from '{cluster2}' to '{cluster1}'")
configure_rbd_mirroring(cluster1, pool, cluster2_info)

print(f"Setting up mirroring from '{cluster1}' to '{cluster2}'")
Member

Same here

"--namespace=rook-ceph",
context=cluster,
)
info = {"ceph pool status": json.loads(out)}
Member

Log the ceph pool status with the pool name.

@nirs
Member

nirs commented Feb 18, 2025

Seems that this change overlaps with a lot of the changes already in #1823. Let's wait until that PR is finished.

Comment on lines +112 to +118
vrc_name = "vrc-sample"
vgrc_name = "vgrc-sample"
if pool != 'replicapool':
    num = pool.split("-")[1]
    vrc_name = f"{vrc_name}-{num}"
    vgrc_name = f"{vgrc_name}-{num}"

Member

Though the logic looks nice, I would not check whether the name is replicapool and then proceed to create the secondary vrc and vgrc. Also, the storageIDs would be the same for both VRC and VGRC in this case; the replicationID should also change once the storageClass is different. I have a PR open with similar changes. Once that PR gets merged, you will probably need to rebase and work on top of it.
