Some clarifications about data persistence [HZG-257] #1510

th0masb · 2025-01-30T14:28:25Z

Some clarifications that data persistence is there to augment in-memory partition backups and that planned cluster shutdowns must be done cluster-wide. A rolling restart is not the same as a cluster shutdown, it is a sequence of individual member restarts and so is more appropriate to categorize as a speed improvement instead of a resiliency one.

devOpsHazelcast · 2025-01-30T14:28:34Z

All committers have signed the CLA.

netlify · 2025-01-30T14:28:45Z

✅ Deploy Preview for hardcore-allen-f5257d ready!

Name	Link
🔨 Latest commit	`a289dbe`
🔍 Latest deploy log	https://app.netlify.com/sites/hardcore-allen-f5257d/deploys/67a38cbff0c98f0008839293
😎 Deploy Preview	https://deploy-preview-1510--hardcore-allen-f5257d.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

JamesHazelcast

Thanks for working to improve our documentation on this topic @th0masb - a few comments from me on this round

docs/modules/storage/pages/persistence.adoc

JamesHazelcast · 2025-01-30T17:21:02Z

docs/modules/storage/pages/persistence.adoc

+- **Speeding up single member restarts**: 
+
+** **Planned**: During a rolling restart each cluster member is restarted one by one for scenarios such as installing an operating system patch or new hardware. xref:maintain-cluster:rolling-upgrades.adoc[Rolling upgrades] are an example of a rolling restart.
+** **Unplanned**: A member may crash or terminate unexpectedly at any time, using persistence allows faster recovery.


This also depends on configuration - if rebalancing is delayed, or the cluster is in PASSIVE state, it loads all data from disk without receiving any from other members

I was under the impression that even if the member crashed when the cluster was in active state and rebalancing was not delayed the persisted data could still be used to speed up the member rejoining the cluster. Aren't merkle trees compared to reduce network traffic as much as possible?

Yes that's correct, I'm saying that in addition to this "faster recovery" case, there are conditions where it recovers entirely using its own data, not just using it as a speedup mechanism.

JamesHazelcast · 2025-01-30T17:21:40Z

docs/modules/storage/pages/persistence.adoc

+
+- **Speeding up single member restarts**: 
+
+** **Planned**: During a rolling restart each cluster member is restarted one by one for scenarios such as installing an operating system patch or new hardware. xref:maintain-cluster:rolling-upgrades.adoc[Rolling upgrades] are an example of a rolling restart.


This is part of a cluster-wide shutdown, we do not want to create confusion between these topics imo

I found it confusing to read the existing way, how is a rolling restart a cluster wide shutdown if only one member at any time is stopped? Is it mandatory to put the cluster into passive mode for a rolling restart? It seems like it would work fine if the cluster was still in active mode.

I suppose there's arguments to be made both ways - I can see why you see it as a series of single member restarts, but I can also see that it does eventually result in the entire cluster restarting, so it could be seen as a cluster-wide shutdown (restart). I'm not too fussed, we can let docs team review this aspect and see if it makes sense to them 👍

I think a cluster-wide restart implies the cluster becomes unavailable, and a rolling restart is designed to avoid that. I agree there's some overlap but I think the proposed text is good.

Rob-Hazelcast · 2025-01-31T17:38:16Z

@th0masb Looks like you're still debating the technical details here, so ignoring for now. Feel free to @ me when this is ready for editorial review, and please add the backport to all versions label if appropriate.

JamesHazelcast

A few more comments from me, but mainly nits so approving in advance - thanks for raising this @th0masb and improving our docs 👍

docs/modules/storage/pages/persistence.adoc

th0masb · 2025-02-03T16:54:15Z

@Rob-Hazelcast I think this is ready now, as for backporting I am unsure of the policy for the docs but since this is a clarification on existing functionality and the page content looks to be the same since 5.3 I would recommend backporting to 5.3.

Rob-Hazelcast

Suggested some small edits for clarify but generally looks good, thanks!

docs/modules/storage/pages/persistence.adoc

Rob-Hazelcast · 2025-02-04T12:46:26Z

docs/modules/storage/pages/persistence.adoc

+
+- **Speeding up single member restarts**: 
+
+** **Planned**: During a rolling restart each cluster member is restarted one by one for scenarios such as installing an operating system patch or new hardware. xref:maintain-cluster:rolling-upgrades.adoc[Rolling upgrades] are an example of a rolling restart.


I think a cluster-wide restart implies the cluster becomes unavailable, and a rolling restart is designed to avoid that. I agree there's some overlap but I think the proposed text is good.

docs/modules/storage/pages/persistence.adoc

Rob-Hazelcast · 2025-02-04T13:03:35Z

docs/modules/storage/pages/persistence.adoc

 ** **Unplanned**: A cluster is restarted after all its members crash at the same time due an event such as a power outage. Note that some data loss is expected unless `fsync` is set to `true`. For more information, see xref:storage:configuring-persistence.adoc#data-structures[Data structure options].
- **A single member restart**: A single member is restarted for whatever reason.
+
+- **Accelerating single member restarts**: 


Suggested change

- **Accelerating single member restarts**:

- **Single member restarts**:

The two bullets should be consistent. Removing the verb here; could alternatively add a verb ("recovering from cluster-wide shutdowns") above.

docs/modules/storage/pages/persistence.adoc

Backport of #1510 Some clarifications that data persistence is there to augment in-memory partition backups and that planned cluster shutdowns must be done cluster-wide. A rolling restart is not the same as a cluster shutdown, it is a sequence of individual member restarts and so is more appropriate to categorize as a speed improvement instead of a resiliency one. Co-authored-by: Thomas Ball <[email protected]>

th0masb requested a review from a team as a code owner January 30, 2025 14:28

th0masb requested a review from JamesHazelcast January 30, 2025 16:43

JamesHazelcast requested changes Jan 30, 2025

View reviewed changes

JamesHazelcast approved these changes Feb 3, 2025

View reviewed changes

th0masb force-pushed the hzg-257 branch from 3ecd200 to 5e410c5 Compare February 4, 2025 09:26

Rob-Hazelcast added the backport to all versions Backport commits to maintenance branches (from main) label Feb 4, 2025

Rob-Hazelcast approved these changes Feb 4, 2025

View reviewed changes

th0masb added 4 commits February 5, 2025 16:07

Some clarifications about data persistence [HZG-257]

ac32e2e

Some wording tweaks, clarified PASSIVE mode for cluster shutdown

c351e7b

Further wording refinements

81531dc

Further refinements

a289dbe

th0masb force-pushed the hzg-257 branch from 5e410c5 to a289dbe Compare February 5, 2025 16:07

th0masb merged commit fdcbc16 into hazelcast:main Feb 5, 2025
6 checks passed

th0masb deleted the hzg-257 branch February 5, 2025 16:17

github-actions bot mentioned this pull request Feb 5, 2025

Some clarifications about data persistence [v/5.5] #1524

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Some clarifications about data persistence [HZG-257] #1510

Some clarifications about data persistence [HZG-257] #1510

th0masb commented Jan 30, 2025

devOpsHazelcast commented Jan 30, 2025 •

edited

Loading

netlify bot commented Jan 30, 2025 •

edited

Loading

JamesHazelcast left a comment

JamesHazelcast Jan 30, 2025

th0masb Jan 31, 2025

JamesHazelcast Feb 3, 2025

JamesHazelcast Jan 30, 2025

th0masb Jan 31, 2025 •

edited

Loading

JamesHazelcast Feb 3, 2025

Rob-Hazelcast Feb 4, 2025 •

edited

Loading

Rob-Hazelcast commented Jan 31, 2025

JamesHazelcast left a comment

th0masb commented Feb 3, 2025

Rob-Hazelcast left a comment

Rob-Hazelcast Feb 4, 2025 •

edited

Loading

Rob-Hazelcast Feb 4, 2025


		- Speeding up single member restarts:

		Planned**: During a rolling restart each cluster member is restarted one by one for scenarios such as installing an operating system patch or new hardware. xref:maintain-cluster:rolling-upgrades.adoc[Rolling upgrades] are an example of a rolling restart.

	- Accelerating single member restarts:
	- Single member restarts:

Some clarifications about data persistence [HZG-257] #1510

Some clarifications about data persistence [HZG-257] #1510

Conversation

th0masb commented Jan 30, 2025

devOpsHazelcast commented Jan 30, 2025 • edited Loading

netlify bot commented Jan 30, 2025 • edited Loading

✅ Deploy Preview for hardcore-allen-f5257d ready!

JamesHazelcast left a comment

Choose a reason for hiding this comment

JamesHazelcast Jan 30, 2025

Choose a reason for hiding this comment

th0masb Jan 31, 2025

Choose a reason for hiding this comment

JamesHazelcast Feb 3, 2025

Choose a reason for hiding this comment

JamesHazelcast Jan 30, 2025

Choose a reason for hiding this comment

th0masb Jan 31, 2025 • edited Loading

Choose a reason for hiding this comment

JamesHazelcast Feb 3, 2025

Choose a reason for hiding this comment

Rob-Hazelcast Feb 4, 2025 • edited Loading

Choose a reason for hiding this comment

Rob-Hazelcast commented Jan 31, 2025

JamesHazelcast left a comment

Choose a reason for hiding this comment

th0masb commented Feb 3, 2025

Rob-Hazelcast left a comment

Choose a reason for hiding this comment

Rob-Hazelcast Feb 4, 2025 • edited Loading

Choose a reason for hiding this comment

Rob-Hazelcast Feb 4, 2025

Choose a reason for hiding this comment

devOpsHazelcast commented Jan 30, 2025 •

edited

Loading

netlify bot commented Jan 30, 2025 •

edited

Loading

th0masb Jan 31, 2025 •

edited

Loading

Rob-Hazelcast Feb 4, 2025 •

edited

Loading

Rob-Hazelcast Feb 4, 2025 •

edited

Loading