[RFC] Tablet startup in super_read_only mode #12180

Closed
5 tasks done
rsajwani opened this issue Jan 28, 2023 · 2 comments · Fixed by #12206

@rsajwani
Contributor

rsajwani commented Jan 28, 2023

Overview of the Issue

TL;DR

We want replica databases to be in super_read_only mode (i.e., super_read_only set to true) at all times. Only the primary database can accept writes (i.e., super_read_only set to false). This ensures that nobody can modify a replica from anywhere, at any time.

Motivation

As of today, all replicas come up in read_only mode. However, this does not prevent users such as root and vt_dba, who have SUPER privileges, from changing the database at any time. We want to leverage the GLOBAL super_read_only setting to protect against errant GTIDs. This ensures that, apart from the primary, no other component or offline system can mutate the database and create errant GTIDs that then lie in wait to cause later failures, as seen in #9312 and #10094.

Furthermore, not all use cases ensure that a replica ends up in read-only mode. During ERS, there are cases where we do not set read-only. With the super_read_only change, we will make sure those cases are covered as well.

History

In the past, we have run into situations where running ERS results in errant GTIDs. The cause turned out to be offline processes or operators mutating a replica's schema, which makes that replica a potential source of errant GTIDs. The following issues show that we have run into such situations in production systems.

#10363
#10094
#9312
#10448

Possible Approaches

We have tried a few times to implement this super_read_only change, but we ran into a range of issues, mainly due to the decentralized logic for setting a replica to read/write mode and the use of withDDL all across the code base. Listed here are a few changes made in the past for super_read_only that were later reverted due to regressions.

#11706
#10094
#9312
#10448

With the schema initialization changes in #11520, we believe it will now be much easier to implement the super_read_only change.

Proposed Solution

This change should be built on top of #11520, where we use a declarative approach instead of withDDL. The declarative approach has helped us condense all our schema changes into one place. This gives us an opportunity to apply the super_read_only changes to our database.

Every MySQL instance will bootstrap in super_read_only mode. Only during init_db do we switch super_read_only OFF temporarily in order to perform mutations such as creating the necessary users and permissions. During reparenting, we will change the super_read_only values for the primary and replicas as required. Here is a quick summary of the reparenting operations.

Altogether, a few operations are called across reparenting. I am summarizing how each of them will change the super_read_only status. This will help in understanding the individual reparenting operations: PRS, ERS, and ExternallyReparentShard.

PromoteReplica --> Sets super_read_only to OFF for the given replica tablet
DemotePrimary --> Sets super_read_only to ON for the given primary tablet

InitShardPrimary
	Calls InitPrimary
		---> Calls tm.ChangeTypeLocked with DBAAction.ReadWriteAction
			---> Calls SetReadOnly(false)
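
A minimal Go sketch of the promote and demote summaries above. The `sqlExecutor` interface, the `recorder` fake, and the `promote`/`demote` helpers are hypothetical names used only for illustration and are not the Vitess API; the two `SET GLOBAL` statements are ordinary MySQL syntax.

```go
package main

import "fmt"

// sqlExecutor is a hypothetical stand-in for whatever runs statements
// against a tablet's MySQL instance. It is not a Vitess interface.
type sqlExecutor interface {
	Exec(query string) error
}

// recorder is a fake executor that only records the statements it receives,
// so the sketch runs without a database.
type recorder struct {
	queries []string
}

func (r *recorder) Exec(query string) error {
	r.queries = append(r.queries, query)
	return nil
}

// promote makes the instance writable. In MySQL, turning read_only OFF
// also turns super_read_only OFF.
func promote(db sqlExecutor) error {
	return db.Exec("SET GLOBAL read_only = OFF")
}

// demote blocks writes from all users, including those with SUPER.
// In MySQL, turning super_read_only ON also turns read_only ON.
func demote(db sqlExecutor) error {
	return db.Exec("SET GLOBAL super_read_only = ON")
}

func main() {
	oldPrimary, newPrimary := &recorder{}, &recorder{}
	if err := demote(oldPrimary); err != nil {
		fmt.Println("demote failed:", err)
		return
	}
	if err := promote(newPrimary); err != nil {
		fmt.Println("promote failed:", err)
		return
	}
	fmt.Println("old primary:", oldPrimary.queries)
	fmt.Println("new primary:", newPrimary.queries)
}
```

Demoting before promoting, as in this sketch, keeps at most one writable instance at any moment.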

PRS

switch {
case currentPrimary == nil && ev.ShardInfo.PrimaryAlias == nil:
	// Case (1): no primary has been elected ever. Initialize
	// the primary-elect tablet
	reparentJournalPos, err = pr.performInitialPromotion(ctx, ev.NewPrimary, opts)
	---> Calls InitPrimary
		---> Calls tm.ChangeTypeLocked with DBAAction.ReadWriteAction
			---> Calls sql.setReadOnly(false)
case currentPrimary == nil && ev.ShardInfo.PrimaryAlias != nil:
	// Case (2): no clear current primary. Try to find a safe promotion
	// candidate, and promote to it.
	reparentJournalPos, err = pr.performPotentialPromotion(ctx, keyspace, shard, ev.NewPrimary, tabletMap, opts)
	---> Calls PromoteReplica
		---> Calls tm.ChangeTypeLocked with DBAAction.ReadWriteAction
			---> Calls sql.setReadOnly(false)
case topoproto.TabletAliasEqual(currentPrimary.Alias, opts.NewPrimaryAlias):
	// Case (3): desired new primary is the current primary. Attempt to fix
	// up replicas to recover from a previous partial promotion.
	reparentJournalPos, err = pr.performPartialPromotionRecovery(ctx, ev.NewPrimary)
	---> Calls tm.SetReadWrite(true)
		---> Calls setReadOnly(false)
default:
	// Case (4): desired primary and current primary differ. Do a graceful
	// demotion-then-promotion.
	reparentJournalPos, err = pr.performGracefulPromotion(ctx, ev, keyspace, shard, currentPrimary, ev.NewPrimary, tabletMap, opts)
	---> Calls DemotePrimary
		---> Calls sql.setSuperReadOnly(true)
	---> Calls tm.promoteReplica
		---> Calls tm.ChangeTypeLocked with DBAAction.ReadWriteAction
			---> Calls sql.setReadOnly(false)
}
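
The four PRS cases above can be condensed into a small runnable sketch that maps each case to the read-only calls it ends up making. Everything here is a simplification for illustration: aliases are plain strings (an empty string stands for nil), and `classify`, `readOnlyActions`, and the `prsCase` constants are hypothetical names, not Vitess code.

```go
package main

import "fmt"

// prsCase names the four branches of the PRS switch (hypothetical type).
type prsCase int

const (
	initialPromotion prsCase = iota + 1
	potentialPromotion
	partialPromotionRecovery
	gracefulPromotion
)

// classify picks the branch from three aliases. An empty string means
// "not set", standing in for the nil checks in the pseudocode.
func classify(currentPrimary, shardPrimaryAlias, newPrimary string) prsCase {
	switch {
	case currentPrimary == "" && shardPrimaryAlias == "":
		return initialPromotion
	case currentPrimary == "":
		return potentialPromotion
	case currentPrimary == newPrimary:
		return partialPromotionRecovery
	default:
		return gracefulPromotion
	}
}

// readOnlyActions lists the read-only calls each case makes, following
// the call chains summarized in the pseudocode.
func readOnlyActions(c prsCase) []string {
	if c == gracefulPromotion {
		return []string{
			"old primary: setSuperReadOnly(true)",
			"new primary: setReadOnly(false)",
		}
	}
	return []string{"new primary: setReadOnly(false)"}
}

func main() {
	// The tablet aliases below are made-up example values.
	c := classify("zone1-100", "zone1-100", "zone1-101")
	fmt.Println("case:", c)
	for _, a := range readOnlyActions(c) {
		fmt.Println(" ", a)
	}
}
```

Only the graceful case touches two tablets; the other three only make the new primary writable.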

ERS

Calls InitPrimary if the cluster is uninitialized; in this case PrimaryAlias is empty
	---> Calls tm.ChangeTypeLocked with DBAAction.ReadWriteAction
		---> Calls sql.setReadOnly(false)
	---OR---
Calls tm.promoteReplica
	---> Calls tm.ChangeTypeLocked with DBAAction.ReadWriteAction
		---> Calls sql.setReadOnly(false)

Calls reparentReplica, which sets the replication source (setreplicationsource)

ExternallyReparentShard

Vitess expects that the user has set the database into ReadWrite mode before calling this reparenting type.

Listed here are the major changes needed for this RFC.

  • my.cnf contains super-read-only, to ensure that all MySQL instances come up in super_read_only mode.
  • init_db.sql is executed after MySQL is initialized. We switch super_read_only OFF temporarily in order to perform mutations such as creating the necessary users and permissions.
  • Unit tests do not need super_read_only, so we will have a separate init_db.sql file for unit tests.
  • In all places where we have to switch read/write ON, we will turn read_only OFF. In all places where we have to switch read/write OFF, we will turn super_read_only ON.
  • Add an isSuperReadOnly property to fullStatus, so that tablet state can be queried through vtctld.
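
The temporary switch described for init_db.sql could look roughly like the following Go sketch: turn super_read_only OFF, run the setup statements, and turn it back ON even if setup fails. The `sqlExecutor` interface, the `recorder` fake, `withSuperReadOnlyOff`, and the example `CREATE USER` statement are all assumptions made for illustration, not the actual implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// sqlExecutor is a hypothetical stand-in for a connection to MySQL.
type sqlExecutor interface {
	Exec(query string) error
}

// recorder is a fake executor that records statements and can be told
// to fail on one specific statement.
type recorder struct {
	queries []string
	failOn  string
}

func (r *recorder) Exec(query string) error {
	r.queries = append(r.queries, query)
	if r.failOn != "" && query == r.failOn {
		return errors.New("simulated failure")
	}
	return nil
}

// withSuperReadOnlyOff disables super_read_only, runs fn, and re-enables
// super_read_only afterwards, whether or not fn succeeded.
func withSuperReadOnlyOff(db sqlExecutor, fn func() error) (err error) {
	if err = db.Exec("SET GLOBAL super_read_only = OFF"); err != nil {
		return err
	}
	defer func() {
		restoreErr := db.Exec("SET GLOBAL super_read_only = ON")
		if err == nil {
			err = restoreErr
		}
	}()
	return fn()
}

func main() {
	db := &recorder{}
	err := withSuperReadOnlyOff(db, func() error {
		// Example setup statement; the user name is illustrative only.
		return db.Exec("CREATE USER IF NOT EXISTS 'vt_dba'@'localhost'")
	})
	fmt.Println("error:", err)
	for _, q := range db.queries {
		fmt.Println(q)
	}
}
```

Restoring in a deferred function means a failed setup statement cannot leave the instance writable.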
@rsajwani rsajwani self-assigned this Jan 28, 2023
@rsajwani rsajwani changed the title from "Tablet Startup in Read-Only-Mode" to "[RFC] Tablet Startup in Read-Only-Mode" Jan 28, 2023
@deepthi
Member

deepthi commented Jan 30, 2023

  • The title is misleading. As you state in the description: "As of today all replicas comes up in read-only mode." The title should more accurately reflect the proposed change.
  • The actual MySQL setting is called super_read_only. If we are using any delimiter at all, we might as well write this accurately instead of calling it super-read-only. Settings like this should be code-formatted, to make it easier to understand that they are specific strings/flags/settings versus regular words.
  • Excerpt from a Percona blog post: When enabling the super_read_only system variable, please keep in mind the following implications:
    • Setting super_read_only ON implicitly forces read_only ON
    • Setting read_only OFF implicitly forces super_read_only OFF
  • Is the pseudocode for PRS and ERS the current code or proposed changes? If this is the proposed change, why is there no handling of super_read_only in the ERS section?
  • Let us link the related issues: "Use consistent approach when try to query super-read-only" (#12186) and "Deprecate and delete use_super_read_only flag from VTTtablet" (#12140).
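
The two implications quoted from the Percona post can be modeled in a few lines of Go. This is only an in-memory illustration of how the two variables are coupled; `readOnlyState` and its methods are hypothetical names, and no MySQL server is involved.

```go
package main

import "fmt"

// readOnlyState is a hypothetical in-memory model of the two MySQL globals.
type readOnlyState struct {
	ReadOnly      bool
	SuperReadOnly bool
}

// SetSuperReadOnly models: setting super_read_only ON implicitly forces
// read_only ON. Setting it OFF leaves read_only unchanged.
func (s *readOnlyState) SetSuperReadOnly(on bool) {
	s.SuperReadOnly = on
	if on {
		s.ReadOnly = true
	}
}

// SetReadOnly models: setting read_only OFF implicitly forces
// super_read_only OFF. Setting it ON leaves super_read_only unchanged.
func (s *readOnlyState) SetReadOnly(on bool) {
	s.ReadOnly = on
	if !on {
		s.SuperReadOnly = false
	}
}

func main() {
	s := &readOnlyState{}

	// Replica side: one call is enough to block all writers.
	s.SetSuperReadOnly(true)
	fmt.Printf("after super_read_only=ON: read_only=%t super_read_only=%t\n", s.ReadOnly, s.SuperReadOnly)

	// Primary side: one call is enough to allow writes again.
	s.SetReadOnly(false)
	fmt.Printf("after read_only=OFF: read_only=%t super_read_only=%t\n", s.ReadOnly, s.SuperReadOnly)
}
```

Because of this coupling, a single statement per direction is sufficient: super_read_only ON for replicas and read_only OFF for the primary.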

@rsajwani rsajwani changed the title from "[RFC] Tablet Startup in Read-Only-Mode" to "[RFC] Tablet Startup in super_read-only Mode" Jan 30, 2023
@rsajwani rsajwani changed the title from "[RFC] Tablet Startup in super_read-only Mode" to "[RFC] Tablet Startup in super_read_only Mode" Jan 30, 2023
@rsajwani
Contributor Author

I have made some corrections and will continue to work on it.

The code changes above are pseudocode. We will follow this convention:

the primary will have a call setting read_only to OFF
replicas will have a call setting super_read_only to ON

@rsajwani rsajwani changed the title from "[RFC] Tablet Startup in super_read_only Mode" to "[RFC] Tablet startup in super_read_only mode" Feb 3, 2023