Add flag to select tx throttler tablet type #12174

Merged
1 change: 1 addition & 0 deletions config/tablet/default.yaml
@@ -118,6 +118,7 @@ cacheResultFields: true # enable-query-plan-field-caching
# enable-tx-throttler
# tx-throttler-config
# tx-throttler-healthcheck-cells
+# tx-throttler-tablet-types
# enable_transaction_limit
# enable_transaction_limit_dry_run
# transaction_limit_per_user
@@ -30,7 +30,13 @@ If this is not specified a [default](https://github.com/vitessio/vitess/tree/mai
* *tx-throttler-healthcheck-cells*

A comma separated list of datacenter cells. The throttler will only monitor
-the non-RDONLY replicas found in these cells for replication lag.
+the replicas found in these cells for replication lag.

+* *tx-throttler-tablet-types*
+
+A comma separated list of tablet types. The throttler will only monitor tablets
+with these types. Only `replica` and/or `rdonly` types are supported. The default
+is `replica`.
+
# Caveats and Known Issues
* The throttler keeps trying to explore the maximum rate possible while keeping
@@ -39,4 +45,3 @@ lag limit may occasionally be slightly violated.

* Transactions are considered homogeneous. There is currently no support
for specifying how `expensive` a transaction is.
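
The `tx-throttler-tablet-types` documentation in this file describes a comma-separated list in which only `replica` and `rdonly` are accepted. A minimal, self-contained sketch of that parse-then-validate behaviour follows. It does not use Vitess's real `topoproto.TabletTypeListFlag` or `topodatapb.TabletType`; the `TabletType` string type and the `parseThrottlerTabletTypes` helper are hypothetical names introduced for illustration only, and the case-insensitive matching is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// TabletType is a hypothetical stand-in for topodatapb.TabletType.
type TabletType string

const (
	Replica TabletType = "replica"
	Rdonly  TabletType = "rdonly"
)

// parseThrottlerTabletTypes splits a comma-separated value and rejects any
// type other than replica or rdonly. Matching is case-insensitive here,
// which is an assumption of this sketch.
func parseThrottlerTabletTypes(value string) ([]TabletType, error) {
	var types []TabletType
	for _, raw := range strings.Split(value, ",") {
		t := TabletType(strings.ToLower(strings.TrimSpace(raw)))
		switch t {
		case Replica, Rdonly:
			types = append(types, t)
		default:
			return nil, fmt.Errorf("%q tablet type is not supported", string(t))
		}
	}
	return types, nil
}

func main() {
	for _, v := range []string{"replica", "replica,rdonly", "primary"} {
		types, err := parseThrottlerTabletTypes(v)
		fmt.Println(v, types, err)
	}
}
```

Rejecting unsupported types at parse time surfaces a misconfiguration at startup instead of leaving the throttler with nothing to monitor.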

1 change: 1 addition & 0 deletions doc/design-docs/TabletServerParamsAsYAML.md
@@ -146,6 +146,7 @@ sanitizeLogMessages: false # sanitize_log_messages
# enable-tx-throttler
# tx-throttler-config
# tx-throttler-healthcheck-cells
+# tx-throttler-tablet-types
# enable_transaction_limit
# enable_transaction_limit_dry_run
# transaction_limit_per_user
3 changes: 2 additions & 1 deletion go/flags/endtoend/vttablet.txt
@@ -345,7 +345,8 @@ Usage of vttablet:
--twopc_enable if the flag is on, 2pc is enabled. Other 2pc flags must be supplied.
--tx-throttler-config string Synonym to -tx_throttler_config (default "target_replication_lag_sec: 2\nmax_replication_lag_sec: 10\ninitial_rate: 100\nmax_increase: 1\nemergency_decrease: 0.5\nmin_duration_between_increases_sec: 40\nmax_duration_between_increases_sec: 62\nmin_duration_between_decreases_sec: 20\nspread_backlog_across_sec: 20\nage_bad_rate_after_sec: 180\nbad_rate_increase: 0.1\nmax_rate_approach_threshold: 0.9\n")
--tx-throttler-healthcheck-cells strings Synonym to -tx_throttler_healthcheck_cells
---tx_throttler_config string The configuration of the transaction throttler as a text formatted throttlerdata.Configuration protocol buffer message (default "target_replication_lag_sec: 2\nmax_replication_lag_sec: 10\ninitial_rate: 100\nmax_increase: 1\nemergency_decrease: 0.5\nmin_duration_between_increases_sec: 40\nmax_duration_between_increases_sec: 62\nmin_duration_between_decreases_sec: 20\nspread_backlog_across_sec: 20\nage_bad_rate_after_sec: 180\nbad_rate_increase: 0.1\nmax_rate_approach_threshold: 0.9\n")
+--tx-throttler-tablet-types strings A comma-separated list of tablet types. Only tablets of this type are monitored for replication lag by the transaction throttler. Supported types are replica and/or rdonly. (default replica)
+--tx_throttler_config string The configuration of the transaction throttler as a text formatted throttlerdata.Configuration protocol buffer message. (default "target_replication_lag_sec: 2\nmax_replication_lag_sec: 10\ninitial_rate: 100\nmax_increase: 1\nemergency_decrease: 0.5\nmin_duration_between_increases_sec: 40\nmax_duration_between_increases_sec: 62\nmin_duration_between_decreases_sec: 20\nspread_backlog_across_sec: 20\nage_bad_rate_after_sec: 180\nbad_rate_increase: 0.1\nmax_rate_approach_threshold: 0.9\n")
--tx_throttler_healthcheck_cells strings A comma-separated list of cells. Only tabletservers running in these cells will be monitored for replication lag by the transaction throttler.
--unhealthy_threshold duration replication lag after which a replica is considered unhealthy (default 2h0m0s)
--use_super_read_only Set super_read_only flag when performing planned failover.
13 changes: 9 additions & 4 deletions go/vt/vttablet/tabletserver/tabletenv/config.go
@@ -31,8 +31,10 @@ import (
"vitess.io/vitess/go/vt/dbconfigs"
"vitess.io/vitess/go/vt/log"
querypb "vitess.io/vitess/go/vt/proto/query"
+topodatapb "vitess.io/vitess/go/vt/proto/topodata"
"vitess.io/vitess/go/vt/servenv"
"vitess.io/vitess/go/vt/throttler"
+"vitess.io/vitess/go/vt/topo/topoproto"
)

// These constants represent values for various config parameters.
@@ -138,8 +140,9 @@ func registerTabletEnvFlags(fs *pflag.FlagSet) {
fs.StringVar(&currentConfig.TwoPCCoordinatorAddress, "twopc_coordinator_address", defaultConfig.TwoPCCoordinatorAddress, "address of the (VTGate) process(es) that will be used to notify of abandoned transactions.")
SecondsVar(fs, &currentConfig.TwoPCAbandonAge, "twopc_abandon_age", defaultConfig.TwoPCAbandonAge, "time in seconds. Any unresolved transaction older than this time will be sent to the coordinator to be resolved.")
flagutil.DualFormatBoolVar(fs, &currentConfig.EnableTxThrottler, "enable_tx_throttler", defaultConfig.EnableTxThrottler, "If true replication-lag-based throttling on transactions will be enabled.")
-flagutil.DualFormatStringVar(fs, &currentConfig.TxThrottlerConfig, "tx_throttler_config", defaultConfig.TxThrottlerConfig, "The configuration of the transaction throttler as a text formatted throttlerdata.Configuration protocol buffer message")
+flagutil.DualFormatStringVar(fs, &currentConfig.TxThrottlerConfig, "tx_throttler_config", defaultConfig.TxThrottlerConfig, "The configuration of the transaction throttler as a text formatted throttlerdata.Configuration protocol buffer message.")
flagutil.DualFormatStringListVar(fs, &currentConfig.TxThrottlerHealthCheckCells, "tx_throttler_healthcheck_cells", defaultConfig.TxThrottlerHealthCheckCells, "A comma-separated list of cells. Only tabletservers running in these cells will be monitored for replication lag by the transaction throttler.")
+fs.Var((*topoproto.TabletTypeListFlag)(&currentConfig.TxThrottlerTabletTypes), "tx-throttler-tablet-types", "A comma-separated list of tablet types. Only tablets of this type are monitored for replication lag by the transaction throttler. Supported types are replica and/or rdonly.")

fs.BoolVar(&enableHotRowProtection, "enable_hot_row_protection", false, "If true, incoming transactions for the same row (range) will be queued and cannot consume all txpool slots.")
fs.BoolVar(&enableHotRowProtectionDryRun, "enable_hot_row_protection_dry_run", false, "If true, hot row protection is not enforced but logs if transactions would have been queued.")
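
The new flag in this hunk is registered through `fs.Var` with a list-typed value rather than through a plain string variable. As a rough illustration of that pattern, the sketch below implements a list-valued flag against the standard library `flag` package, not the `pflag` package the PR uses (pflag's `Value` interface additionally requires a `Type()` method). The `tabletTypeList` type is hypothetical and is not the real `topoproto.TabletTypeListFlag`.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// tabletTypeList is a hypothetical list-valued flag type; it is not the
// real topoproto.TabletTypeListFlag.
type tabletTypeList []string

// String renders the current value, as required by flag.Value.
func (l *tabletTypeList) String() string {
	if l == nil {
		return ""
	}
	return strings.Join(*l, ",")
}

// Set replaces the list with the comma-separated values passed on the
// command line, as required by flag.Value.
func (l *tabletTypeList) Set(value string) error {
	parts := strings.Split(value, ",")
	out := make(tabletTypeList, 0, len(parts))
	for _, p := range parts {
		p = strings.TrimSpace(p)
		if p == "" {
			return fmt.Errorf("empty tablet type in %q", value)
		}
		out = append(out, p)
	}
	*l = out
	return nil
}

func main() {
	// The default lives on the variable itself, as it does in defaultConfig.
	types := tabletTypeList{"replica"}
	fs := flag.NewFlagSet("vttablet-sketch", flag.ContinueOnError)
	fs.Var(&types, "tx-throttler-tablet-types", "comma-separated tablet types to monitor")
	if err := fs.Parse([]string{"--tx-throttler-tablet-types=replica,rdonly"}); err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(types.String())
}
```

Because the flag value wraps the config field directly, the default comes from the field's initial contents and no separate default argument is passed to `Var`.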
@@ -310,9 +313,10 @@ type TabletConfig struct {
TwoPCCoordinatorAddress string `json:"-"`
TwoPCAbandonAge Seconds `json:"-"`

-EnableTxThrottler bool `json:"-"`
-TxThrottlerConfig string `json:"-"`
-TxThrottlerHealthCheckCells []string `json:"-"`
+EnableTxThrottler bool `json:"-"`
+TxThrottlerConfig string `json:"-"`
+TxThrottlerHealthCheckCells []string `json:"-"`
+TxThrottlerTabletTypes []topodatapb.TabletType `json:"-"`

EnableLagThrottler bool `json:"-"`
EnableTableGC bool `json:"-"` // can be turned off programmatically by tests
@@ -561,6 +565,7 @@ var defaultConfig = TabletConfig{
EnableTxThrottler: false,
TxThrottlerConfig: defaultTxThrottlerConfig(),
TxThrottlerHealthCheckCells: []string{},
+TxThrottlerTabletTypes: []topodatapb.TabletType{topodatapb.TabletType_REPLICA},

EnableLagThrottler: false, // Feature flag; to switch to 'true' at some stage in the future

32 changes: 22 additions & 10 deletions go/vt/vttablet/tabletserver/txthrottler/tx_throttler.go
@@ -109,6 +109,15 @@ func tryCreateTxThrottler(config *tabletenv.TabletConfig, topoServer *topo.Serve
return newTxThrottler(&txThrottlerConfig{enabled: false})
}

+for _, tabletType := range config.TxThrottlerTabletTypes {
+switch tabletType {
+case topodatapb.TabletType_REPLICA, topodatapb.TabletType_RDONLY:
+continue
+default:
+return nil, fmt.Errorf("%q tablet type is not supported", tabletType)
+}
+}
+
var throttlerConfig throttlerdatapb.Configuration
if err := prototext.Unmarshal([]byte(config.TxThrottlerConfig), &throttlerConfig); err != nil {
return nil, err
@@ -122,6 +131,7 @@ func tryCreateTxThrottler(config *tabletenv.TabletConfig, topoServer *topo.Serve
return newTxThrottler(&txThrottlerConfig{
enabled: true,
topoServer: topoServer,
+tabletConfig: config,
throttlerConfig: &throttlerConfig,
healthCheckCells: healthCheckCells,
})
@@ -136,6 +146,7 @@ type txThrottlerConfig struct {
enabled bool

topoServer *topo.Server
+tabletConfig *tabletenv.TabletConfig
throttlerConfig *throttlerdatapb.Configuration
// healthCheckCells stores the cell names in which running vttablets will be monitored for
// replication lag.
@@ -166,6 +177,8 @@ type TopologyWatcherInterface interface {

// txThrottlerState holds the state of an open TxThrottler object.
type txThrottlerState struct {
+tabletConfig *tabletenv.TabletConfig

// throttleMu serializes calls to throttler.Throttler.Throttle(threadId).
// That method is required to be called in serial for each threadId.
throttleMu sync.Mutex
@@ -283,7 +296,8 @@ func newTxThrottlerState(config *txThrottlerConfig, keyspace, shard, cell string
return nil, err
}
result := &txThrottlerState{
-throttler: t,
+tabletConfig: config.tabletConfig,
+throttler: t,
}
createTxThrottlerHealthCheck(config, result, cell)

@@ -351,14 +365,12 @@ func (ts *txThrottlerState) deallocateResources() {

// StatsUpdate updates the health of a tablet with the given healthcheck.
func (ts *txThrottlerState) StatsUpdate(tabletStats *discovery.TabletHealth) {
-// Ignore PRIMARY and RDONLY stats.
-// We currently do not monitor RDONLY tablets for replication lag. RDONLY tablets are not
-// candidates for becoming primary during failover, and it's acceptable to serve somewhat
-// stale data from these.
-// TODO(erez): If this becomes necessary, we can add a configuration option that would
-// determine whether we consider RDONLY tablets here, as well.
-if tabletStats.Target.TabletType != topodatapb.TabletType_REPLICA {
-return
+// Monitor tablets for replication lag if they have a tablet
+// type specified by the --tx-throttler-tablet-types flag.
+for _, expectedTabletType := range ts.tabletConfig.TxThrottlerTabletTypes {
+if tabletStats.Target.TabletType == expectedTabletType {
+ts.throttler.RecordReplicationLag(time.Now(), tabletStats)
+return
+}
}
-ts.throttler.RecordReplicationLag(time.Now(), tabletStats)
}
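
The change to `StatsUpdate` replaces a hard-coded "REPLICA only" check with a membership test against the configured tablet types. A stripped-down sketch of that filter, free of Vitess dependencies, might look like the following. The `TabletType` enum, the `lagRecorder` type, and its `statsUpdate` method are hypothetical stand-ins for `topodatapb.TabletType`, `txThrottlerState`, and `StatsUpdate`; the counter replaces the real call to `RecordReplicationLag`.

```go
package main

import "fmt"

// TabletType is a hypothetical stand-in for topodatapb.TabletType.
type TabletType int

const (
	Primary TabletType = iota
	Replica
	Rdonly
)

// lagRecorder is a hypothetical stand-in for txThrottlerState; it only
// counts how many health updates were accepted.
type lagRecorder struct {
	monitored []TabletType
	recorded  int
}

// statsUpdate records the update only when the tablet's type is in the
// configured list, and silently ignores every other type.
func (r *lagRecorder) statsUpdate(t TabletType) {
	for _, want := range r.monitored {
		if t == want {
			r.recorded++
			return
		}
	}
}

func main() {
	r := &lagRecorder{monitored: []TabletType{Replica, Rdonly}}
	for _, t := range []TabletType{Primary, Replica, Rdonly, Primary} {
		r.statsUpdate(t)
	}
	fmt.Println("recorded updates:", r.recorded) // prints "recorded updates: 2"
}
```

A linear scan is reasonable here because the list holds at most two supported types; a set would add complexity without a measurable benefit.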
@@ -115,6 +115,7 @@ func TestEnabledThrottler(t *testing.T) {
config := tabletenv.NewDefaultConfig()
config.EnableTxThrottler = true
config.TxThrottlerHealthCheckCells = []string{"cell1", "cell2"}
+config.TxThrottlerTabletTypes = []topodatapb.TabletType{topodatapb.TabletType_REPLICA}

throttler, err := tryCreateTxThrottler(config, ts)
if err != nil {