
rebalance command

Jamie Alquiza edited this page Nov 14, 2018 · 22 revisions

Rebalance

rebalance is used for:

  • targeted broker storage rebalancing*
  • incremental scaling

*In contrast to storage rebalancing in rebuild (which requires that 100% of partitions for a targeted topic be relocated), rebalance performs partial partition rebalancing, moving partitions from the most to the least storage-utilized brokers.

Rebalance takes an input topic list (as with rebuild: comma-delimited, with regex support) and a broker list. Typically the broker list includes all brokers that the target topic(s) currently occupy. Removing brokers is not allowed in rebalance; only adding new brokers is permitted.
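As a rough illustration of the broker list rule, the sketch below checks that every broker currently holding a replica appears in the supplied list and reports any newly added brokers. It is plain Python, not topicmappr code: the function names and the dict-based partition map are hypothetical stand-ins for a topic's current assignment.

```python
def validate_broker_list(partition_map, brokers):
    """Return (missing, added) sets of broker IDs.

    partition_map: hypothetical {partition number: [replica broker IDs]}
    brokers:       broker IDs passed via --brokers
    """
    current = {b for replicas in partition_map.values() for b in replicas}
    requested = set(brokers)
    missing = current - requested  # would amount to removing brokers: not allowed
    added = requested - current    # new brokers: permitted
    return missing, added


def require_valid_broker_list(partition_map, brokers):
    """Raise if any current broker is omitted; otherwise return added brokers."""
    missing, added = validate_broker_list(partition_map, brokers)
    if missing:
        raise ValueError(f"broker list omits current brokers: {sorted(missing)}")
    return added


if __name__ == "__main__":
    # Two partitions from the example map, plus a hypothetical new broker 1300.
    current_map = {0: [1255, 1217], 1: [1267, 1205]}
    print(require_valid_broker_list(current_map, [1205, 1217, 1255, 1267, 1300]))
```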

Rebalance uses the same broker/topic metrics mechanism as rebuild (both can be supplemented with metricsfetcher). It examines the free storage on every referenced broker and selects as offload targets those whose free storage is more than 20% below the harmonic mean (configurable via the --storage-threshold parameter). Alternatively, brokers with less than a given amount of free storage, in gigabytes, can be targeted using the --storage-threshold-gb parameter.

For each offload target, partitions are planned for relocation to the least-utilized destination broker. Relocations can be scoped by rack.id via the --locality-scoped flag. For instance, if rack.id values reflect physical data centers, a locality-scoped rebalance would rebalance partitions among the brokers of each data center in isolation.
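The offload target selection described above can be sketched as follows. This is an illustration under stated assumptions, not topicmappr's source: a broker is assumed to be targeted when its free storage is at least the threshold fraction below the harmonic mean, and `offload_targets`, `free_storage_hmean` and `EXAMPLE_FREE_GB` are hypothetical names. The free storage values are the pre-rebalance figures from the example run further down this page.

```python
from statistics import harmonic_mean

# Pre-rebalance free storage (GB) per broker, copied from the example output.
EXAMPLE_FREE_GB = {
    1200: 3587.97, 1201: 2708.39, 1202: 2209.01, 1203: 1865.20,
    1205: 2120.30, 1208: 3224.55, 1209: 1912.19, 1211: 1873.23,
    1212: 1916.88, 1213: 3165.90, 1214: 1556.82, 1215: 2091.04,
    1216: 2150.41, 1217: 1816.75, 1220: 2877.80, 1223: 2347.95,
    1224: 1977.97, 1225: 1960.09, 1234: 2109.06, 1235: 3369.32,
    1236: 2656.35, 1247: 1956.20, 1254: 2416.52, 1255: 1850.83,
    1256: 1986.07, 1267: 2301.33, 1376: 2065.64,
}


def free_storage_hmean(free_gb):
    """Harmonic mean of free storage across all referenced brokers."""
    return harmonic_mean(list(free_gb.values()))


def offload_targets(free_gb, threshold=0.20, threshold_gb=None):
    """Return sorted broker IDs selected for partition offloading.

    Assumed rule (mirrors the wiki's description, not topicmappr's code):
    - by default, target brokers whose free storage is at least `threshold`
      (a fraction, e.g. 0.20 for 20%) below the harmonic mean;
    - if `threshold_gb` is set, instead target brokers with less free
      storage than that many gigabytes.
    """
    if threshold_gb is not None:
        return sorted(b for b, free in free_gb.items() if free < threshold_gb)
    limit = free_storage_hmean(free_gb) * (1 - threshold)
    return sorted(b for b, free in free_gb.items() if free <= limit)


if __name__ == "__main__":
    # Same 5% threshold as the example invocation (--storage-threshold 0.05).
    print(round(free_storage_hmean(EXAMPLE_FREE_GB), 2))
    print(offload_targets(EXAMPLE_FREE_GB, threshold=0.05))
```

With these values and a 5% threshold, the sketch computes a harmonic mean of about 2199.97GB and selects the same 12 brokers that the example lists under "Brokers targeted for partition offloading".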

Destination broker suitability is determined as either:

  • (locality-scoped) the least utilized broker with the same rack.id as the offload target
  • (non-locality-scoped) the least utilized broker that wouldn't result in duplicate rack.id values in the resulting replica set
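The two destination rules above might look roughly like this. It is a hedged sketch: "least utilized" is read as "most free storage", brokers that already hold a replica of the partition are assumed to be ineligible, and `pick_destination` with its dict arguments is a hypothetical helper rather than a topicmappr API.

```python
def pick_destination(source, replicas, free_gb, rack, locality_scoped=True):
    """Return the most suitable destination broker ID, or None.

    source:   offload target the replica is moving off of
    replicas: current replica broker IDs for the partition (includes source)
    free_gb:  {broker_id: free storage in GB} for all referenced brokers
    rack:     {broker_id: rack.id}
    """
    # Racks used by the replicas that stay in place after the move.
    remaining_racks = {rack[b] for b in replicas if b != source}
    candidates = []
    for broker, free in free_gb.items():
        if broker in replicas:
            continue  # assumption: never place two replicas on one broker
        if locality_scoped:
            if rack[broker] != rack[source]:
                continue  # must stay within the offload target's rack.id
        elif rack[broker] in remaining_racks:
            continue  # would duplicate a rack.id in the replica set
        candidates.append((free, broker))
    if not candidates:
        return None
    # Least utilized == most free storage (ties go to the higher broker ID).
    return max(candidates)[1]
```

For example, with replicas `[1, 3]` moving off broker 1, a locality-scoped search only considers brokers sharing broker 1's rack.id, while a non-scoped search may pick any broker whose rack.id differs from broker 3's.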

The --tolerance flag limits how much data can be moved off offload targets and onto destination brokers, expressed as a distance (in percent) from the arithmetic mean of free storage. With the default of 10% and a mean free storage of 800GB, partition movement planning for each offload target stops when:

  • the offload target's free storage would exceed 880GB (mean + 10%)
  • any partition movement would push the most suitable destination below 720GB (mean - 10%)
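The tolerance arithmetic above can be written out as a small sketch. The helper names are hypothetical; the limits are computed as the mean free storage plus or minus the tolerance fraction, which matches both the 880GB/720GB example and the limits printed in the example run (a 2299.03GB mean with a 20% tolerance gives 2758.83GB and 1839.22GB).

```python
def tolerance_limits(mean_free_gb, tolerance=0.10):
    """Return (source_max, dest_min) free storage limits in GB.

    source_max: an offload target may not be planned above this much free storage
    dest_min:   a destination may not be planned below this much free storage
    """
    return mean_free_gb * (1 + tolerance), mean_free_gb * (1 - tolerance)


def move_allowed(size_gb, source_free, dest_free, mean_free_gb, tolerance=0.10):
    """True if moving a partition of size_gb keeps both brokers within limits."""
    source_max, dest_min = tolerance_limits(mean_free_gb, tolerance)
    return source_free + size_gb <= source_max and dest_free - size_gb >= dest_min


if __name__ == "__main__":
    print(tolerance_limits(800, 0.10))  # approximately (880, 720)
    # Example run: p117 (800.20GB) from broker 1203 to broker 1200, 20% tolerance.
    print(move_allowed(800.20, 1865.20, 3587.97, 62073.77 / 27, 0.20))
```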

All partition movement planning halts once no offload target has any possible relocations left to schedule. A plan summary and the resulting partition map are then printed.
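Putting the pieces together, a greedy planning loop in the spirit of the description above could look like this. It is only a sketch with several labeled assumptions: each pass schedules at most one move per offload target, candidate partitions are examined largest first up to a hypothetical `partition_limit` (echoing the --partition-limit flag mentioned under Troubleshooting), each partition is moved at most once, and rack.id handling is omitted for brevity. `plan_relocations` and its data shapes are hypothetical.

```python
def plan_relocations(free_gb, partitions, targets, tolerance=0.10, partition_limit=30):
    """Greedy relocation planner (illustrative only; racks are ignored).

    free_gb:    {broker_id: free GB}
    partitions: {(topic, partition): (size_gb, [replica broker IDs])}
    targets:    offload target broker IDs
    Returns (plan, new_replicas), where plan holds
    (topic, partition, source, destination, size_gb) tuples.
    """
    free = dict(free_gb)
    mean = sum(free.values()) / len(free)
    source_max = mean * (1 + tolerance)
    dest_min = mean * (1 - tolerance)
    replicas = {p: list(r) for p, (_, r) in partitions.items()}
    plan, moved = [], set()
    progress = True
    while progress:  # halt once no target can schedule another move
        progress = False
        for src in targets:
            held = [p for p in partitions if src in replicas[p] and p not in moved]
            held.sort(key=lambda p: partitions[p][0], reverse=True)  # largest first
            for p in held[:partition_limit]:
                size = partitions[p][0]
                if free[src] + size > source_max:
                    continue  # would free too much storage on the source
                dests = [b for b in free if b not in replicas[p]]
                if not dests:
                    continue
                dst = max(dests, key=lambda b: free[b])  # least utilized
                if free[dst] - size < dest_min:
                    continue  # would push the best destination too low
                replicas[p][replicas[p].index(src)] = dst
                free[src] += size
                free[dst] -= size
                moved.add(p)
                plan.append((p[0], p[1], src, dst, size))
                progress = True
                break  # at most one move per target per pass
    return plan, replicas


if __name__ == "__main__":
    free = {1: 100, 2: 500, 3: 300, 4: 300}
    parts = {("t", 0): (120, [1, 3]), ("t", 1): (80, [1, 4]), ("t", 2): (500, [1, 2])}
    print(plan_relocations(free, parts, targets=[1], tolerance=0.5)[0])
```

In the toy data, the 500GB partition is always skipped because it would lift broker 1 past the source limit, which is the "few, large partitions" situation described under Troubleshooting; with `partition_limit=1` only that partition is examined and nothing is planned.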

Example

Fetching up-to-date metrics data with metricsfetcher:

$ metricsfetcher --broker-storage-query "avg:system.disk.free{cluster:kafka-test,device:/data}" --partition-size-query "max:kafka.log.partition.size{cluster:kafka-test} by {topic,partition}"
Submitting max:kafka.log.partition.size{cluster:kafka-test} by {topic,partition}.rollup(avg, 3600)
success
Submitting avg:system.disk.free{cluster:kafka-test,device:/data} by {broker_id}.rollup(avg, 3600)
success

Data written to ZooKeeper

Running rebalance for "test-topic", providing all of the brokers that "test-topic" partitions reside on:

$ topicmappr rebalance --topics "test-topic" --brokers 1200,1201,1202,1203,1205,1208,1209,1211,1212,1213,1214,1215,1216,1217,1220,1223,1224,1225,1234,1235,1236,1247,1254,1255,1256,1267,1376 --storage-threshold 0.05 --tolerance 0.2

Topics:
  test-topic

Validating broker list:
  OK

Rebalance parameters:
  Free storage mean, harmonic mean: 2299.03GB, 2199.97GB
  Broker free storage limits (with a 20.00% tolerance from mean):
    Sources limited to <= 2758.83GB
    Destinations limited to >= 1839.22GB

Brokers targeted for partition offloading (>= 5.00% threshold below hmean):
  1203
  1209
  1211
  1212
  1214
  1217
  1224
  1225
  1247
  1255
  1256
  1376

Broker 1203 relocations planned:
    [800.20GB] test-topic p117 -> 1200

Broker 1209 relocations planned:
    [827.74GB] test-topic p119 -> 1235

Broker 1211 relocations planned:
    [602.12GB] test-topic p125 -> 1236

Broker 1212 relocations planned:
    [825.81GB] test-topic p22 -> 1208

Broker 1214 relocations planned:
    [678.96GB] test-topic p59 -> 1213
    [510.32GB] test-topic p37 -> 1213

Broker 1217 relocations planned:
  [none]

Broker 1224 relocations planned:
    [692.60GB] test-topic p118 -> 1220

Broker 1225 relocations planned:
    [255.21GB] test-topic p75 -> 1216

Broker 1247 relocations planned:
  [none]

Broker 1255 relocations planned:
    [660.11GB] test-topic p20 -> 1235

Broker 1256 relocations planned:
  [none]

Broker 1376 relocations planned:
  [none]

Partition map changes:
  test-topic p0: [1255 1217] -> [1255 1217] no-op
  test-topic p1: [1267 1205] -> [1267 1205] no-op
  test-topic p2: [1213 1256] -> [1213 1256] no-op
  test-topic p3: [1205 1236] -> [1205 1236] no-op
  test-topic p4: [1256 1202] -> [1256 1202] no-op
  test-topic p5: [1223 1211] -> [1223 1211] no-op
  test-topic p6: [1247 1212] -> [1247 1212] no-op
  test-topic p7: [1256 1202] -> [1256 1202] no-op
  test-topic p8: [1215 1234] -> [1215 1234] no-op
  test-topic p9: [1220 1235] -> [1220 1235] no-op
  test-topic p10: [1217 1223] -> [1217 1223] no-op
  test-topic p11: [1212 1225] -> [1212 1225] no-op
  test-topic p12: [1223 1254] -> [1223 1254] no-op
  test-topic p13: [1220 1235] -> [1220 1235] no-op
  test-topic p14: [1211 1223] -> [1211 1223] no-op
  test-topic p15: [1225 1209] -> [1225 1209] no-op
  test-topic p16: [1202 1215] -> [1202 1215] no-op
  test-topic p17: [1376 1255] -> [1376 1255] no-op
  test-topic p18: [1201 1267] -> [1201 1267] no-op
  test-topic p19: [1211 1223] -> [1211 1223] no-op
  test-topic p20: [1255 1203] -> [1235 1203] replaced broker
  test-topic p21: [1254 1225] -> [1254 1225] no-op
  test-topic p22: [1211 1212] -> [1211 1208] replaced broker
  test-topic p23: [1209 1256] -> [1209 1256] no-op
  test-topic p24: [1208 1220] -> [1208 1220] no-op
  test-topic p25: [1215 1234] -> [1215 1234] no-op
  test-topic p26: [1217 1234] -> [1217 1234] no-op
  test-topic p27: [1236 1223] -> [1236 1223] no-op
  test-topic p28: [1203 1247] -> [1203 1247] no-op
  test-topic p29: [1214 1217] -> [1214 1217] no-op
  test-topic p30: [1217 1214] -> [1217 1214] no-op
  test-topic p31: [1217 1201] -> [1217 1201] no-op
  test-topic p32: [1203 1247] -> [1203 1247] no-op
  test-topic p33: [1224 1216] -> [1224 1216] no-op
  test-topic p34: [1209 1256] -> [1209 1256] no-op
  test-topic p35: [1224 1223] -> [1224 1223] no-op
  test-topic p36: [1225 1209] -> [1225 1209] no-op
  test-topic p37: [1217 1214] -> [1217 1213] replaced broker
  test-topic p38: [1256 1202] -> [1256 1202] no-op
  test-topic p39: [1267 1205] -> [1267 1205] no-op
  test-topic p40: [1224 1216] -> [1224 1216] no-op
  test-topic p41: [1201 1267] -> [1201 1267] no-op
  test-topic p42: [1255 1225] -> [1255 1225] no-op
  test-topic p43: [1213 1256] -> [1213 1256] no-op
  test-topic p44: [1220 1235] -> [1220 1235] no-op
  test-topic p45: [1201 1267] -> [1201 1267] no-op
  test-topic p46: [1203 1247] -> [1203 1247] no-op
  test-topic p47: [1234 1376] -> [1234 1376] no-op
  test-topic p48: [1376 1203] -> [1376 1203] no-op
  test-topic p49: [1267 1205] -> [1267 1205] no-op
  test-topic p50: [1247 1224] -> [1247 1224] no-op
  test-topic p51: [1212 1201] -> [1212 1201] no-op
  test-topic p52: [1254 1217] -> [1254 1217] no-op
  test-topic p53: [1211 1208] -> [1211 1208] no-op
  test-topic p54: [1209 1224] -> [1209 1224] no-op
  test-topic p55: [1205 1236] -> [1205 1236] no-op
  test-topic p56: [1213 1256] -> [1213 1256] no-op
  test-topic p57: [1235 1200] -> [1235 1200] no-op
  test-topic p58: [1212 1201] -> [1212 1201] no-op
  test-topic p59: [1236 1214] -> [1236 1213] replaced broker
  test-topic p60: [1255 1203] -> [1255 1203] no-op
  test-topic p61: [1209 1215] -> [1209 1215] no-op
  test-topic p62: [1247 1224] -> [1247 1224] no-op
  test-topic p63: [1224 1255] -> [1224 1255] no-op
  test-topic p64: [1214 1225] -> [1214 1225] no-op
  test-topic p65: [1212 1211] -> [1212 1211] no-op
  test-topic p66: [1214 1211] -> [1214 1211] no-op
  test-topic p67: [1200 1213] -> [1200 1213] no-op
  test-topic p68: [1211 1208] -> [1211 1208] no-op
  test-topic p69: [1215 1203] -> [1215 1203] no-op
  test-topic p70: [1254 1216] -> [1254 1216] no-op
  test-topic p71: [1202 1215] -> [1202 1215] no-op
  test-topic p72: [1236 1254] -> [1236 1254] no-op
  test-topic p73: [1220 1235] -> [1220 1235] no-op
  test-topic p74: [1247 1212] -> [1247 1212] no-op
  test-topic p75: [1225 1209] -> [1216 1209] replaced broker
  test-topic p76: [1215 1234] -> [1215 1234] no-op
  test-topic p77: [1216 1255] -> [1216 1255] no-op
  test-topic p78: [1205 1236] -> [1205 1236] no-op
  test-topic p79: [1208 1220] -> [1208 1220] no-op
  test-topic p80: [1234 1376] -> [1234 1376] no-op
  test-topic p81: [1376 1208] -> [1376 1208] no-op
  test-topic p82: [1234 1376] -> [1234 1376] no-op
  test-topic p83: [1223 1234] -> [1223 1234] no-op
  test-topic p84: [1256 1202] -> [1256 1202] no-op
  test-topic p85: [1216 1203] -> [1216 1203] no-op
  test-topic p86: [1202 1216] -> [1202 1216] no-op
  test-topic p87: [1254 1217] -> [1254 1217] no-op
  test-topic p88: [1234 1376] -> [1234 1376] no-op
  test-topic p89: [1223 1254] -> [1223 1254] no-op
  test-topic p90: [1216 1214] -> [1216 1214] no-op
  test-topic p91: [1202 1215] -> [1202 1215] no-op
  test-topic p92: [1267 1205] -> [1267 1205] no-op
  test-topic p93: [1200 1213] -> [1200 1213] no-op
  test-topic p94: [1223 1254] -> [1223 1254] no-op
  test-topic p95: [1208 1220] -> [1208 1220] no-op
  test-topic p96: [1225 1209] -> [1225 1209] no-op
  test-topic p97: [1235 1200] -> [1235 1200] no-op
  test-topic p98: [1200 1213] -> [1200 1213] no-op
  test-topic p99: [1203 1247] -> [1203 1247] no-op
  test-topic p100: [1267 1205] -> [1267 1205] no-op
  test-topic p101: [1220 1235] -> [1220 1235] no-op
  test-topic p102: [1216 1255] -> [1216 1255] no-op
  test-topic p103: [1376 1214] -> [1376 1214] no-op
  test-topic p104: [1202 1215] -> [1202 1215] no-op
  test-topic p105: [1209 1224] -> [1209 1224] no-op
  test-topic p106: [1255 1225] -> [1255 1225] no-op
  test-topic p107: [1205 1236] -> [1205 1236] no-op
  test-topic p108: [1235 1200] -> [1235 1200] no-op
  test-topic p109: [1200 1213] -> [1200 1213] no-op
  test-topic p110: [1254 1255] -> [1254 1255] no-op
  test-topic p111: [1213 1201] -> [1213 1201] no-op
  test-topic p112: [1236 1208] -> [1236 1208] no-op
  test-topic p113: [1224 1216] -> [1224 1216] no-op
  test-topic p114: [1256 1202] -> [1256 1202] no-op
  test-topic p115: [1201 1267] -> [1201 1267] no-op
  test-topic p116: [1205 1236] -> [1205 1236] no-op
  test-topic p117: [1203 1247] -> [1200 1247] replaced broker
  test-topic p118: [1247 1224] -> [1247 1220] replaced broker
  test-topic p119: [1225 1209] -> [1225 1235] replaced broker
  test-topic p120: [1376 1212] -> [1376 1212] no-op
  test-topic p121: [1234 1376] -> [1234 1376] no-op
  test-topic p122: [1208 1220] -> [1208 1220] no-op
  test-topic p123: [1214 1217] -> [1214 1217] no-op
  test-topic p124: [1215 1212] -> [1215 1212] no-op
  test-topic p125: [1212 1211] -> [1212 1236] replaced broker
  test-topic p126: [1214 1211] -> [1214 1211] no-op
  test-topic p127: [1216 1254] -> [1216 1254] no-op

Broker distribution:
  degree [min/max/avg]: 2/7/4.30 -> 2/7/4.81
  -
  Broker 1200 - leader: 5, follower: 3, total: 8
  Broker 1201 - leader: 4, follower: 4, total: 8
  Broker 1202 - leader: 5, follower: 5, total: 10
  Broker 1203 - leader: 4, follower: 5, total: 9
  Broker 1205 - leader: 5, follower: 5, total: 10
  Broker 1208 - leader: 4, follower: 5, total: 9
  Broker 1209 - leader: 5, follower: 4, total: 9
  Broker 1211 - leader: 5, follower: 4, total: 9
  Broker 1212 - leader: 5, follower: 4, total: 9
  Broker 1213 - leader: 4, follower: 6, total: 10
  Broker 1214 - leader: 5, follower: 3, total: 8
  Broker 1215 - leader: 5, follower: 5, total: 10
  Broker 1216 - leader: 6, follower: 5, total: 11
  Broker 1217 - leader: 5, follower: 5, total: 10
  Broker 1220 - leader: 5, follower: 5, total: 10
  Broker 1223 - leader: 5, follower: 5, total: 10
  Broker 1224 - leader: 5, follower: 4, total: 9
  Broker 1225 - leader: 4, follower: 5, total: 9
  Broker 1234 - leader: 5, follower: 5, total: 10
  Broker 1235 - leader: 4, follower: 6, total: 10
  Broker 1236 - leader: 4, follower: 6, total: 10
  Broker 1247 - leader: 5, follower: 5, total: 10
  Broker 1254 - leader: 5, follower: 5, total: 10
  Broker 1255 - leader: 4, follower: 5, total: 9
  Broker 1256 - leader: 5, follower: 5, total: 10
  Broker 1267 - leader: 5, follower: 4, total: 9
  Broker 1376 - leader: 5, follower: 5, total: 10

Storage free change estimations:
  range: 2031.15GB -> 971.02GB
  range spread: 130.47% -> 53.45%
  std. deviation: 521.41GB -> 305.21GB
  -
  Broker 1200: 3587.97 -> 2787.77 (-800.20GB, -22.30%)
  Broker 1201: 2708.39 -> 2708.39 (+0.00GB, 0.00%)
  Broker 1202: 2209.01 -> 2209.01 (+0.00GB, 0.00%)
  Broker 1203: 1865.20 -> 2665.40 (+800.20GB, 42.90%)
  Broker 1205: 2120.30 -> 2120.30 (+0.00GB, 0.00%)
  Broker 1208: 3224.55 -> 2398.75 (-825.81GB, -25.61%)
  Broker 1209: 1912.19 -> 2739.93 (+827.74GB, 43.29%)
  Broker 1211: 1873.23 -> 2475.35 (+602.12GB, 32.14%)
  Broker 1212: 1916.88 -> 2742.69 (+825.81GB, 43.08%)
  Broker 1213: 3165.90 -> 1976.62 (-1189.28GB, -37.57%)
  Broker 1214: 1556.82 -> 2746.10 (+1189.28GB, 76.39%)
  Broker 1215: 2091.04 -> 2091.04 (+0.00GB, 0.00%)
  Broker 1216: 2150.41 -> 1895.21 (-255.21GB, -11.87%)
  Broker 1217: 1816.75 -> 1816.75 (+0.00GB, 0.00%)
  Broker 1220: 2877.80 -> 2185.20 (-692.60GB, -24.07%)
  Broker 1223: 2347.95 -> 2347.95 (+0.00GB, 0.00%)
  Broker 1224: 1977.97 -> 2670.58 (+692.60GB, 35.02%)
  Broker 1225: 1960.09 -> 2215.30 (+255.21GB, 13.02%)
  Broker 1234: 2109.06 -> 2109.06 (+0.00GB, 0.00%)
  Broker 1235: 3369.32 -> 1881.47 (-1487.85GB, -44.16%)
  Broker 1236: 2656.35 -> 2054.22 (-602.12GB, -22.67%)
  Broker 1247: 1956.20 -> 1956.20 (+0.00GB, 0.00%)
  Broker 1254: 2416.52 -> 2416.52 (+0.00GB, 0.00%)
  Broker 1255: 1850.83 -> 2510.94 (+660.11GB, 35.67%)
  Broker 1256: 1986.07 -> 1986.07 (+0.00GB, 0.00%)
  Broker 1267: 2301.33 -> 2301.33 (+0.00GB, 0.00%)
  Broker 1376: 2065.64 -> 2065.64 (+0.00GB, 0.00%)

New partition maps:
  test-topic.json
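The summary statistics in the "Storage free change estimations" section can be reproduced approximately. Judging from the numbers, "range spread" appears to be the range divided by the smallest free storage value: 2031.15GB / 1556.82GB ≈ 130.47% before and 971.02GB / 1816.75GB ≈ 53.45% after. The sketch below assumes that definition and uses the population standard deviation for "std. deviation"; neither is confirmed against topicmappr's source, and `storage_summary` is a hypothetical name.

```python
from statistics import pstdev


def storage_summary(free_gb):
    """Summarize free storage (GB) across brokers.

    Assumptions: "range spread" is range / minimum free storage (this
    reproduces the example's percentages) and "std. deviation" is the
    population standard deviation (not verified against topicmappr).
    """
    values = list(free_gb.values())
    low, high = min(values), max(values)
    spread = high - low
    return {
        "range_gb": spread,
        "range_spread_pct": spread / low * 100,
        "std_dev_gb": pstdev(values),
    }


if __name__ == "__main__":
    # Only the extremes matter for range; taken from the example's before/after values.
    before = storage_summary({1200: 3587.97, 1214: 1556.82})
    after = storage_summary({1200: 2787.77, 1217: 1816.75})
    print(f"{before['range_spread_pct']:.2f}% -> {after['range_spread_pct']:.2f}%")
```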

Results after applying test-topic.json (red bars indicate autothrottle start and finish events):

Troubleshooting

Enabling --verbose prints placement decision details for every partition considered on each offload target.

An offload target doesn't list any partitions scheduled for relocation

Possible causes:

  • It holds a few large partitions, and even the smallest available one would push the source's free storage above, or any destination's free storage below, the --tolerance limits.
  • All partitions examined were too large to relocate within the tolerance limits. Increasing --partition-limit beyond the default of 30 raises the likelihood of finding a possible relocation (if the broker holds more than 30 partitions).
  • No suitable destination brokers have enough free storage. Possible actions:
    • adding brokers to the congested rack.id locality
    • disabling locality scoping (--locality-scoped=false)
    • relaxing the --tolerance (this may result in poor storage free range spread)

Storage utilization range isn't improving

The storage free range is a key measure of storage balance. A range that isn't improving can result from offload targets being unable to schedule relocations (see above). In other cases, adjusting --tolerance up or down in 0.02 increments can improve results. This may take some trial and error, because no single tolerance value (which sets the high/low storage limits for source and destination brokers) is optimal for every cluster: partition counts, distribution and sizes, broker counts, replica locality and other constraints all make this a difficult problem to optimize.

Likewise, the choice of brokers to target for offloading influences the result. Larger --storage-threshold values (such as the default 20%) are intended to target outlier brokers. If balance is reasonably good to begin with, lower values (such as the 5% used in the example) can target more brokers, which creates more opportunities to improve balance. At some point, it may be better to use the rebuild command with its storage placement functionality and build a storage-optimal map from scratch on a new set of target brokers.
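The trial-and-error tuning described above can be scripted. The helper below is a hypothetical sketch: it runs `topicmappr rebalance` once per tolerance using only the flags shown on this page, and parses the "after" value from the "range spread" line of the output. It assumes topicmappr is on PATH and can reach its ZooKeeper and metrics data with default settings; note that each run writes the new partition map (for example test-topic.json) again, overwriting the previous one.

```python
import re
import subprocess

# Matches e.g. "range spread: 130.47% -> 53.45%" in topicmappr's summary.
SPREAD_RE = re.compile(r"range spread:\s*([\d.]+)%\s*->\s*([\d.]+)%")


def tolerance_values(start=0.10, stop=0.30, step=0.02):
    """Tolerances to try, rounded to two decimals to avoid float drift."""
    n = int(round((stop - start) / step))
    return [round(start + i * step, 2) for i in range(n + 1)]


def rebalance_argv(topics, brokers, tolerance, storage_threshold=0.20):
    """Build a topicmappr rebalance command line (flags as used on this page)."""
    return [
        "topicmappr", "rebalance",
        "--topics", topics,
        "--brokers", ",".join(str(b) for b in brokers),
        "--storage-threshold", str(storage_threshold),
        "--tolerance", str(tolerance),
    ]


def range_spread_after(output):
    """Parse the post-rebalance range spread (%) from topicmappr output."""
    match = SPREAD_RE.search(output)
    return float(match.group(2)) if match else None


def sweep(topics, brokers, tolerances, storage_threshold=0.20):
    """Run topicmappr once per tolerance; return {tolerance: spread_after}."""
    results = {}
    for tol in tolerances:
        proc = subprocess.run(
            rebalance_argv(topics, brokers, tol, storage_threshold),
            capture_output=True, text=True, check=True,
        )
        results[tol] = range_spread_after(proc.stdout)
    return results
```

A lower resulting range spread suggests better balance, but the planned move volume and destination limits are worth reviewing before applying any map.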
