(wr *Wrangler) ShardReplicationStatuses hangs forever #4572

derekperkins · 2019-01-30T18:33:33Z

I'm indirectly using ShardReplicationStatuses via BackupShard, and I found a case where it hangs forever. I spun up some rdonly tablets temporarily, but when I deleted them, their tablet records were not removed, which left references with bad host names. I would expect the call to return an error saying that it couldn't contact one of the tablets. Instead, it hangs until the context timeout hits, which is a couple of hours by default for the backup call.

vitess/go/vt/wrangler/reparent.go

Lines 78 to 90 in 744b8e2

    		wg.Add(1)
    		go func(i int, ti *topo.TabletInfo) {
    			defer wg.Done()
    			status, err := wr.tmc.SlaveStatus(ctx, ti.Tablet)
    			if err != nil {
    				rec.RecordError(fmt.Errorf("SlaveStatus(%v) failed: %v", ti.AliasString(), err))
    				return
    			}
    			result[i] = status
    		}(i, ti)
    	}
    }
    wg.Wait()

At first it looked like the backup itself was timing out, which led me to raise action_timeout to 6 hours, and that only made the problem harder to diagnose. It would have been much nicer to see an error about not being able to connect to a tablet to get the replication status. I haven't yet dug into the SlaveStatus interface call to see where that timeout is set.

I'm not sure whether tweaking the behavior of that call is appropriate, or whether it would have cascading effects on other callers that expect it to wait forever.

ajm188 · 2022-06-24T14:04:20Z

This is a dupe of #4073, and I fixed this in #7690, so going to close!

derekperkins added the Type: Bug label Jan 30, 2019

ajm188 added the Component: Cluster management label Jun 24, 2022

ajm188 closed this as completed Jun 24, 2022
