Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better handling of vreplication setState() failure #13488

Merged
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions go/vt/vttablet/tabletmanager/vreplication/controller.go
Original file line number Diff line number Diff line change
Expand Up @@ -271,6 +271,7 @@ func (ct *controller) runBlp(ctx context.Context) (err error) {
if isUnrecoverableError(err) || !ct.lastWorkflowError.ShouldRetry() {
log.Errorf("vreplication stream %d going into error state due to %+v", ct.id, err)
if errSetState := vr.setState(binlogplayer.BlpError, err.Error()); errSetState != nil {
log.Errorf("INTERNAL: unable to setState() in controller. Attempting to set error text: [%v]; setState() error is: %v", err, errSetState)
return err // yes, err and not errSetState.
}
return nil // this will cause vreplicate to quit the workflow
Expand Down
2 changes: 2 additions & 0 deletions go/vt/vttablet/tabletmanager/vreplication/vreplicator.go
Original file line number Diff line number Diff line change
Expand Up @@ -434,6 +434,7 @@ func (vr *vreplicator) readSettings(ctx context.Context, dbClient *vdbClient) (s
}

func (vr *vreplicator) setMessage(message string) error {
message = strings.ToValidUTF8(message, "�")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strings.ToValidUTF8 replaces all non-UTF8 characters in message with , which is the same character you see on your terminal or in CI logs etc. for non-UTF8 characters. The main thing is to remove non-UTF8 characters, of course.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that strings in go were already utf8 encoded and that unicode character is used for invalid characters. Do you have any examples of what we're trying to fix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so the story is this: we've seen a production issue where a vitess Online DDL failed, and the error message was Picked source tablet..., which is really not an error message. Upon investigation, we found a log entry saying (details redacted):

E0712 14:51:18.837571       1 controller.go:268] vreplication stream 564 going into error state due to Column 'c3' cannot be null (errno 1048) (sqlstate 23000) during query: insert into _4e3e77b5_d1bc_5366_86d2_cdb474f41540_20230712145011_vrepl(c1, c2, c3, c4, c5)

Now, the underlying problem was that the migration converted a column from NULLable to NOT NULL. So far so good. The table data did in fact have NULL data. OK fine, too bad, but we do handle this scenario and know how to fail the migration. Which we did.

But then the problem was that calling setState (updating _vt.vreplication.message) failed. So we were unable to store the actual error message in Vitess's DB.

Why would that happen? I was reminded of this: #13487 ; when you attempt to write an arbitrary binary, that can fail. The error message contains specific table data, which could be binary. This is why in #13487 we converted non UTF8 characters to . I am unable to determine the precise data that caused this failure, but 13487 seems very relevant.

Moreover, Vitess did not print out the setState (i.e. the UPDATE _vt.vreplication) error. This PR therefore:

  1. Ensures to log exactly what we were trying to write, and what precise error we got back.
  2. Attempt to sanitize the input in the first place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, it doesn't hurt to do this. I would note that the field in the table is varbinary too. That may be a relevant factor here.

message = binlogplayer.MessageTruncate(message)
vr.stats.History.Add(&binlogplayer.StatsHistoryRecord{
Time: time.Now(),
Expand All @@ -456,6 +457,7 @@ func (vr *vreplicator) insertLog(typ, message string) error {
}

func (vr *vreplicator) setState(state, message string) error {
message = strings.ToValidUTF8(message, "�")
if message != "" {
vr.stats.History.Add(&binlogplayer.StatsHistoryRecord{
Time: time.Now(),
Expand Down