This repository has been archived by the owner on Jul 24, 2024. It is now read-only.

BR failed to backup stats #679

Closed
3pointer opened this issue Jan 6, 2021 · 14 comments

Comments

@3pointer
Collaborator

3pointer commented Jan 6, 2021

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
    If possible, provide a recipe for reproducing the error.
    Upgrade the TiDB cluster from v4.0.0 to v4.0.9, then use BR v4.0.9 to back up.

  2. What did you expect to see?
    Backup succeeds.

  3. What did you see instead?

[ERROR] [backup.go:25] ["failed to backup"] [error="[types:1406]Data Too Long, field len 5, data len 6"] [errorVerbose="[types:1406]Data Too Long, field len 5, data len 6
github.com/pingcap/errors.AddStack
	github.com/pingcap/[email protected]/errors.go:174
github.com/pingcap/errors.(*Error).GenWithStack
	github.com/pingcap/[email protected]/normalize.go:147
github.com/pingcap/tidb/types.ProduceStrWithSpecifiedTp
	github.com/pingcap/[email protected]/types/datum.go:995
github.com/pingcap/tidb/types.(*Datum).convertToString
	github.com/pingcap/[email protected]/types/datum.go:942
github.com/pingcap/tidb/types.(*Datum).ConvertTo
	github.com/pingcap/[email protected]/types/datum.go:821
github.com/pingcap/tidb/statistics/handle.(*Handle).histogramFromStorage
	github.com/pingcap/[email protected]/statistics/handle/handle.go:667
github.com/pingcap/tidb/statistics/handle.(*Handle).columnStatsFromStorage
	github.com/pingcap/[email protected]/statistics/handle/handle.go:460
github.com/pingcap/tidb/statistics/handle.(*Handle).tableStatsFromStorage
	github.com/pingcap/[email protected]/statistics/handle/handle.go:539
github.com/pingcap/tidb/statistics/handle.(*Handle).tableStatsToJSON
	github.com/pingcap/[email protected]/statistics/handle/dump.go:89
github.com/pingcap/tidb/statistics/handle.(*Handle).DumpStatsToJSON
	github.com/pingcap/[email protected]/statistics/handle/dump.go:76
github.com/pingcap/br/pkg/backup.BuildBackupRangeAndSchema
	github.com/pingcap/br@/pkg/backup/client.go:340
github.com/pingcap/br/pkg/task.RunBackup
	github.com/pingcap/br@/pkg/task/backup.go:257
github.com/pingcap/br/cmd.runBackupCommand
	github.com/pingcap/br@/cmd/backup.go:24
github.com/pingcap/br/cmd.newFullBackupCommand.func1
	github.com/pingcap/br@/cmd/backup.go:85
github.com/spf13/cobra.(*Command).execute
	github.com/spf13/[email protected]/command.go:842
github.com/spf13/cobra.(*Command).ExecuteC
	github.com/spf13/[email protected]/command.go:950
github.com/spf13/cobra.(*Command).Execute
	github.com/spf13/[email protected]/command.go:887
main.main
	github.com/pingcap/br@/main.go:58
runtime.main
	runtime/proc.go:203
runtime.goexit
	runtime/asm_amd64.s:1357"]
  4. What version of BR and TiDB/TiKV/PD are you using?
    BR v4.0.9
    TiDB v4.0.9 (upgraded from v4.0.0)
@3pointer 3pointer added the type/bug Something isn't working label Jan 6, 2021
@3pointer
Collaborator Author

3pointer commented Jan 6, 2021

The reason for the failure is that DumpStatsToJSON is not compatible when TiDB is upgraded from v4.0.0.
As a workaround, users can add --ignore-stats=true to the command to skip backing up stats. We also need to find a better way to back up stats with compatibility in mind.

@kennytm
Collaborator

kennytm commented Jan 6, 2021

This is a bug in TiDB. Still, such failure probably shouldn't abort backup/restore since stats aren't essential.

Also, the dump-stats error should log which table is the cause.

br/pkg/backup/client.go

Lines 348 to 358 in 458d542

var stats []byte
if !ignoreStats {
	jsonTable, err := h.DumpStatsToJSON(dbInfo.Name.String(), tableInfo, nil)
	if err != nil {
		return nil, nil, errors.Trace(err)
	}
	stats, err = json.Marshal(jsonTable)
	if err != nil {
		return nil, nil, errors.Trace(err)
	}
}
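The suggestion above (don't abort, do name the table) admits a fairly direct shape. Below is a minimal, self-contained sketch; the `dumpStats` stub and `buildStats` helper are hypothetical stand-ins for BR's internals, not its actual API:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dumpStats is a hypothetical stand-in for h.DumpStatsToJSON; the real call
// lives in the TiDB statistics handle.
func dumpStats(db, table string) (map[string]any, error) {
	if table == "broken" {
		return nil, fmt.Errorf("[types:1406]Data Too Long, field len 5, data len 6")
	}
	return map[string]any{"database_name": db, "table_name": table}, nil
}

// buildStats downgrades a stats-dump failure to a warning that names the
// offending table, so the backup can proceed with stats == nil.
func buildStats(db, table string, ignoreStats bool) []byte {
	if ignoreStats {
		return nil
	}
	jsonTable, err := dumpStats(db, table)
	if err != nil {
		// Log which table caused the failure instead of aborting the backup.
		fmt.Printf("[WARN] backup stats failed, skipping [db=%s] [table=%s] [err=%v]\n", db, table, err)
		return nil
	}
	stats, err := json.Marshal(jsonTable)
	if err != nil {
		fmt.Printf("[WARN] marshal stats failed, skipping [db=%s] [table=%s] [err=%v]\n", db, table, err)
		return nil
	}
	return stats
}

func main() {
	fmt.Println(buildStats("test", "broken", false) == nil) // true: failure skipped, not fatal
	fmt.Println(len(buildStats("test", "t1", false)) > 0)   // true: stats marshalled
}
```

Since restore already skips a nil Stats field (see the restore snippet below in this thread), returning nil here keeps the two sides consistent.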

@IANTHEREAL
Collaborator

IANTHEREAL commented Jan 6, 2021

What is the trigger mechanism of the dumpStats failure?

Stats shouldn't cause the backup to fail. If the stats can be backed up and restored, good. If BR can't back up stats, it needs to output a backup summary log indicating that backing up stats failed; and if there are no stats when restoring, then BR needs to be able to run analyze table.

Stats information is stored in the backupmeta. What if we back up an entire large-scale cluster? How do we deal with a very large backupmeta? Does that make sense?

@kennytm
Collaborator

kennytm commented Jan 6, 2021

If the stats field is nil during restore, it will just be skipped.

br/pkg/restore/client.go

Lines 816 to 826 in 458d542

if table.Stats != nil {
	log.Info("start loads analyze after validate checksum",
		zap.Stringer("db name", tbl.OldTable.DB.Name),
		zap.Stringer("name", tbl.OldTable.Info.Name),
		zap.Int64("old id", tbl.OldTable.Info.ID),
		zap.Int64("new id", tbl.Table.ID),
	)
	if err := rc.statsHandler.LoadStatsFromJSON(rc.dom.InfoSchema(), table.Stats); err != nil {
		log.Error("analyze table failed", zap.Any("table", table.Stats), zap.Error(err))
	}
}

Missing stats won't prevent the user from accessing the data; the queries just won't be optimized. ANALYZE TABLE can be performed separately.
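A restore-side fallback could be as simple as collecting an ANALYZE statement for every table whose Stats field came back nil. A sketch under that assumption, using a simplified `table` struct rather than BR's real types:

```go
package main

import "fmt"

// table is a simplified stand-in for BR's restored-table metadata.
type table struct {
	DB, Name string
	Stats    []byte // nil when the backup carried no stats for this table
}

// analyzeFallback returns the ANALYZE statements to run for tables restored
// without stats, so the cluster ends up with usable optimizer statistics.
func analyzeFallback(tables []table) []string {
	var stmts []string
	for _, t := range tables {
		if t.Stats == nil {
			stmts = append(stmts, fmt.Sprintf("ANALYZE TABLE `%s`.`%s`", t.DB, t.Name))
		}
	}
	return stmts
}

func main() {
	got := analyzeFallback([]table{
		{DB: "test", Name: "t1", Stats: []byte("{}")}, // stats restored from backup
		{DB: "test", Name: "t2"},                      // no stats in the backup
	})
	fmt.Println(got)
}
```

Whether BR runs these statements itself or merely prints them for the user is exactly the trade-off debated in the rest of this thread.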

@IANTHEREAL
Collaborator

OK, that works as a workaround.

For a better user experience, restore needs to support analyze table for tables without stats. Many users may not know about analyze table, and restore should be a complete operation: the cluster should be in a good state after restoring.

@kennytm
Collaborator

kennytm commented Jan 7, 2021

Running "analyze table" requires scanning the tables again though, which is slow (especially when restoring from an old backup where the stats are all missing).

@IANTHEREAL
Collaborator

That's true, but analyze table is necessary. So we should make sure that backing up statistics succeeds in the vast majority of situations.

@IANTHEREAL
Collaborator

IANTHEREAL commented Jan 7, 2021

As discussed with @kennytm, the consensus is as follows:

  1. the most important thing is that BR ensures a high success rate for backing up stats, so that most statistics can be backed up successfully; the success rate should reach 99.9%
  2. a table restored without stats can be considered unavailable, and requires human or restore-program intervention
  3. the ignore-stats option of backup is mainly for testing. If someone backs up with ignore-stats, restore does not need to execute analyze table automatically
  4. BR restore should provide an option skip-analyze whose default value is false
  5. backups from TiDB (version < 4.0.9) can skip analyze table
  6. if there are table backups that do not contain stats information, the user should be reminded that restore will take a long time
  7. if restore fails to execute an analyze table statement, the log will prompt the user to manually execute analyze table, and restore will continue with other tables.

@kennytm
Collaborator

kennytm commented Jan 7, 2021

my preference is:

4. the option should be --analyze (default is true) which could be set to false for those who really can't wait
5. no need to do version check. we do analyze even if restoring pre-4.0.9 backup archives
8. we should clearly document this analyze behavior.
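The preferred option shape above (--analyze, default true) can be sketched with the standard flag package; the flag name comes from this discussion, not from a released BR version:

```go
package main

import (
	"flag"
	"fmt"
)

// parseRestoreFlags sketches the proposed restore option: --analyze defaults
// to true and can be set to false by those who really can't wait.
func parseRestoreFlags(args []string) (analyze bool, err error) {
	fs := flag.NewFlagSet("restore", flag.ContinueOnError)
	a := fs.Bool("analyze", true, "run ANALYZE TABLE for tables restored without stats")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *a, nil
}

func main() {
	a1, _ := parseRestoreFlags(nil)                         // no flag given: default applies
	a2, _ := parseRestoreFlags([]string{"--analyze=false"}) // explicit opt-out
	fmt.Println(a1, a2)
}
```

Making analyze the default matches point 5: no version check is needed, since the same behavior applies to pre-4.0.9 archives.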

@IANTHEREAL
Collaborator

I have no problem with these preferences ⏩

For versions < 4.0.9, I have no special requirements. In order to restore quickly, we can allow restores of backups from versions below 4.0.9 to skip analyze table; that can be a choice. But I don't want many stats backup failures after version 4.0.9, to avoid overly long restore times.

@IANTHEREAL
Collaborator

IANTHEREAL commented Jan 18, 2021

I think we made some wrong decisions before, solving the immediate problem out of inertia. I have a new proposal. The product factors considered are as follows:
one is that users should not need to know about statistics at all;
the other is that we should ensure a high success rate for statistics backup and restore.

I suggest not introducing more complicated logic. Because ignore-stats already exists, we can temporarily shelve it, but I recommend not introducing skip-analyze; we do not want this parameter in the product long-term, and it would only be a temporary solution. What we really need is a fast, highly reliable statistics backup and restore mechanism, unless we simply can't build one.

The recommended processing logic is as follows:

  1. if backing up statistics fails, the backup fails
  2. if restoring statistics fails, the restore fails, but it can tell the user that the data has been successfully restored and that they can manually execute analyze table; we could even support restoring statistics separately

@kennytm @3pointer

@kennytm
Collaborator

kennytm commented Jan 19, 2021

So we'll be changing how we back up stats, and also supporting system tables like RBAC info and global variables.

  1. We will back up all relevant mysql.* tables via normal SST backup. This includes user information, statistics, etc. The schema is first renamed to tidb_br_temp_mysql.*.
    Since mysql.* tables are then treated like a normal backup, this should never trigger processing logic (1), as they are now equal.

  2. After restoring the tidb_br_temp_mysql.*, we will use the SQL REPLACE INTO mysql.xxx SELECT ... FROM tidb_br_temp_mysql.xxx WHERE ... to insert the data back into the real mysql.* tables.
    These SQL may fail but won't impact overall data as logic (2) suggests.

To further support logic (2), and support backward/forward compatibility,

  1. BR will never execute ANALYZE.
  2. We will add a flag --system-tables='minimal,privileges,statistics' to backup and restore to allow skipping part of the system tables when they exist.
  3. Version compatibility:
    • 4.0.8 backup → New restore: If the system table is completely not found in backup archive but --system-tables specifies it, a non-fatal warning is issued about what should be done. For --system-tables=statistics specifically, we will list all selected tables for restore as potential targets of ANALYZE.
    • 4.0.9 backup → New restore: The Stats field will just be ignored. BR will ask user to do ANALYZE manually. Also the --ignore-stats flag becomes no-op.
    • New backup → 4.0.9 restore: We must never back up the system tables under the name mysql.* directly, since this would corrupt older clusters that don't understand how to work with them. Therefore, during backup we must already rename mysql.* to tidb_br_temp_mysql.*.
  4. Feature compatibility:
    • We haven't investigated if incremental backup can handle system tables. There are two things to consider:
      1. the incremental backed up system tables do not contain complete information.
      2. if a version upgrade happened between last-backupts and backupts, there may be DDLs operating on mysql.* specifically. They should either be skipped (current situation) or renamed.
    • Raw backup/restore is simply irrelevant.
    • Online restore - The restore SST part is the same as normal restore. The execute-SQL part will be done directly in the BR node. We may use REPLACE LOW_PRIORITY INTO … if it makes any difference. Each group of SQL execution should happen inside a transaction. There may be a problem when a system table is too large to fit in a single transaction. That is, without large transaction support, we may have to split those REPLACE statements into multiple self-consistent batches, and this could complicate the logic quite a lot.
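The batching concern in the last bullet can be made concrete with a small generator. This is only an illustrative sketch: the table name, LIMIT/OFFSET pagination, and row counts are assumptions, and a real implementation would batch on a key range rather than OFFSET to keep each batch self-consistent.

```go
package main

import "fmt"

// batchReplaceSQL sketches splitting the copy-back from the renamed backup
// schema (tidb_br_temp_mysql.*) into bounded batches, for clusters without
// large-transaction support. OFFSET pagination is purely illustrative.
func batchReplaceSQL(table string, totalRows, batchSize int) []string {
	var stmts []string
	for start := 0; start < totalRows; start += batchSize {
		stmts = append(stmts, fmt.Sprintf(
			"REPLACE INTO mysql.%[1]s SELECT * FROM tidb_br_temp_mysql.%[1]s LIMIT %d OFFSET %d",
			table, batchSize, start))
	}
	return stmts
}

func main() {
	// 2500 rows with a batch size of 1000 → three REPLACE statements.
	for _, s := range batchReplaceSQL("stats_buckets", 2500, 1000) {
		fmt.Println(s)
	}
}
```

Each generated statement would then run in its own transaction, which is exactly where the "multiple self-consistent batches" complexity comes from.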

@SunRunAway
Contributor

  1. mysql.user may contain account information which the customer may no longer remember.
  2. TiDB allows a customer to create a table in the mysql schema.

I suggest we back up the system tables, and allow opting out when restoring.

@kennytm @IANTHEREAL

@SunRunAway
Contributor

The second reason is very important and critical.

As a matter of principle, we should back up everything and restore selectively, instead of backing up selectively.
