Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stats] persist schema hashes for purging #8674

Merged
merged 10 commits into from
Dec 13, 2024
Merged

Conversation

max-hoffman
Copy link
Contributor

@max-hoffman max-hoffman commented Dec 12, 2024

We previously would flush stats when an active server detected schema changes, but the same problem affects initializing a server's stats after offline table alters. The prototype here uses tags to persist schema hashes so that we detect differences between stored and present table changes.

Additional improvements to debug logging and error handling. Fixes a separate bug related to re-initializing database statistics after a drop within the same server lifetime.

Note: There are interruption points in-between detecting schema changes, deleting the schema tags, and deleting the data off disk. There is also an interruption point between writing a schema tag and writing the associated data. I've tried structuring the orderings so that the worst-case scenario is we have to do repeated stats collection work:

  • stats data is only ever written after an associated schema tag is written
  • we delete the tags before deleting data, and missing tags indicates invalid data

So an interrupted delete will continue to delete after the process is picked up. And an interrupted write will be missed and have to be recollected.

@coffeegoddd
Copy link
Contributor

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000
version result total
5ae8af3 ok 5937457
version total_tests
5ae8af3 5937457
correctness_percentage
100.0

@coffeegoddd
Copy link
Contributor

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000
version result total
5341807 ok 5937457
version total_tests
5341807 5937457
correctness_percentage
100.0

Copy link
Member

@zachmu zachmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -65,7 +65,7 @@ func CreateCommitArgParser() *argparser.ArgParser {
ap.SupportsFlag(SkipEmptyFlag, "", "Only create a commit if there are staged changes. If no changes are staged, the call to commit is a no-op. Cannot be used with --allow-empty.")
ap.SupportsString(DateParam, "", "date", "Specify the date used in the commit. If not specified the current system time is used.")
ap.SupportsFlag(ForceFlag, "f", "Ignores any foreign key warnings and proceeds with the commit.")
ap.SupportsString(AuthorParam, "", "author", "Specify an explicit author using the standard A U Thor {{.LessThan}}[email protected]{{.GreaterThan}} format.")
ap.SupportsString(AuthorParam, "", "author", "Specify an explicit author using the standard author {{.LessThan}}[email protected]{{.GreaterThan}} format.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wasn't a typo, A U Thor is a term of art

return false, err
}

var tags []ref.TagRef
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is too dumb but you should use a new ref type for this rather than reusing tags

// ValidateSchemas returns false if any table schema in the session
// root is incompatible with the latest schema used to create a stored
// set of statistics.
ValidateSchemas(ctx *sql.Context, branch string) (bool, error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should call this SchemaChange() and invert the logic

@coffeegoddd
Copy link
Contributor

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000
version result total
01a8ecb ok 5937457
version total_tests
01a8ecb 5937457
correctness_percentage
100.0

@coffeegoddd
Copy link
Contributor

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000
version result total
be0bd5c ok 5937457
version total_tests
be0bd5c 5937457
correctness_percentage
100.0

@coffeegoddd
Copy link
Contributor

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000
version result total
c695fd0 ok 5937457
version total_tests
c695fd0 5937457
correctness_percentage
100.0

@coffeegoddd
Copy link
Contributor

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000
version result total
b30de2a ok 5937457
version total_tests
b30de2a 5937457
correctness_percentage
100.0

@max-hoffman
Copy link
Contributor Author

@zachmu i added boilerplate for a Tuple flatbuffer type, which just holds a byte array. If the name and interface is OK I'll go ahead and ship

@max-hoffman max-hoffman merged commit 4fff283 into main Dec 13, 2024
21 checks passed
@max-hoffman max-hoffman deleted the max/stats-load-schema-purge branch December 13, 2024 21:40
nicktobey pushed a commit that referenced this pull request Dec 17, 2024
* [statspro] Save table schema hashes to detect alterations between stats loads

* fix preexisting drop datebase issue

* [ga-format-pr] Run go/utils/repofmt/format_repo.sh and go/Godeps/update.sh

* fix testing harness and in-memory restarts after alter

* tuple fb type

* add new files

* add tuple ident

* fix file ident

* idents again

---------

Co-authored-by: max-hoffman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants