go/store/nbs: Fixing GCGen to be more correct. #8612

Merged (3 commits) on Nov 28, 2024


reltuk

@reltuk reltuk commented Nov 27, 2024

The original purpose of gc gen was twofold. First, it prevented garbage collection results from being applied if the store had changed for any reason due to multi-process concurrency. Second, it let a dolt gc invocation fast-complete if the store had not changed at all since the last GC run.

The first purpose is no longer necessary, because we no longer allow multi-process access to the same NomsBlockStore.

The second purpose was implemented slightly incorrectly once dolt gc --full was introduced. This change fixes the implementation to be more correct.

In particular, the semantics are:

  • After a dolt gc --full, an immediately following dolt gc or dolt gc --full fast-completes, because no collection is necessary.

  • After a dolt gc, only a dolt gc fast-completes. A dolt gc --full still runs a full GC to completion.
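
The two rules above can be modeled with a small state machine. This is only an illustrative sketch, not the code in go/store/nbs: the names `gcMode`, `store`, `Write`, `CanFastComplete`, and `GC` are hypothetical, and the model assumes that any write to the store invalidates the previous GC result.

```go
package main

import "fmt"

// gcMode is a hypothetical label for the kind of collection last completed.
// The ordering matters: a full GC subsumes a default GC.
type gcMode int

const (
	gcNone    gcMode = iota // no GC has completed since the store last changed
	gcDefault               // models a plain `dolt gc`
	gcFull                  // models `dolt gc --full`
)

// store is a toy stand-in for a block store. It only remembers which GC,
// if any, has completed since the last write.
type store struct {
	lastGC gcMode
}

// Write models any change to the store (an assumption of this sketch):
// afterwards, no earlier GC result can be reused.
func (s *store) Write() { s.lastGC = gcNone }

// CanFastComplete reports whether the requested GC can be skipped because
// an equal or stronger GC already ran on the unchanged store.
func (s *store) CanFastComplete(requested gcMode) bool {
	return requested != gcNone && s.lastGC >= requested
}

// GC skips the collection when possible and otherwise records that the
// requested collection ran. It returns true when the run was skipped.
func (s *store) GC(requested gcMode) bool {
	if s.CanFastComplete(requested) {
		return true
	}
	s.lastGC = requested
	return false
}

func main() {
	s := &store{}
	fmt.Println(s.GC(gcDefault)) // false: the first gc does real work
	fmt.Println(s.GC(gcDefault)) // true: gc after gc fast-completes
	fmt.Println(s.GC(gcFull))    // false: gc --full after gc still runs
	fmt.Println(s.GC(gcFull))    // true: gc --full after gc --full fast-completes
	fmt.Println(s.GC(gcDefault)) // true: gc after gc --full fast-completes
	s.Write()
	fmt.Println(s.GC(gcDefault)) // false: the store changed, so gc runs again
}
```

Ordering the modes so that a full GC compares greater than a default GC keeps the fast-complete check to a single comparison.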

reltuk and others added 2 commits November 27, 2024 09:29
@coffeegoddd

@reltuk DOLT

comparing_percentages
100.000000 to 100.000000
version result total
adfa851 ok 5937457
version total_tests
adfa851 5937457
correctness_percentage
100.0

@coffeegoddd

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000
version result total
5778915 ok 5937457
version total_tests
5778915 5937457
correctness_percentage
100.0


@nicktobey nicktobey left a comment


LGTM, a couple of nits in the bats tests.

go/store/nbs/manifest.go
integration-tests/bats/garbage_collection.bats (outdated)
integration-tests/bats/garbage_collection.bats (outdated)
@coffeegoddd

@reltuk DOLT

comparing_percentages
100.000000 to 100.000000
version result total
60ca378 ok 5937457
version total_tests
60ca378 5937457
correctness_percentage
100.0

@reltuk reltuk merged commit f7a9ab4 into main Nov 28, 2024
21 checks passed

@coffeegoddd DOLT

test_name detail row_cnt sorted mysql_time sql_mult cli_mult
batching LOAD DATA 10000 1 0.06 1.33
batching batch sql 10000 1 0.09 1.22
batching by line sql 10000 1 0.07 1.57
blob 1 blob 200000 1 0.89 3.79 3.61
blob 2 blobs 200000 1 0.87 4.31 4.49
blob no blob 200000 1 0.9 2.26 2
col type datetime 200000 1 0.82 2.89 2.78
col type varchar 200000 1 0.67 3.46 3
config width 2 cols 200000 1 0.78 2.45 2
config width 32 cols 200000 1 1.85 2.17 2.43
config width 8 cols 200000 1 0.99 2.25 1.94
pk type float 200000 1 0.9 2.14 1.72
pk type int 200000 1 0.84 2.2 1.86
pk type varchar 200000 1 1.47 1.97 1.42
row count 1.6mm 1600000 1 5.63 2.79 2.29
row count 400k 400000 1 1.43 2.69 2.18
row count 800k 800000 1 2.89 2.68 2.22
secondary index four index 200000 1 3.54 1.37 1.05
secondary index no secondary 200000 1 0.94 2.21 1.93
secondary index one index 200000 1 1.14 2.25 1.96
secondary index two index 200000 1 1.93 1.74 1.4
sorting shuffled 1mm 1000000 0 5.38 2.6 2.34
sorting sorted 1mm 1000000 1 5.36 2.64 2.2


@coffeegoddd DOLT

name detail mean_mult
dolt_blame_basic system table 1.12
dolt_blame_commit_filter system table 2.95
dolt_commit_ancestors_commit_filter system table 0.65
dolt_commits_commit_filter system table 1
dolt_diff_log_join_from_commit system table 2.44
dolt_diff_log_join_to_commit system table 2.37
dolt_diff_table_from_commit_filter system table 1.18
dolt_diff_table_to_commit_filter system table 1.21
dolt_diffs_commit_filter system table 1
dolt_history_commit_filter system table 1.39
dolt_log_commit_filter system table 1.05


@coffeegoddd DOLT

name add_cnt delete_cnt update_cnt latency
adds_only 60000 0 0 0.71
adds_updates_deletes 60000 60000 60000 3.77
deletes_only 0 60000 0 1.9
updates_only 0 0 60000 2.41


@coffeegoddd DOLT

test_name detail row_cnt sorted mysql_time sql_mult cli_mult
batching LOAD DATA 10000 1 0.05 1.8
batching batch sql 10000 1 0.07 2.43
batching by line sql 10000 1 0.07 1.57
blob 1 blob 200000 1 0.87 3.77 3.59
blob 2 blobs 200000 1 0.86 4.31 4.43
blob no blob 200000 1 0.87 2.38 2.03
col type datetime 200000 1 0.78 3.01 2.94
col type varchar 200000 1 0.66 3.33 2.86
config width 2 cols 200000 1 0.78 2.62 1.95
config width 32 cols 200000 1 2 1.83 2.28
config width 8 cols 200000 1 0.97 2.27 1.92
pk type float 200000 1 0.85 2.22 1.85
pk type int 200000 1 0.82 2.27 1.85
pk type varchar 200000 1 1.47 1.65 1.41
row count 1.6mm 1600000 1 5.49 2.86 2.31
row count 400k 400000 1 1.41 2.67 2.26
row count 800k 800000 1 2.73 2.83 2.3
secondary index four index 200000 1 3.46 1.38 1.08
secondary index no secondary 200000 1 0.87 2.36 2.05
secondary index one index 200000 1 1.11 2.32 2
secondary index two index 200000 1 1.92 1.76 1.42
sorting shuffled 1mm 1000000 0 4.74 2.88 2.35
sorting sorted 1mm 1000000 1 5.58 2.4 1.97


@coffeegoddd DOLT

name detail mean_mult
dolt_blame_basic system table 1.11
dolt_blame_commit_filter system table 2.9
dolt_commit_ancestors_commit_filter system table 0.63
dolt_commits_commit_filter system table 1.05
dolt_diff_log_join_from_commit system table 2.45
dolt_diff_log_join_to_commit system table 2.32
dolt_diff_table_from_commit_filter system table 1.18
dolt_diff_table_to_commit_filter system table 1.21
dolt_diffs_commit_filter system table 1.03
dolt_history_commit_filter system table 1.33
dolt_log_commit_filter system table 1.05


@coffeegoddd DOLT

name add_cnt delete_cnt update_cnt latency
adds_only 60000 0 0 0.73
adds_updates_deletes 60000 60000 60000 3.79
deletes_only 0 60000 0 1.86
updates_only 0 0 60000 2.4


github-actions bot commented Dec 1, 2024

@coffeegoddd DOLT

test_name detail row_cnt sorted mysql_time sql_mult cli_mult
batching LOAD DATA 10000 1 0.06 1.33
batching batch sql 10000 1 0.08 1.38
batching by line sql 10000 1 0.07 1.57
blob 1 blob 200000 1 0.88 3.81 3.66
blob 2 blobs 200000 1 0.89 4.22 4.35
blob no blob 200000 1 0.9 2.28 2.01
col type datetime 200000 1 0.78 3.09 2.96
col type varchar 200000 1 0.66 3.52 2.91
config width 2 cols 200000 1 0.8 2.34 1.95
config width 32 cols 200000 1 1.86 1.99 2.4
config width 8 cols 200000 1 0.95 2.37 2.05
pk type float 200000 1 0.91 2.05 1.77
pk type int 200000 1 0.93 2 1.68
pk type varchar 200000 1 1.5 1.71 1.4
row count 1.6mm 1600000 1 5.59 2.81 2.3
row count 400k 400000 1 1.42 2.69 2.2
row count 800k 800000 1 2.84 2.74 2.25
secondary index four index 200000 1 3.48 1.39 1.08
secondary index no secondary 200000 1 0.88 2.4 2.05
secondary index one index 200000 1 1.1 2.37 1.98
secondary index two index 200000 1 1.9 1.79 1.42
sorting shuffled 1mm 1000000 0 5.26 2.69 2.41
sorting sorted 1mm 1000000 1 5.19 2.72 2.44


github-actions bot commented Dec 1, 2024

@coffeegoddd DOLT

name detail mean_mult
dolt_blame_basic system table 1.11
dolt_blame_commit_filter system table 2.91
dolt_commit_ancestors_commit_filter system table 0.65
dolt_commits_commit_filter system table 1.05
dolt_diff_log_join_from_commit system table 2.4
dolt_diff_log_join_to_commit system table 2.35
dolt_diff_table_from_commit_filter system table 1.13
dolt_diff_table_to_commit_filter system table 1.17
dolt_diffs_commit_filter system table 0.97
dolt_history_commit_filter system table 1.37
dolt_log_commit_filter system table 1.11


github-actions bot commented Dec 1, 2024

@coffeegoddd DOLT

name add_cnt delete_cnt update_cnt latency
adds_only 60000 0 0 0.75
adds_updates_deletes 60000 60000 60000 3.81
deletes_only 0 60000 0 1.88
updates_only 0 0 60000 2.43


github-actions bot commented Dec 2, 2024

@coffeegoddd DOLT

test_name detail row_cnt sorted mysql_time sql_mult cli_mult
batching LOAD DATA 10000 1 0.08 1
batching batch sql 10000 1 0.08 1.5
batching by line sql 10000 1 0.08 1.63
blob 1 blob 200000 1 0.94 3.74 3.56
blob 2 blobs 200000 1 0.96 4.06 4.15
blob no blob 200000 1 0.9 2.33 2
col type datetime 200000 1 0.8 3.03 2.89
col type varchar 200000 1 0.68 3.62 3.22
config width 2 cols 200000 1 0.82 2.33 1.9
config width 32 cols 200000 1 1.9 1.98 2.37
config width 8 cols 200000 1 0.95 2.36 2.92
pk type float 200000 1 0.9 2.28 1.8
pk type int 200000 1 0.81 2.88 2
pk type varchar 200000 1 1.62 1.51 1.35
row count 1.6mm 1600000 1 5.7 2.78 2.28
row count 400k 400000 1 1.48 2.6 2.18
row count 800k 800000 1 2.94 2.71 2.2
secondary index four index 200000 1 3.45 1.43 1.1
secondary index no secondary 200000 1 0.92 2.27 1.99
secondary index one index 200000 1 1.17 2.29 1.97
secondary index two index 200000 1 1.97 1.74 1.4
sorting shuffled 1mm 1000000 0 5.72 2.66 2.34
sorting sorted 1mm 1000000 1 5.67 2.68 2.35


github-actions bot commented Dec 2, 2024

@coffeegoddd DOLT

name detail mean_mult
dolt_blame_basic system table 1.17
dolt_blame_commit_filter system table 2.97
dolt_commit_ancestors_commit_filter system table 0.63
dolt_commits_commit_filter system table 1.11
dolt_diff_log_join_from_commit system table 2.41
dolt_diff_log_join_to_commit system table 2.38
dolt_diff_table_from_commit_filter system table 1.16
dolt_diff_table_to_commit_filter system table 1.19
dolt_diffs_commit_filter system table 1
dolt_history_commit_filter system table 1.37
dolt_log_commit_filter system table 1.11


github-actions bot commented Dec 2, 2024

@coffeegoddd DOLT

name add_cnt delete_cnt update_cnt latency
adds_only 60000 0 0 0.71
adds_updates_deletes 60000 60000 60000 3.79
deletes_only 0 60000 0 1.87
updates_only 0 0 60000 2.41
