-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix integrity check command for minimal indexing stores #2160
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2160 +/- ##
==========================================
- Coverage 68.23% 68.06% -0.17%
==========================================
Files 134 134
Lines 16096 16146 +50
==========================================
+ Hits 10983 10990 +7
- Misses 5113 5156 +43
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
a68d70a
to
07a956d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just one question and a small change needed.
@@ -22,11 +22,13 @@ module type S = sig | |||
|
|||
val integrity_check : | |||
?ppf:Format.formatter -> | |||
?heads:commit list -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, is this added just to have more flexibility in what is integrity checked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The integrity check -- for stores created with minimal indexing strategy-- needs a reference to a commit. It then traverses that commit. After a snapshot import there is only one commit in the store indeed, but because octez does not use branches, we cannot access it. We need its hash to look it up in index. (There should be only one commit in the index and so we could read it directly from the index I guess).
For stores that do use branches, the command is indeed somewhat flexible: if no commit is specified the main branch is traversed.
And also, if used outside of the snapshot import usecase, it can be used to check the integrity of any commit available in the store.
07a956d
to
cd447cc
Compare
There is a test added with this PR with an integrity check on a corrupted store, and the corruption is catched. But I also started looking at some stats to see whether I missed something. And I don't yet completely understand these stats, so I link here my branch in case someone else wants to have a look : https://github.com/icristescu/irmin/tree/integrity_only_command_stats. These stats are generated by running
from a snapshot for block number of blobs and nodes checked by the
number of times hash is recomputed for stable and non-stable nodes:
However, the Repo.iter is probably reading several times the same node, to detect whether it should continue traversing it. So if we look at the number of different node hashes for which hash is recomputed, it is smaller: When the hash is recomputed we have access to the length of the node (https://github.com/icristescu/irmin/blob/integrity_only_command_stats/src/irmin-pack/inode.ml#L1243), so the number of nodes with lengths <= 32, between 32 and 256, and > 256 :
however if I want to print the path to the nodes that have between 32 and 256 predecessors, I don't have any (https://github.com/icristescu/irmin/blob/integrity_only_command_stats/src/irmin-pack/unix/checks.ml#L528). So I'm not sure what is going on. It could be useful to run it on a smaller store which contains inodes with length > 32 and > 256. |
@icristescu I spent some time today looking into your stats branch and playing with the output to get a better understanding of what is going on. Here is output from modified counting. It definitely appears that nodes are being visited multiple times (in Compare I was able to observe a lowered
As for looking at which steps correspond to the three different buckets (the last item that didn't make sense yet), I was able to do this (looking only at root node lengths, not predecessor lengths) and get the following (cleaned up output):
This is only for length > 32. It is also deduplicated based on step name, so the length is merely demonstrative. For <= 32, you get a lot of noise with the contracts/accounts which have 2 nodes. If it is of interest, here is a comparison of my code: icristescu/irmin@integrity_only_command_stats...metanivek:irmin:integrity_only_command_stats Overall, I think this code is good for 3.5.1, so I will move towards that next week. |
…ils, irmin-test, irmin-pack, irmin-mirage, irmin-mirage-graphql, irmin-mirage-git, irmin-http, irmin-graphql, irmin-git, irmin-fs, irmin-containers, irmin-cli, irmin-chunk and irmin-bench (3.5.1) CHANGES: ### Fixed - **irmin-pack** - Integrity check of a commit works on stores using the minimal indexing strategy. (mirage/irmin#2160, @icristescu)
…ils, irmin-test, irmin-pack, irmin-mirage, irmin-mirage-graphql, irmin-mirage-git, irmin-http, irmin-graphql, irmin-git, irmin-fs, irmin-containers, irmin-cli, irmin-chunk and irmin-bench (3.5.1) CHANGES: - **irmin-pack** - Integrity check of a commit works on stores using the minimal indexing strategy. (mirage/irmin#2160, @icristescu)
…ils, irmin-test, irmin-pack, irmin-mirage, irmin-mirage-graphql, irmin-mirage-git, irmin-http, irmin-graphql, irmin-git, irmin-fs, irmin-containers, irmin-cli, irmin-chunk and irmin-bench (3.5.1) CHANGES: ### Fixed - **irmin-pack** - Integrity check of a commit works on stores using the minimal indexing strategy. (mirage/irmin#2160, @icristescu)
extracted from #2138, to contain only the fix to the
integrity-check
command.