Memory corruption in hardlink tool #3330

Comments
It seems you're right. The calculation should be more robust. I'll make sure to use your suggestion.
Each node in the tree contains files of the same size, and ul_fileeq_set_size() is called when we switch from one node (size) to another. All settings should be reset by ul_fileeq_data_deinit(). This means the ul_fileeq_set_size() setting should be valid for all files within the next node.
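As an illustration of that protocol, here is a compilable sketch; the two functions are simplified stand-ins for the real `lib/fileeq.c` API, kept only close enough to show the intended call order:

```c
/* Simplified stand-ins for the fileeq API, shown only to illustrate the
 * protocol described above: one set_size() per size node, and a
 * data_deinit() for every file before switching to the next node. */
#include <stddef.h>
#include <stdint.h>

struct fileeq      { uint64_t filesiz; };  /* comparator state (stand-in) */
struct fileeq_data { int has_cache; };     /* per-file state (stand-in)   */

static void fileeq_set_size(struct fileeq *eq, uint64_t filesiz)
{
    eq->filesiz = filesiz;  /* the real code also derives readsiz/blocksmax */
}

static void fileeq_data_deinit(struct fileeq_data *d)
{
    d->has_cache = 0;       /* the real code frees cached digest blocks */
}

/* Process one node of the tree: all files in it have the same size, so a
 * single set_size() call stays valid for the whole node. */
static void process_node(struct fileeq *eq, struct fileeq_data *files,
                         size_t nfiles, uint64_t filesiz)
{
    fileeq_set_size(eq, filesiz);

    /* ... compare the files of this node against each other ... */

    /* Reset all per-file state before the next node (next size), so no
     * stale cached digest survives the size switch. */
    for (size_t i = 0; i < nfiles; i++)
        fileeq_data_deinit(&files[i]);
}

int main(void)
{
    struct fileeq eq = { 0 };
    struct fileeq_data files[4] = { { 0 } };

    process_node(&eq, files, 4, 43007);  /* node of 43007-byte files */
    process_node(&eq, files, 4, 50000);  /* next node, another size  */
    return 0;
}
```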
I have no time to try it, but I will do it later. Thanks for all the reproducers!
There should be an ul_fileeq_data_deinit() call between the …
The current code rounds down the values for readsiz and blocksmax, which is incorrect. The sizes must be large enough to match the files.
Addresses: util-linux#3330
Signed-off-by: Karel Zak <[email protected]>
I think …
The current code rounds down the values for readsiz and blocksmax, which is incorrect. The sizes must be large enough to match the files.
Addresses: #3330
Signed-off-by: Karel Zak <[email protected]>
(cherry picked from commit 70c1ffb)
`hardlink` fails with various memory corruption errors (e.g. `double free or corruption (!prev)`, `munmap_chunk(): invalid pointer`, `malloc(): invalid size (unsorted)`) on large datasets. The output of `valgrind hardlink ...` shows the following trace (the source code @ 25e62e5 was compiled with `CFLAGS=-g`):

During my own investigation I found at least two problematic places, in `lib/fileeq.c` and in `misc-utils/hardlink.c`.
1. `ul_fileeq_set_size` in `lib/fileeq.c` rounds down both the `readsiz` and `blocksmax` variables when a larger read size is used (util-linux/lib/fileeq.c, lines 273 to 280 @ 25e62e5). In the unfortunate case, `get_digest` may request one extra block when reading the tail of the file.

The following commands reproduce this error: we create 256 identical files of 43007 bytes each. With reduced io-size and cache-size we get the following debug output with `ULFILEEQ_DEBUG=16`:

Note that 43007 > 32 + 41 * 1048 = 43000, so the last 7 bytes of each file fall outside the 41 blocks that were sized; the arithmetic is spelled out below.
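A worked example of that off-by-one, using the numbers from the reproducer (the 32-byte intro size and 1048-byte block size come from the debug output above; the program itself is just an illustration):

```c
/* Worked example of the round-down off-by-one described above, using
 * the numbers from the reproducer: a 43007-byte file, a 32-byte intro
 * buffer, and an effective read (block) size of 1048 bytes. */
#include <stdio.h>

int main(void)
{
    unsigned long long filesiz = 43007;  /* file size in the reproducer */
    unsigned long long intro   = 32;     /* bytes matched via the intro */
    unsigned long long readsiz = 1048;   /* bytes hashed per block      */

    unsigned long long rest = filesiz - intro;                 /* 42975 */
    unsigned long long down = rest / readsiz;                  /* 41 (rounded down) */
    unsigned long long up   = (rest + readsiz - 1) / readsiz;  /* 42 (rounded up)   */

    printf("bytes covered by %llu blocks: %llu\n",
           down, intro + down * readsiz);                      /* 43000 */
    printf("uncovered tail bytes: %llu\n",
           filesiz - (intro + down * readsiz));                /* 7 */
    printf("blocks actually needed: %llu\n", up);              /* 42 */
    return 0;
}
```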
This problem can be fixed by rounding both values up instead of down, which is what the commit quoted above does.
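A minimal sketch of that round-up fix, reconstructed from the commit message rather than copied from the actual patch (names are illustrative):

```c
/* Sketch of the fix described in the commit message: derive readsiz and
 * blocksmax with ceiling division instead of truncating division. This is
 * a reconstruction for illustration, not the actual util-linux patch. */
#include <stdint.h>
#include <stdio.h>

/* Smallest q such that q * b >= a (for a, b > 0). */
static uint64_t div_round_up(uint64_t a, uint64_t b)
{
    return (a + b - 1) / b;
}

static void compute_sizes(uint64_t filesiz, uint64_t maxblocks,
                          uint64_t *readsiz, uint64_t *blocksmax)
{
    /* Truncating division here (filesiz / maxblocks, filesiz / *readsiz)
     * is the bug: the file tail can fall outside the counted blocks, and
     * get_digest() then asks for one block more than was allocated. */

    /* Rounding up guarantees the sizes cover the whole file: */
    *readsiz   = div_round_up(filesiz, maxblocks);
    *blocksmax = div_round_up(filesiz, *readsiz);
}

int main(void)
{
    uint64_t readsiz, blocksmax;

    compute_sizes(43007, 41, &readsiz, &blocksmax);
    printf("readsiz=%llu blocksmax=%llu\n",   /* readsiz=1049 blocksmax=41 */
           (unsigned long long)readsiz, (unsigned long long)blocksmax);
    return 0;
}
```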
2. `ul_fileeq_set_size` is called repeatedly during the iterations inside `visitor`, leading to small values of `blocksmax` during memory allocation for the `data->blocks` buffers at early iterations and large values of `blocksmax` during `get_digest` calls at later iterations (util-linux/misc-utils/hardlink.c, lines 1093 to 1101 @ 25e62e5).
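To make the failure mode concrete, here is a self-contained sketch of that allocation/use mismatch; all names, sizes, and the digest width are illustrative stand-ins, not the real fileeq/hardlink code:

```c
/* Self-contained sketch of the mismatch described above: the per-file
 * blocks buffer is allocated under the blocksmax in effect at that moment,
 * a later set_size() call raises blocksmax, and the digest loop then tries
 * to write past the smaller allocation. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DIGSIZ 16              /* bytes per cached digest (illustrative) */

static size_t blocksmax;       /* shared comparator setting */

struct file_data {
    unsigned char *blocks;     /* cached digests: blocksmax * DIGSIZ bytes */
    size_t nallocated;         /* blocksmax value at allocation time */
};

static void set_size(uint64_t filesiz, uint64_t readsiz)
{
    blocksmax = (size_t)((filesiz + readsiz - 1) / readsiz);
}

static void alloc_blocks(struct file_data *d)
{
    d->blocks = calloc(blocksmax, DIGSIZ);  /* sized by the CURRENT setting */
    d->nallocated = blocksmax;
}

static void get_digest(struct file_data *d, size_t blockno)
{
    /* The comparator trusts the current blocksmax, not the value the buffer
     * was allocated with; once blocksmax grew, this overruns the heap. */
    if (blockno >= d->nallocated) {
        fprintf(stderr, "would write past allocation: block %zu >= %zu\n",
                blockno, d->nallocated);
        return;
    }
    memset(d->blocks + blockno * DIGSIZ, 0xAA, DIGSIZ);
}

int main(void)
{
    struct file_data d = { 0 };

    set_size(2000, 1048);   /* early iteration: blocksmax = 2 */
    alloc_blocks(&d);       /* buffer holds only 2 digests    */

    set_size(3000, 1048);   /* later iteration: blocksmax = 3 */
    get_digest(&d, 2);      /* block index 2 is out of bounds */

    free(d.blocks);
    return 0;
}
```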
The overall idea of changing the `readsiz` value during iterations inside `visitor` looks very bad to me. If we call `ul_fileeq_set_size` and change the `readsiz` value, then we should also invalidate and recalculate all cached digests. I think `ul_fileeq_set_size` should be called only once during the iterations inside `visitor`, for consistency.

The following commands reproduce this error: we create two sets of 256 identical files with the same 32-byte header but different first-block digests. If we reduce io-size and cache-size and organize the iterations by file modification time, then we get the following debug output with `ULFILEEQ_DEBUG=16`:

Note that `blocksmax=2` was used for the memory allocation for all file candidates, while `blocksmax=3` was used during digest calculation for the files in the second batch of identical files.