-
Notifications
You must be signed in to change notification settings - Fork 156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
read: check that sizes are smaller than the file when reading #630
Conversation
ee5318d
to
6ef1345
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I think this is a good fix.
What's the motivation for the second test? It's fine to have it, but it appears to work prior to this fix.
Can you fix the CI failures and update the submodule to the merged commit.
Note that |
6ef1345
to
04dde3d
Compare
done
my initial implementation had an off by one error and get_buildid would always return Err() even on more legitimate binaries, the test is to ensure ReadCache can succeed. |
If that's the case yes please document it, and document preferred alternatives. My use case is to find the buildid of thousands of elf files in parallel. When I consulted the documentation of object, it seemed I needed to call File::parse so needed an implementation of ReadRef: This leaves me with three options:
You can imagine that the third option seemed better at the time. After using ReadCache in my code, I noticed memory usage was very spikey because of read cache so actually gave a try to a mmap implementation. I quickly abandoned the experiment, and if my memory serves me right here are the reasons:
I would be happy to discuss potential solutions or compare things more seriously and report the result if you are interested. (But maybe not on this PR) |
I default to using mmap and have never had problems with it. I sometimes use
I guess that might be true if you are only using this to read the build ID, where the total amount that you need to read is small.
I haven't encountered that, and I don't know what could cause it. I'm interested in learning how to reproduce this.
Yes there's technically ways that this can be unsound, but mmap has a long history of being a useful tool. In isolation you'll never have a problem. The problem occurs when other code or other processes modify the file at the same time. I don't think that's a reason not to use it though, as long as you are aware of this limitation and evaluate how likely it is to occur for your use case. If you do know that the file will be modified, you'll have to take steps to prevent it happening at the same time you read, such as using locks. |
I tried again and ReadCache is 47% faster for my use case. I found the source of the EINVAL: among all the files there were a quite large proportion of empty files, and mmaping empty files results in EINVAL. |
Also about getting sigbus in case of concurrent modification of the file: I suppose that if you reimplement a cli tool the difference between sigbus and getting a real error is just the quality of the error message, but I'm implementing a server (debuginfod) and crashing the whole long lived process because of a single "wrong" file is quite annoying in this case. I suppose I could avoid that by forking and using the child process as a sacrificial victim just in case, but the increase in complexity in off-putting. |
would it be possible to get a new release with this change? |
Published 0.33.0 |
…imli-rs#630) This avoids large allocations due to invalid sizes in corrupted files.
related to #204
added a regression test case which used to attempt to allocate 5 exabytes of memory.
test files added in gimli-rs/object-testfiles#12