read `.xz` file by requested block #12
Similar to Issue #13.

The current code: https://github.com/jtmoon79/super-speedy-syslog-searcher/blob/0.0.32/src/readers/blockreader.rs#L932-L943

It uses lzma-rs v0.2.0, https://github.com/gendx/lzma-rs/releases/tag/v0.2.0. The problem is that its `xz_decompress` decompresses the entire file in one call. The xz format description indicates the uncompressed size may or may not be recorded in the file.

So a decent partial fix is to manually check if the uncompressed size is available; that is, check the file's own headers rather than relying on lzma-rs. The format has the caveat that these size fields are optional, so the check may find nothing.

In this sense, the current hacky implementation is guaranteed to be correct.
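The manual check described above can be sketched against the published `.xz` file format: the Index field records, per block, an Unpadded Size and an Uncompressed Size, encoded as XZ "multibyte integers" (7 bits per byte, MSB set means another byte follows). This is a minimal sketch, not the project's actual code; it assumes the caller has already located the Index (e.g. via the Backward Size field of the 12-byte Stream Footer).

```rust
// Sketch: recover the total uncompressed size from an XZ Index field.
// Varints follow the XZ "multibyte integer" encoding:
// 7 bits per byte, little end first, MSB set = more bytes follow.
fn read_varint(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let mut value: u64 = 0;
    for shift in 0..10 {
        let b = *buf.get(*pos)?;
        *pos += 1;
        value |= u64::from(b & 0x7F) << (7 * shift);
        if b & 0x80 == 0 {
            return Some(value);
        }
    }
    None // over-long encoding
}

/// Sum the Uncompressed Size of every record in an XZ Index field.
fn xz_index_uncompressed_size(index: &[u8]) -> Option<u64> {
    let mut pos = 0;
    if *index.get(pos)? != 0x00 {
        return None; // not an Index Indicator byte
    }
    pos += 1;
    let count = read_varint(index, &mut pos)?;
    let mut total: u64 = 0;
    for _ in 0..count {
        let _unpadded = read_varint(index, &mut pos)?; // Unpadded Size (unused here)
        total = total.checked_add(read_varint(index, &mut pos)?)?;
    }
    Some(total)
}

fn main() {
    // Hand-crafted Index: indicator 0x00, 1 record, Unpadded Size 0x20,
    // Uncompressed Size 1000 (varint [0xE8, 0x07]).
    let index = [0x00, 0x01, 0x20, 0xE8, 0x07];
    println!("{:?}", xz_index_uncompressed_size(&index)); // Some(1000)
}
```

If the stream was written without size information this parse simply fails, which is exactly the "partial fix" caveat: the check is an optimization when the sizes are present, not a guarantee.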
Add windows-latest in os matrix. Workaround git-for-windows/git#2803 using `git config core.protectNTFS false` Issue #12
Another way this issue manifests is reading too many blocks for files without syslines.

For plain log files, the BlockZero analysis would stop processing after the zeroth block (the first block) did not have any apparent syslines. For very large files, this is a lot of overhead for naught, and may cause problems where computer memory is constrained.
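The BlockZero early exit mentioned above can be illustrated with a toy check: scan only the first block for anything resembling a datetime, and give up on the file if none is found. The real analysis in super-speedy-syslog-searcher is far more thorough; this sketch (with a crude hard-coded `YYYY-MM-DD` heuristic) only shows the control flow being described.

```rust
// Toy "does this window start with YYYY-MM-DD?" heuristic.
fn looks_like_date(w: &[u8]) -> bool {
    w.len() >= 10
        && w[..4].iter().all(u8::is_ascii_digit)
        && w[4] == b'-'
        && w[5..7].iter().all(u8::is_ascii_digit)
        && w[7] == b'-'
        && w[8..10].iter().all(u8::is_ascii_digit)
}

/// BlockZero-style check: does block 0 contain any apparent sysline?
/// If not, a reader can stop before fetching any further blocks.
fn block_zero_has_sysline(block: &[u8]) -> bool {
    block.windows(10).any(looks_like_date)
}

fn main() {
    let log_block = b"2023-01-15 12:00:00 INFO started\n";
    let junk_block = b"\x7fELF\x00\x01\x02 binary junk";
    assert!(block_zero_has_sysline(log_block));
    assert!(!block_zero_has_sysline(junk_block));
    println!("ok");
}
```

The point of the early exit is that it only pays off if fetching block 0 is cheap; for `.xz` files the current code has already decompressed the whole file by then, which is the overhead complained about here.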
Update: see Issue #283

A good solution for this Issue and Issue #13 would be having a "sequential read mode". In "sequential read mode", there is no binary search for syslines, only reading the file from start to finish. This would allow "progressive" dropping of data at different points. This should be relatively clean to implement: there would be two paths for searching for the datetime filter A, binary and linear/sequential. ... except for this one complicating detail from my comment above:
I should just grab that raw data myself. It would simplify stuff.
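The two search paths proposed above can be sketched as a simple dispatch: binary search when blocks are random-access, a front-to-back linear scan for stream-only compressed files. The names here are hypothetical, not the actual super-speedy-syslog-searcher API, and timestamps stand in for parsed syslines.

```rust
#[derive(Clone, Copy)]
enum ReadMode {
    Binary,     // seekable plain files: O(log n) binary search
    Sequential, // .xz and other stream-only files: O(n) linear scan
}

/// Find the index of the first timestamp at or after datetime filter A.
/// Both paths must return the same answer on sorted input.
fn find_first_at_or_after(timestamps: &[u64], filter_a: u64, mode: ReadMode) -> usize {
    match mode {
        // random access: binary search via partition_point
        ReadMode::Binary => timestamps.partition_point(|&t| t < filter_a),
        // stream-only: scan start to finish, dropping earlier data as we go
        ReadMode::Sequential => timestamps
            .iter()
            .position(|&t| t >= filter_a)
            .unwrap_or(timestamps.len()),
    }
}

fn main() {
    let ts = [10, 20, 30, 40];
    let b = find_first_at_or_after(&ts, 25, ReadMode::Binary);
    let s = find_first_at_or_after(&ts, 25, ReadMode::Sequential);
    assert_eq!(b, s); // both modes agree: index 2
    println!("both modes found index {}", b);
}
```

Keeping both paths behind one function is what makes the design "relatively clean": callers ask the same question and only the traversal strategy differs.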
Maybe just decompress the entire file once without saving it, to get the uncompressed size. Currently, the entire file is read once and saved during `BlockReader::new`. This proposed implementation means the entire file is read twice, at most. However, the amount of runtime memory required would be constant. Also, I could delete one bullet point from the documentation.
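The "decompress once without saving" pass might look like the following sketch: stream any reader through a fixed-size scratch buffer, counting bytes and discarding them. A `std::io::Cursor` stands in for an xz decoder here (anything implementing `Read` works), so memory use stays constant regardless of the uncompressed size.

```rust
use std::io::Read;

/// Count the bytes a reader yields, discarding the data.
/// Memory use is one fixed buffer, no matter how large the stream is.
fn stream_len<R: Read>(mut reader: R) -> std::io::Result<u64> {
    let mut buf = [0u8; 8192]; // constant-size scratch buffer
    let mut total: u64 = 0;
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            return Ok(total); // EOF
        }
        total += n as u64;
    }
}

fn main() {
    // Cursor stands in for a decompressing reader.
    let data = vec![0u8; 100_000];
    let len = stream_len(std::io::Cursor::new(&data)).unwrap();
    println!("uncompressed size: {len}"); // 100000
}
```

This is the trade the comment describes: at most two full decompression passes, in exchange for never holding the whole uncompressed file in memory.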
Cannot read the xz file in chunks/blocks with the current crate, lzma-rs.

Consider https://docs.rs/xz2/latest/xz2/read/struct.XzDecoder.html
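`xz2::read::XzDecoder` implements `std::io::Read` (but not `Seek`), so a block reader could satisfy a request for block N by decompressing and discarding the preceding N blocks, then reading one block. The sketch below is generic over `Read` and uses a `Cursor` as a stand-in for the decoder so it compiles without the xz2 crate; the helper name is hypothetical, not the actual `BlockReader` API.

```rust
use std::io::Read;

/// Read block `blockoffset` (of size `blocksz`) from a stream-only
/// reader by discarding everything before it. The returned block may
/// be short at EOF. Hypothetical helper, not the real BlockReader API.
fn read_block<R: Read>(mut reader: R, blocksz: u64, blockoffset: u64) -> std::io::Result<Vec<u8>> {
    // discard the bytes before the requested block
    let skip = blocksz.checked_mul(blockoffset).expect("offset overflow");
    std::io::copy(&mut reader.by_ref().take(skip), &mut std::io::sink())?;
    // read up to one block
    let mut block = Vec::with_capacity(blocksz as usize);
    reader.take(blocksz).read_to_end(&mut block)?;
    Ok(block)
}

fn main() {
    // Cursor stands in for `xz2::read::XzDecoder::new(file)`.
    let data: Vec<u8> = (0..=255).collect();
    let block = read_block(std::io::Cursor::new(&data), 64, 2).unwrap();
    assert_eq!(block.len(), 64);
    assert_eq!(block[0], 128); // block 2 starts at byte 128
    println!("read block 2: {} bytes", block.len());
}
```

Random requests are O(file size) with this approach since the stream restarts from the beginning, which is why it pairs naturally with the "sequential read mode" proposed in the comments above.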
Attempt to parse more of the XZ header and block #0 header. Unfortunately, I couldn't get this working entirely. Leaving the code in place as it does function. The intent was to compensate for lzma-rs reading the entire file during `xz_decompress`. However, that's a larger problem; see gendx/lzma-rs#110. Issue #12, Issue #283
#283 refactors this handling.
Problem

An `.xz` file is entirely read during `BlockReader::new`. This may cause problems for very large compressed files (the `s4` program will hold the entire uncompressed file in memory; it would use too much memory).

The crate `lzma-rs` does not provide an API `xz_decompress_with_options` which would allow limiting the bytes returned per call. It only provides `xz_decompress`, which decompresses the entire file in one call. See gendx/lzma-rs#110

Solution

Read an `.xz` file per block request, as done for normal files.

Update: see Issue #283

Meta-Issue #182