-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check and avoid to process corrupted gzip files, close #261 #265
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, minor nitpicks + might be worth bringing 👀 in for docs review.
94fa3b2
to
ab806e8
Compare
I would ask if @karenzone could give just an eye to the doc change part (https://github.com/logstash-plugins/logstash-input-file/pull/265/files#diff-d78ef6a21fe9540cd4fdb318c7516596) |
CHANGELOG.md
Outdated
@@ -1,6 +1,10 @@ | |||
<<<<<<< HEAD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice feature and description. I added a suggestion for handling boolean options, and offered a possible reword. Please let me know what you think.
Before start read a compressed file, checks for its validity. | ||
This request a full read of the archive, so potentially could cost time. | ||
If not specified to true, and the file is corrupted, could end in cyclic processing of the broken file. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good explanation. I took the information you provided, and tried to reword it a bit. What do you think about this:
The
read
option kicks off a full read of an archive, and could potentially
waste time trying to process an invalid file.
When set totrue
, this option verifies that a compressed file is valid before
reading it.
If this option is not explicitly set to
true
, a corrupt file could cause
cyclic processing of the broken file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm not wrong, from this rewording I understand that reading an archive is costly, so before read a corrupted archive (and waste time) this option enables the verification.
In the original form I tried to describe that processing a corrupted archive led to looping on that archive and to avoid this we could enable this flag. Enabling this flag means read upfront the entire file for a verification and then read it again for processing. In this case for a not corrupted archive this could be considered a waste of time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about this:
When set to
true
, this setting verifies that a compressed file is valid before
processing it. There are two passes through the file--one pass to
verify that the file is valid, and another pass to process the file.
Validating a compressed file requires more processing time, but can prevent a
corrupt archive from causing looping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @karenzone sounds really really better
## 4.1.16 | ||
- Added configuration setting `check_archive_validity` settings to enable gzipped files verification, | ||
issue [#261](https://github.com/logstash-plugins/logstash-input-file/issues/261) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two comments about the changelog:
- It looks like the changelog contains the same info for both 4.1.16 and 4.1.17.
- I had simultaneiously claimed 4.1.17 for changes in [DOC] Doc improvements #266. Let's coordinate on version bump and publishing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The doubling of description was my fault on conflict resolution, now fixed. For the the rest, we could merge your `4.1.17' and this one so that we publish just one time, WDYT?
…y check on archives, close logstash-plugins#261
f02e369
to
59cf2ee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Andrea Selva merged this into the following branches!
|
Fixes #261, before processing a gzip file, verify it's not corrupted else skip it.
The check consist in reading it fully