Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check and avoid to process corrupted gzip files, close #261 #265

Closed

Conversation

andsel
Copy link
Contributor

@andsel andsel commented Mar 25, 2020

Fixes #261, before processing a gzip file, verify it's not corrupted else skip it.
The check consist in reading it fully

@elasticsearch-bot elasticsearch-bot self-assigned this Mar 25, 2020
Copy link
Contributor

@kares kares left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, minor nitpicks + might be worth bringing 👀 in for docs review.

lib/filewatch/read_mode/handlers/read_zip_file.rb Outdated Show resolved Hide resolved
lib/filewatch/read_mode/handlers/read_zip_file.rb Outdated Show resolved Hide resolved
lib/filewatch/read_mode/handlers/read_zip_file.rb Outdated Show resolved Hide resolved
@andsel andsel force-pushed the fix/skip_corrupted_gzip branch from 94fa3b2 to ab806e8 Compare March 31, 2020 09:57
@andsel
Copy link
Contributor Author

andsel commented Mar 31, 2020

I would ask if @karenzone could give just an eye to the doc change part (https://github.com/logstash-plugins/logstash-input-file/pull/265/files#diff-d78ef6a21fe9540cd4fdb318c7516596)

CHANGELOG.md Outdated
@@ -1,6 +1,10 @@
<<<<<<< HEAD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✂️

Copy link
Contributor

@karenzone karenzone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice feature and description. I added a suggestion for handling boolean options, and offered a possible reword. Please let me know what you think.

docs/index.asciidoc Outdated Show resolved Hide resolved
Before start read a compressed file, checks for its validity.
This request a full read of the archive, so potentially could cost time.
If not specified to true, and the file is corrupted, could end in cyclic processing of the broken file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good explanation. I took the information you provided, and tried to reword it a bit. What do you think about this:

The read option kicks off a full read of an archive, and could potentially
waste time trying to process an invalid file.
When set to true, this option verifies that a compressed file is valid before
reading it.

If this option is not explicitly set to true, a corrupt file could cause
cyclic processing of the broken file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not wrong, from this rewording I understand that reading an archive is costly, so before read a corrupted archive (and waste time) this option enables the verification.

In the original form I tried to describe that processing a corrupted archive led to looping on that archive and to avoid this we could enable this flag. Enabling this flag means read upfront the entire file for a verification and then read it again for processing. In this case for a not corrupted archive this could be considered a waste of time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this:

When set to true, this setting verifies that a compressed file is valid before
processing it. There are two passes through the file--one pass to
verify that the file is valid, and another pass to process the file.

Validating a compressed file requires more processing time, but can prevent a
corrupt archive from causing looping.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @karenzone sounds really really better

## 4.1.16
- Added configuration setting `check_archive_validity` settings to enable gzipped files verification,
issue [#261](https://github.com/logstash-plugins/logstash-input-file/issues/261)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two comments about the changelog:

  • It looks like the changelog contains the same info for both 4.1.16 and 4.1.17.
  • I had simultaneiously claimed 4.1.17 for changes in [DOC] Doc improvements #266. Let's coordinate on version bump and publishing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doubling of description was my fault on conflict resolution, now fixed. For the the rest, we could merge your `4.1.17' and this one so that we publish just one time, WDYT?

@andsel andsel force-pushed the fix/skip_corrupted_gzip branch from f02e369 to 59cf2ee Compare April 3, 2020 07:07
Copy link
Contributor

@karenzone karenzone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@elasticsearch-bot
Copy link

Andrea Selva merged this into the following branches!

Branch Commits
master e4b5ced

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

logstash repeatedly processes a corrupted compressed file in and endless loop
4 participants