Check for and skip corrupted gzip files, close #261 #265

andsel · 2020-03-25T17:47:21Z

Fixes #261. Before processing a gzip file, verify that it is not corrupted; otherwise skip it.
The check consists of reading the file fully.
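
As a rough illustration of "reading it fully", here is a minimal sketch that uses Ruby's standard `Zlib` library. The helper name `gzip_valid?` and the chunk size are assumptions made for this example; the actual code in `read_zip_file.rb` may be written quite differently.

```ruby
require "zlib"

# Hypothetical helper, not the plugin's code: returns true when the gzip file
# at `path` decompresses cleanly from start to end, false otherwise.
def gzip_valid?(path, chunk_size = 32 * 1024)
  Zlib::GzipReader.open(path) do |gz|
    # Read and discard the whole stream. #read(len) returns nil once the
    # end of the compressed data has been reached.
    nil while gz.read(chunk_size)
  end
  true
rescue Zlib::Error
  # Covers bad headers, truncated streams, and checksum or length mismatches.
  false
end
```

A caller could then skip any file for which `gzip_valid?(path)` returns false instead of handing it to the normal read path.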

lib/filewatch/read_mode/handlers/read_zip_file.rb

kares

LGTM, minor nitpicks + might be worth bringing 👀 in for docs review.

lib/filewatch/read_mode/handlers/read_zip_file.rb

andsel · 2020-03-31T10:00:44Z

I would like to ask whether @karenzone could take a look at the doc change (https://github.com/logstash-plugins/logstash-input-file/pull/265/files#diff-d78ef6a21fe9540cd4fdb318c7516596)

kares · 2020-03-31T10:56:27Z

CHANGELOG.md

@@ -1,6 +1,10 @@
+<<<<<<< HEAD


karenzone

Nice feature and description. I added a suggestion for handling boolean options, and offered a possible reword. Please let me know what you think.

docs/index.asciidoc

karenzone · 2020-04-01T17:37:17Z

docs/index.asciidoc

+Before start read a compressed file, checks for its validity.
+This request a full read of the archive, so potentially could cost time.
+If not specified to true, and the file is corrupted, could end in cyclic processing of the broken file.
+


Good explanation. I took the information you provided, and tried to reword it a bit. What do you think about this:

The read option kicks off a full read of an archive, and could potentially
waste time trying to process an invalid file.
When set to true, this option verifies that a compressed file is valid before
reading it.

If this option is not explicitly set to true, a corrupt file could cause
cyclic processing of the broken file.

andsel · Apr 2, 2020

If I'm not wrong, this rewording says that reading an archive is costly, so the option enables verification before reading a corrupted archive (and wasting time on it).

In the original form I tried to describe that processing a corrupted archive leads to looping on that archive, and that enabling this flag avoids it. Enabling the flag means reading the entire file up front for verification and then reading it again for processing. For an archive that is not corrupted, this extra read could be considered a waste of time.

karenzone · Apr 2, 2020

What about this:

When set to true, this setting verifies that a compressed file is valid before
processing it. There are two passes through the file--one pass to
verify that the file is valid, and another pass to process the file.

Validating a compressed file requires more processing time, but can prevent a
corrupt archive from causing looping.
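
The two passes described in this wording can be sketched as follows. This is only an illustration built on Ruby's standard `Zlib` library: the method name `read_gzip_lines` is hypothetical, the keyword argument merely borrows the name of the `check_archive_validity` setting, and the plugin's real handler may work differently.

```ruby
require "zlib"

# Hypothetical sketch, not the plugin's code: read the lines of a gzip file,
# optionally validating the whole archive first.
# Returns an array of lines, or nil when the archive is invalid and skipped.
def read_gzip_lines(path, check_archive_validity: false)
  if check_archive_validity
    begin
      # Pass 1: decompress everything and discard it, only to detect errors.
      Zlib::GzipReader.open(path) { |gz| nil while gz.read(32 * 1024) }
    rescue Zlib::Error
      return nil # corrupt archive: skip it instead of retrying it
    end
  end
  # Pass 2: decompress again, this time keeping the content.
  Zlib::GzipReader.open(path) { |gz| gz.each_line.map(&:chomp) }
end
```

With the check enabled, a valid archive is decompressed twice, which is the extra processing time the wording mentions. A corrupt archive is rejected during the first pass and never reaches the processing step.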

andsel · Apr 3, 2020

Thanks @karenzone, that sounds much better.

karenzone · 2020-04-01T17:48:17Z

CHANGELOG.md

+## 4.1.16
+  - Added configuration setting `check_archive_validity` settings to enable gzipped files verification, 
+  issue [#261](https://github.com/logstash-plugins/logstash-input-file/issues/261)
+


Two comments about the changelog:

It looks like the changelog contains the same info for both 4.1.16 and 4.1.17.

I had simultaneously claimed 4.1.17 for changes in [DOC] Doc improvements #266. Let's coordinate on version bump and publishing.

andsel · Apr 2, 2020

The duplicated description was my mistake during conflict resolution; it is now fixed. For the rest, we could merge your `4.1.17` and this one so that we publish only once, WDYT?

karenzone

LGTM

elasticsearch-bot · 2020-04-03T13:26:45Z

Andrea Selva merged this into the following branches!

Branch	Commits
master	`e4b5ced`

elasticsearch-bot self-assigned this Mar 25, 2020

kares reviewed Mar 25, 2020

kares reviewed Mar 25, 2020

andsel unassigned elasticsearch-bot Mar 26, 2020

kares approved these changes Mar 30, 2020

andsel force-pushed the fix/skip_corrupted_gzip branch from 94fa3b2 to ab806e8 on March 31, 2020 09:57

kares reviewed Mar 31, 2020

✂️

andsel reacted with heart emoji

karenzone reviewed Apr 1, 2020

Added settings 'check_archive_validity' to optionally enable integrity check on archives, close logstash-plugins#261

59cf2ee

andsel force-pushed the fix/skip_corrupted_gzip branch from f02e369 to 59cf2ee on April 3, 2020 07:07

karenzone approved these changes Apr 3, 2020

elasticsearch-bot closed this in e4b5ced Apr 3, 2020
