Fails processing jsonl+gzip when using S3 Input plugin #18696
Comments
I discovered that Filebeat 7.5.2 has no trouble reading and shipping these objects from S3, so it seems like this is a regression introduced between Filebeat 7.6.0 and 7.6.2.
Pinging @elastic/integrations-platforms (Team:Platforms)
I suggest checking the first two bytes for the GZIP magic number. Alternatively, assume the GZIP library already checks for the magic number, and if GZIP decompression fails, just treat the stream as if it has already been decompressed.
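A minimal Go sketch of that idea (an assumed helper, not the s3 input's actual code): peek at the first two bytes and only gunzip when the gzip magic number `0x1f 0x8b` is present, otherwise pass the stream through unchanged.

```go
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// maybeGunzip wraps r in a gzip reader only when the stream starts with the
// gzip magic number (0x1f 0x8b); otherwise it assumes the payload was already
// decompressed upstream (e.g. by the HTTP transport) and returns it unchanged.
func maybeGunzip(r io.Reader) (io.Reader, error) {
	br := bufio.NewReader(r)
	magic, err := br.Peek(2)
	if err == nil && magic[0] == 0x1f && magic[1] == 0x8b {
		return gzip.NewReader(br)
	}
	return br, nil
}

func main() {
	// Already-decompressed JSON lines pass straight through.
	plain, _ := maybeGunzip(bytes.NewReader([]byte(`{"a":1}` + "\n")))
	out, _ := io.ReadAll(plain)
	fmt.Printf("%s", out) // {"a":1}
}
```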
When using Filebeat with the S3 input plugin (beta), Filebeat fails to process files that contain JSON lines and are GZIP-compressed. This is the output format of AWS GuardDuty with S3 export enabled, which means that Filebeat is unable to process logs as written by AWS GuardDuty. The issue occurs specifically when the object contains JSON lines, is GZIP-compressed, and has the following metadata on the S3 object:
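Roughly the following (as implied by the error behaviour described under Actual Results below):

```
Content-Type: application/json
Content-Encoding: gzip
```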
Note: I originally posted this as a thread on the discussion forum, but now I am confident it is a bug in Filebeat, so I'm creating an issue here.
Actual Results
When using Filebeat 7.6.2 I get these error messages. In this case it seems that Filebeat is attempting to decompress the GZIP stream, but the stream has already been automatically decompressed by the transport in aws-sdk-go based on the object Metadata.
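For illustration only (a standalone snippet, not Filebeat's code): once the transport has already inflated the body, `compress/gzip` no longer sees the gzip magic bytes and rejects the stream with an invalid-header error.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"errors"
	"fmt"
)

func main() {
	// The body the transport hands back is already plain JSON lines...
	plain := []byte(`{"already":"decompressed"}` + "\n")

	// ...so a second, explicit gzip decompression fails at the header check.
	_, err := gzip.NewReader(bytes.NewReader(plain))
	fmt.Println(errors.Is(err, gzip.ErrHeader)) // true
}
```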
And when using Filebeat 7.7.0 I get slightly different error messages. This seems to stem from the fact that GuardDuty is incorrectly assigning the `application/json` MIME type to files that actually contain JSON lines/newline-delimited JSON.

Expected Results
Presumably GuardDuty should be using `application/json-seq`, `application/jsonstream`, `application/x-json-stream`, `application/x-ndjson`, or `application/x-jsonlines`, but there doesn't seem to be a standard there. So it seems like Filebeat should be able to handle cases where JSON-lines/NDJSON files are saved with the `application/json` MIME type, perhaps with automatic detection or a configuration override.

The S3 input plugin should also be careful not to attempt GZIP decompression twice (once automatically in the transport layer, and once explicitly in the s3 input code).
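As a sketch of the automatic-detection idea (hypothetical, not the s3 input's actual implementation), decoding with a streaming `json.Decoder` in a loop handles both a single JSON document and newline-delimited JSON without consulting the declared MIME type:

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// decodeAll reads zero or more JSON documents from r, which covers both a
// single JSON object and newline-delimited JSON (as produced by GuardDuty).
func decodeAll(r io.Reader) ([]map[string]interface{}, error) {
	dec := json.NewDecoder(r)
	var events []map[string]interface{}
	for {
		var ev map[string]interface{}
		if err := dec.Decode(&ev); err == io.EOF {
			break
		} else if err != nil {
			return events, err
		}
		events = append(events, ev)
	}
	return events, nil
}

func main() {
	ndjson := `{"finding":"a"}` + "\n" + `{"finding":"b"}`
	events, err := decodeAll(strings.NewReader(ndjson))
	fmt.Println(len(events), err) // 2 <nil>
}
```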
Additional information
My Filebeat Configuration
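Roughly of this shape (a minimal s3 input sketch; the queue URL, region, and credential settings below are placeholders rather than the exact values used):

```yaml
filebeat.inputs:
  - type: s3
    # SQS queue that receives the S3 object-created notifications for the
    # GuardDuty export bucket (placeholder URL).
    queue_url: https://sqs.us-east-1.amazonaws.com/123456789012/guardduty-findings
    access_key_id: '${AWS_ACCESS_KEY_ID}'
    secret_access_key: '${AWS_SECRET_ACCESS_KEY}'

output.elasticsearch:
  hosts: ["localhost:9200"]
```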