
[Filebeat] Improve aws-s3 gzip file detection to avoid false negatives #29969

Merged


andrewkroh
Member

What does this PR do?

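Directly check the byte stream for the gzip magic number (1f 8b) and the
DEFLATE compression method (08). Avoid using http.DetectContentType because
it checks many signatures in order and returns the first one that matches,
so a gzip stream that also matches an earlier signature is misclassified.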

Closes #29968

Why is it important?

Incorrect content type detection can cause gzip-compressed objects to be read without decompression, so garbage data is ingested.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

andrewkroh added the labels bug, Filebeat, Team:Integrations, Team:Security-External Integrations, backport-v8.0.0 and backport-7.17 on Jan 24, 2022
@elasticmachine
Collaborator

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@elasticmachine
Collaborator

Pinging @elastic/integrations (Team:Integrations)

botelastic (bot) added and then removed the needs_team label (indicates that the issue/PR needs a Team:* label) on Jan 24, 2022
@elasticmachine
Collaborator

💚 Build Succeeded


Build stats

  • Start Time: 2022-01-24T16:48:13.073+0000

  • Duration: 121 min 36 sec

  • Commit: bbb1394

Test stats 🧪

Test Results

  • Failed: 0
  • Passed: 4480
  • Skipped: 311
  • Total: 4791

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

return false, nil
}
// gzip magic number (1f 8b) and the compression method (08 for DEFLATE).
return bytes.HasPrefix(buf, []byte{0x1F, 0x8B, 0x08}), nil
Contributor


In case this produces false positives, another approach is to peek far enough into the stream to read the whole gzip header and try to parse it; a parse error indicates definitively that the stream is not gzip. Noting this here for future consideration only.

@andrewkroh andrewkroh merged commit 61a7d36 into elastic:master Jan 24, 2022
mergify bot pushed a commit that referenced this pull request Jan 24, 2022
(cherry picked from commit 61a7d36)
mergify bot pushed a commit that referenced this pull request Jan 24, 2022
(cherry picked from commit 61a7d36)
andrewkroh added a commit that referenced this pull request Jan 24, 2022
…29975)

(cherry picked from commit 61a7d36)

Co-authored-by: Andrew Kroh <[email protected]>
andrewkroh added a commit that referenced this pull request Jan 24, 2022
…29974)

(cherry picked from commit 61a7d36)

Co-authored-by: Andrew Kroh <[email protected]>
v1v added a commit that referenced this pull request Jan 28, 2022
* upstream/7.17: (30 commits)
  [7.17](backport #29966) Add the Elastic product origin header when talking to Elasticsearch or Kibana. (#30000)
  [Heartbeat] Change size of data on ICMP packet (#29948) (#29978)
  Add clarification about enableing dashboard loading (#29985) (#29989)
  Improve aws-s3 gzip file detection to avoid false negatives (#29969) (#29974)
  ci: docker login step for pulling then pushing (#29960) (#29963)
  x-pack/auditbeat/module/system/socket: get full length path and arg from /proc when not available from kprobe (#29410) (#29958)
  [Automation] Update elastic stack version to 7.17.0-ab4975a2 for testing (#29956)
  [Automation] Update elastic stack version to 7.17.0-1bd58b32 for testing (#29938)
  [7.17](backport #29913) [Metricbeat] gcp.gke: fix overview dashboard (#29914)
  [7.17](backport #29605) Fix annotation enrichment (#29834)
  [Automation] Update elastic stack version to 7.17.0-e1efbe3a for testing (#29922)
  [Automation] Update elastic stack version to 7.17.0-68da5d12 for testing (#29904)
  [7.17][Heartbeat] Defer monitor / ICMP errors to monitor runtime / ES (backport #29413) (#29896)
  Merge pull request from GHSA-rj4h-hqvq-cc6q
  [7.17](backport #29681) Change docker image from CentOS 7 to Ubuntu 20.04 (#29817)
  Fix YAML indentation in `parsers` examples (#29663) (#29894)
  [Automation] Update elastic stack version to 7.17.0-079761a0 for testing (#29864)
  Fix Filebeat dissect processor field tokenization in documentation (#29680) (#29883)
  Enable require_alias for Bulk requests for all actions when target is a write alias (#29879)
  Update Index template loading guide to use the correct endpoint (#29869) (#29877)
  ...
yashtewari pushed a commit to build-security/beats that referenced this pull request Jan 30, 2022
…29969)

leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
Development

Successfully merging this pull request may close these issues.

[filebeat] aws-s3 input falsely detects gzip file as a font