[Filebeat] gcp-pubsub: Restart Pub/Sub client on errors #32712

Merged
andrewkroh merged 2 commits into elastic:main on Aug 17, 2022

Conversation

@andrewkroh andrewkroh commented Aug 16, 2022

What does this PR do?

This modifies the gcp-pubsub input in Filebeat to include its own retry loop
rather than depending entirely on the pub/sub client SDK to retry requests.

The input now exits only when Filebeat triggers a shutdown. Any error returned
by the pub/sub client is logged, and the input then restarts the pub/sub client.
Restarts are throttled to at most one every 30s.

Fixes #32550

Why is it important?

This improves the reliability of the pub/sub input by handling errors that are not retried by the pub/sub SDK.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Related issues

Logs

{
  "log.level": "info",
  "@timestamp": "2022-08-16T12:09:40.107-0400",
  "log.logger": "gcp.pubsub",
  "log.origin": {
    "file.name": "gcppubsub/input.go",
    "file.line": 142
  },
  "message": "Pub/Sub input worker has started.",
  "service.name": "filebeat",
  "pubsub_project": "FAKE",
  "pubsub_topic": "foo",
  "pubsub_subscription": {
    "Name": "filebeat",
    "NumGoroutines": 1,
    "MaxOutstandingMessages": 1000,
    "Create": true
  },
  "ecs.version": "1.6.0"
}
{
  "log.level": "warn",
  "@timestamp": "2022-08-16T12:10:40.109-0400",
  "log.logger": "gcp.pubsub",
  "log.origin": {
    "file.name": "gcppubsub/input.go",
    "file.line": 159
  },
  "message": "Restarting failed Pub/Sub input worker.",
  "service.name": "filebeat",
  "pubsub_project": "FAKE",
  "pubsub_topic": "foo",
  "pubsub_subscription": {
    "Name": "filebeat",
    "NumGoroutines": 1,
    "MaxOutstandingMessages": 1000,
    "Create": true
  },
  "error": {
    "message": "failed to subscribe to pub/sub topic: failed to check if subscription exists: context deadline exceeded",
    "stack_trace": "\ngithub.com/elastic/beats/v7/x-pack/filebeat/input/gcppubsub.(*pubsubInput).run\n\tgithub.com/elastic/beats/v7/x-pack/filebeat/input/gcppubsub/input.go:186\ngithub.com/elastic/beats/v7/x-pack/filebeat/input/gcppubsub.(*pubsubInput).Run.func1.1\n\tgithub.com/elastic/beats/v7/x-pack/filebeat/input/gcppubsub/input.go:157\nruntime.goexit\n\truntime/asm_arm64.s:1263"
  },
  "ecs.version": "1.6.0"
}
{
  "log.level": "info",
  "@timestamp": "2022-08-16T12:11:20.442-0400",
  "log.logger": "gcp.pubsub",
  "log.origin": {
    "file.name": "gcppubsub/input.go",
    "file.line": 169
  },
  "message": "Pub/Sub input worker has stopped.",
  "service.name": "filebeat",
  "pubsub_project": "FAKE",
  "pubsub_topic": "foo",
  "pubsub_subscription": {
    "Name": "filebeat",
    "NumGoroutines": 1,
    "MaxOutstandingMessages": 1000,
    "Create": true
  },
  "ecs.version": "1.6.0"
}

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Aug 16, 2022
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Aug 16, 2022
@andrewkroh andrewkroh added the backport-7.17 Automated backport to the 7.17 branch with mergify label Aug 16, 2022
@elastic elastic deleted a comment from mergify bot Aug 16, 2022
@andrewkroh andrewkroh changed the title [Filebeat] gcp-pubsub - Restart Pub/Sub client on errors [Filebeat] gcp-pubsub: Restart Pub/Sub client on errors Aug 16, 2022
@andrewkroh andrewkroh marked this pull request as ready for review August 16, 2022 17:24
@andrewkroh andrewkroh requested a review from a team as a code owner August 16, 2022 17:24
@elasticmachine
Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

elasticmachine commented Aug 16, 2022

💚 Build Succeeded


Build stats

  • Start Time: 2022-08-16T23:43:39.393+0000

  • Duration: 79 min 32 sec

Test stats 🧪

Test Results

  • Failed: 0

  • Passed: 2174

  • Skipped: 166

  • Total: 2340

💚 Flaky test report

Tests succeeded.

Co-authored-by: Dan Kortschak <[email protected]>
@andrewkroh andrewkroh merged commit 947c837 into elastic:main Aug 17, 2022
mergify bot pushed a commit that referenced this pull request Aug 17, 2022
(cherry picked from commit 947c837)
mergify bot pushed a commit that referenced this pull request Aug 17, 2022
(cherry picked from commit 947c837)
v1v pushed a commit to v1v/beats that referenced this pull request Aug 22, 2022
… client on errors (elastic#32716)

(cherry picked from commit 947c837)

Co-authored-by: Andrew Kroh <[email protected]>
cmacknz pushed a commit that referenced this pull request Aug 24, 2022
…on errors (#32717)

(cherry picked from commit 947c837)

Co-authored-by: Andrew Kroh <[email protected]>
chrisberkhout pushed a commit that referenced this pull request Jun 1, 2023
Labels
  • backport-7.17: Automated backport to the 7.17 branch with mergify
  • backport-v8.4.0: Automated backport with mergify
  • bug
  • Filebeat
3 participants