Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sidecar can temporarly lose config registrations #481

Closed
mpfz0r opened this issue Oct 13, 2023 · 1 comment · Fixed by #482
Closed

Sidecar can temporarly lose config registrations #481

mpfz0r opened this issue Oct 13, 2023 · 1 comment · Fixed by #482
Assignees
Labels

Comments

@mpfz0r
Copy link
Contributor

mpfz0r commented Oct 13, 2023

Problem description

With every registration request, the Sidecar performs a version check against the Graylog server.
The result then decides which kind of config assignment is expected.
Servers >= 5.0 return a type that supports multiple configs per backend. (see #441)

As a precaution, a fallback result to version 4.0.0 was added in case the version
could not be parsed from the server.
This turns out to be problematic in the following scenario.
If a Sidecar is talking to a load balancer, which sits in front of multiple
Graylog nodes, the following race condition can happen:

  • one node in the graylog cluster is being shut down.
  • the sidecar performs the version api request, but gets an error from the LB, because the request was redirected
    to the node that is being terminated.
  • The sidecar uses the fallback to version 4.0.0 and requests a config update.
  • The config update is parsed with the assumption that it is in the <= 5.0.0 format
    see https://github.com/Graylog2/collector-sidecar/blob/master/api/graylog.go#L293
  • This leads the sidecar to requesting assignments with the wrong format.
  • The wrong request reaches another Graylog node through the LB (running version >=5)
  • The server responds with an empty config
  • The sidecar stops all collectors
  • The request is not repeated because of ETag caching
  • Sidecar Version: >= 1.3
  • Graylog Version: >= 5.0
@mpfz0r mpfz0r added the bug label Oct 13, 2023
@mpfz0r mpfz0r self-assigned this Oct 13, 2023
@mpfz0r
Copy link
Contributor Author

mpfz0r commented Oct 13, 2023

[ HS #1903228006 ]

mpfz0r added a commit that referenced this issue Oct 13, 2023
If we fail to read the server version from Graylog,
stop using a fallback to 4.0.0 and retry the request
instead.

In scenarios with multiple Graylog nodes behind a load balancer,
this can lead to the sidecar temporarly stopping all collectors.

Fixes #481
mpfz0r added a commit that referenced this issue Oct 13, 2023
If we fail to read the server version from Graylog,
stop using a fallback to 4.0.0 and retry the request
instead.

In scenarios with multiple Graylog nodes behind a load balancer,
this can lead to the sidecar temporarly stopping all collectors.

Fixes #481
@thll thll closed this as completed in #482 Oct 16, 2023
thll pushed a commit that referenced this issue Oct 16, 2023
* Abort request on missing Graylog server version

If we fail to read the server version from Graylog,
stop using a fallback to 4.0.0 and retry the request
instead.

In scenarios with multiple Graylog nodes behind a load balancer,
this can lead to the sidecar temporarly stopping all collectors.

Fixes #481

* dont try to json parse arbitrary http error responses
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant