Fails to renew STS token with "Credential expiration ... is less than10 minutes in the future. Disabling auto-refresh" #138
Comments
After this error, does it retry and succeed? The code is supposed to "auto-refresh" when it knows the credentials will expire in a few minutes. The exception is when the expiration cannot be parsed or is only a few minutes in the future, in which case it just waits until the credentials have actually expired. But once a real auth error occurs, it should refresh the credentials.
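The refresh behaviour described in this comment can be sketched roughly as follows. This is an illustrative Python model, not the plugin's actual code: the function name `schedule_refresh`, the return convention, and the 10-minute window (taken from the warning quoted in the issue title) are all assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed refresh window, based on the "10 minutes" in the warning message.
REFRESH_WINDOW = timedelta(minutes=10)


def schedule_refresh(expiration: str, now: datetime) -> Optional[datetime]:
    """Return when to refresh proactively, or None if auto-refresh is disabled.

    Hypothetical model: when the expiration cannot be parsed, or is closer
    than REFRESH_WINDOW, no proactive refresh is scheduled and the caller
    simply waits for the credentials to expire.
    """
    try:
        # Accept a trailing "Z" as UTC; fromisoformat needs an explicit offset.
        expires_at = datetime.fromisoformat(expiration.replace("Z", "+00:00"))
    except ValueError:
        return None  # unparseable expiration: auto-refresh disabled
    if expires_at.tzinfo is None:
        expires_at = expires_at.replace(tzinfo=timezone.utc)  # assume UTC
    if expires_at - now < REFRESH_WINDOW:
        return None  # too close to expiry: auto-refresh disabled
    return expires_at - REFRESH_WINDOW
```

In this model, credentials with only eight minutes left yield `None`, which corresponds to the "Disabling auto-refresh" warning; whether the real code behaves exactly this way is not confirmed by the thread.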
No, it continues to output errors saying it failed to send events. Here's the latest output from one of my fluentbit pods, which has been in this state for a couple of hours:
I see. Okay, I'll put up a pull request to fix the refresh behavior. I remember that when I wrote this code I was a little uncertain of the logic. I guess there's no real reason why it needs to "disable auto-refresh". I'll check the logic for refreshing after errors too; that seems like a bug. Once it hits an auth error, it should try to refresh the credentials.
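The intended behaviour (refresh once an auth error occurs, then retry) might look something like the minimal Python sketch below. The `Sender` class, `AuthError`, and the two injected callables are hypothetical names for illustration only; the actual fix lives in the upstream PR and may be structured quite differently.

```python
class AuthError(Exception):
    """Stand-in for an expired-credentials error from the API (hypothetical)."""


class Sender:
    """Caches credentials and refreshes them when a send hits an auth error."""

    def __init__(self, fetch_credentials, send):
        self._fetch = fetch_credentials  # hypothetical: returns fresh credentials
        self._send = send                # hypothetical: send(events, creds)
        self._creds = fetch_credentials()

    def send(self, events):
        try:
            return self._send(events, self._creds)
        except AuthError:
            # Credentials were rejected: fetch new ones and retry once.
            # A second AuthError propagates to the caller.
            self._creds = self._fetch()
            return self._send(events, self._creds)
```

Retrying only once keeps a genuinely broken credential source from causing an endless loop; a real implementation would presumably also apply its normal retry and backoff rules.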
@alexmbird while you wait for my fix, you can switch from the
Many thanks for responding so promptly - it's appreciated. As a stopgap measure I've followed your suggestion of switching from
@alexmbird I put up a PR to the upstream repo to fix this (linked above). I also built a container image with the code: That image can be pulled from any AWS account (its repo policy trusts The code built in that image is from the latest master commit, which is unreleased as of now. It works as far as I can tell. You might see some warning messages in the Fluent Bit logs about socket status; ignore those.
@PettitWesley thanks for this. I've updated my development cluster to run that image. I'm afraid all it does at present is go into an infinite loop printing:
@alexmbird I shouldn't have tried to build off of master; it's too unstable right now. Please re-pull that image; I just updated it with a build based on 1.6, which should be stable.
No worries. I've deployed the new image - it starts correctly and is transmitting events to CloudWatch. It's the end of the working day here so I'll leave it overnight to see if it correctly renews the token.
Good morning! The new build has been running happily all night on my dev cluster. I don't see any messages in the (info-level) logs about renewing the tokens, but perhaps that's expected. One tiny (and somewhat off-topic) request: could the regular "Sent 8 events to CloudWatch" log messages be debug rather than info level? I ask because with five separate outputs they get spammy and I think that's the behaviour the old
Hey there. The special image you baked is still running well on my test cluster. Do you happen to have an ETA before the fix will hit an official release?
@alexmbird If you're comfortable using the upstream image distro, then We'll do a release of AWS for Fluent Bit probably next week once 1.7.2 comes out.
I've seen the AWS for Fluent Bit 1.7.2 release and updated our clusters to that. I can confirm that the STS renewal bug is now fixed. We have discovered another problem, but it probably isn't related so I'll open a new ticket.
I'm trying to get FluentBit up and running for an EKS cluster, with the intention of replacing a creaking Fluentd setup. Presently I'm running public.ecr.aws/aws-observability/aws-for-fluent-bit:2.10.1, configured with [OUTPUT] sections for five CloudWatch log groups. Each looks like this:
The rest of the config is based on Amazon's sample and the application itself is installed with the fluent-bit chart from https://fluent.github.io/helm-charts. For access to CloudWatch I'm using Kiam, which has been running well with other applications on the cluster (including fluentd, writing to the exact same log groups) for a year.
FluentBit is struggling to use the STS token correctly, however. It starts happily enough but after running for a few minutes it emits this error:
Another few minutes pass, then the credential expires and the output is a stream of errors like:
With an expiring credential, disabling auto-refresh is the absolute last thing I want it to do :)
I've tried modifying my Kiam config so the session-duration is 30m rather than the default 15m, but all that happens is that FluentBit takes a while longer before emitting the error.
Hence a question - do I have it misconfigured, or have I run into a bug with FluentBit's handling of session token renewal?