Consolidate AWS wrong credentials with other beats #388

Closed
1 of 9 tasks
olegsu opened this issue Sep 13, 2022 · 5 comments · Fixed by #404
Labels
8.5 candidate Team:Cloud Security Cloud Security team related

Comments

@olegsu
Contributor

olegsu commented Sep 13, 2022

Motivation
When Cloudbeat runs on an EKS cluster with wrong AWS credentials, it logs a message and then stays idle.
The log:

Could not run Beater: %!w(*fmt.wrapError=&{Could not create beater: could not retrieve user identity for ECR fetcher: operation error STS: GetCallerIdentity, https response error StatusCode: 403, RequestID: 5b5c2c71-2d32-4a84-a573-30f5c0051d1e, api error InvalidClientTokenId: The security token included in the request is invalid. 0xc000bba9c0})

Looking at other beats, we see that Cloudbeat does not behave the same way. For example:

  1. Metricbeat crashes the process
  2. Metricbeat logs a clear error:
Status change to Failed: 1 error occurred: * 1 error: Error creating runner from config: 1 error: error creating aws metricset: failed DescribeRegions: operation error EC2: DescribeRegions, https response error StatusCode: 401, RequestID: 46361136-638d-4261-9646-8e9805e188d6, api error AuthFailure: AWS was not able to validate the provided access credentials

Definition of done

  • Log clear error message
  • Crash the process
  • Document what happens when credentials become invalid during a cycle

Related tasks/epics

Checklist

Please follow this checklist. At the beginning of your work, comment with a suggested high-level solution. It should include:

  • Comment describing high level implementation details
    • Include assumptions being taken
    • Mention relevant individuals with a reason (getting feedback, FYI etc)
  • Submit a PR for our technical index that includes breaking changes/ new features

Before closing this ticket

  • Commit the technical index PR
  • Reference to tech debts that shall be solved as we move forward
@eyalkraft
Contributor

Alternative option - report bad health status due to the bad configuration.
See

@amirbenun
Contributor

This task is an 8.5 candidate. We should probably create another task, as an 8.6 candidate, for what you suggest, @eyalkraft.

@tinnytintin10

I like this alignment; it will make it easy for SAs and support to begin debugging. Thanks for driving this, @olegsu.

Longer term, we'll align with what Eyal is pointing to. We want to propagate these error states up to the UI to drive user actions that will (hopefully) resolve them. Here is a generic example in Figma, from the endpoint security integration, of how this could be done. I will work with our designers on how this should look for us.

Relevant epics worth reading through to see how teams are dealing with these errors more holistically:

8.4 epic: https://github.com/elastic/security-team/issues/3494
8.5 epic: https://github.com/elastic/security-team/issues/3918

@oren-zohar
Collaborator

@jeniawhite raised another point to consider: what happens if the credentials stop being valid mid-cycle?

@olegsu
Contributor Author

olegsu commented Sep 20, 2022

Update:
From what I see, Cloudbeat will eventually crash the process and the agent will restart it.
"Eventually" means that it might take around 2 minutes to stop the fleet management service.
We see a restart of Cloudbeat here:

image
