
[add_cloud_metadata] Remove logger for AWS/EC2 #36829

Merged: 5 commits merged on Nov 20, 2023.


constanca-m
Contributor

What

Remove the logger for the provider AWS/EC2 in add_cloud_metadata.

Details

add_cloud_metadata is enabled by default. When the AWS/EC2 provider is set up, we see this message every time, even when we are not running on AWS/EC2:

{"log.level":"warn","@timestamp":"2023-07-06T18:24:06.085Z","message":"error fetching EC2 Identity Document: operation error ec2imds: GetInstanceIdentityDocument, canceled, context deadline exceeded.","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"filestream-monitoring","type":"filestream"},"log":{"source":"filestream-monitoring"},"log.logger":"add_cloud_metadata","log.origin":{"file.line":91,"file.name":"add_cloud_metadata/provider_aws_ec2.go"},"service.name":"filebeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}

This behavior was introduced by PR #28285.

It is happening because when creating the provider, we call this function:

fetchers, err := setupFetchers(initProviders, c)

It then tries to create each provider in the list:

fetcher, err := ff.Create(name, c)

The problem is in this call: the AWS/EC2 Create function sets up a logger that starts emitting warnings:

logger := logp.NewLogger("add_cloud_metadata")

Looking at the other providers' files (provider_*.go), none of them use a logger inside their Create function.

The simplest solution for now is to delete these loggers.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

The easiest way is to:

  1. Clone this branch.
  2. Build metricbeat.
  3. Start it.
  4. Check that the warning no longer appears.

Related issues

Logs

The warning no longer appears in the logs.

@constanca-m constanca-m added the Team:Cloud-Monitoring Label for the Cloud Monitoring team label Oct 12, 2023
@constanca-m constanca-m self-assigned this Oct 12, 2023
@constanca-m constanca-m requested a review from a team as a code owner October 12, 2023 14:44
@mergify
Contributor

mergify bot commented Oct 12, 2023

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @constanca-m? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fix up this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8.\d.0 is the label to automatically backport to the 8.\d branch, where \d is the digit.

@constanca-m constanca-m added the backport-7.17 Automated backport to the 7.17 branch with mergify label Oct 12, 2023
@elasticmachine
Collaborator

elasticmachine commented Oct 12, 2023

💚 Build Succeeded


Build stats

  • Start Time: 2023-11-01T08:33:46.936+0000

  • Duration: 71 min 17 sec

Test stats 🧪

Test Results

  • Failed: 0
  • Passed: 28622
  • Skipped: 2015
  • Total: 30637

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments


To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

Review thread on the diff:

if err != nil {
    logger.Warnf("error fetching cluster name metadata: %s.", err)
}
clusterName, _ := fetchEC2ClusterNameTag(awsConfig, instanceIdentity.InstanceIdentityDocument.InstanceID)
Contributor


We probably should not ignore the error here. When we do collect tags from an EC2 instance, this error should be surfaced. For example, the user might not have the DescribeTags permission configured properly.

Contributor


We should keep this warning here. As for the original log: since we removed the logger.Warnf("error fetching EC2 Identity Document: %s.", err) line, and the function returns at line 88 when that fetch fails, this warning is never reached when EC2 is not the actual cloud metadata source.

Contributor Author


You're right @kaiyan-sheng, I added the log message back here.

@kaiyan-sheng
Contributor

Hey @constanca-m, thanks for working on this! I think that instead of removing the logger, maybe we should reconsider the logic in add_cloud_metadata and add functions to distinguish between cloud platforms. This way, if we are not running on EC2, the code you modified will not even be reached.

@constanca-m
Contributor Author

I had a look at this again. The main difference between this provider and the others is the generic fetcher (thanks @tetianakravchenko for pointing it out).

To give context, add_cloud_metadata works quite simply: it sends a request to each provider's metadata URL, and if a provider does not reply before the timeout, its request is abandoned. The other providers fetch their metadata through the generic fetcher, while AWS uses its own function, fetchRawProviderMetadata (the one with the logger).

The responses (timeouts and errors from each provider) are handled here:

for i := 0; i < len(p.initData.fetchers); i++ {

If no provider returns a result (meaning all of their requests timed out), we end up doing nothing. Otherwise, we handle each result. Since the errors now end up in result.err, I opted for the following logic: if the error arrives before the timeout, we log it; otherwise, we do not. For that, I added:

p.logger.Errorf("add_cloud_metadata: received error %v", result.err)

And this should solve the loggers problem.

What do you think @gizas @tetianakravchenko @kaiyan-sheng ?

@constanca-m
Contributor Author

/test

@@ -22,13 +22,14 @@ import (
"fmt"
"net/http"

"github.com/elastic/elastic-agent-libs/logp"
Contributor


nit: I don't think we need to move this import around.

Contributor Author


We need to move it because one of the tests fails otherwise: it requires the imports to be sorted with the goimports tool.

Contributor

@kaiyan-sheng kaiyan-sheng left a comment


Thanks for the write-up, looks good to me (besides the lib import)!

@constanca-m constanca-m merged commit d66a000 into elastic:main Nov 20, 2023
7 checks passed
@constanca-m constanca-m deleted the remove-warning-for-ec2-aws branch November 20, 2023 10:56
mergify bot pushed a commit that referenced this pull request Nov 20, 2023
* Remove logger.

* Add error message.

* Add log message

(cherry picked from commit d66a000)

# Conflicts:
#	libbeat/processors/add_cloud_metadata/provider_aws_ec2.go
@constanca-m constanca-m restored the remove-warning-for-ec2-aws branch November 20, 2023 11:10
@constanca-m constanca-m deleted the remove-warning-for-ec2-aws branch November 20, 2023 11:11
@kaiyan-sheng kaiyan-sheng added the backport-v8.11.0 Automated backport with mergify label Nov 20, 2023
mergify bot pushed a commit that referenced this pull request Nov 20, 2023
* Remove logger.

* Add error message.

* Add log message

(cherry picked from commit d66a000)
mergify bot added a commit that referenced this pull request Nov 20, 2023
* Remove logger.

* Add error message.

* Add log message

(cherry picked from commit d66a000)

Co-authored-by: Constança Manteigas <[email protected]>
Scholar-Li pushed a commit to Scholar-Li/beats that referenced this pull request Feb 5, 2024
* Remove logger.

* Add error message.

* Add log message
Labels
  • backport-7.17: Automated backport to the 7.17 branch with mergify
  • backport-v8.11.0: Automated backport with mergify
  • Team:Cloud-Monitoring: Label for the Cloud Monitoring team
Development

Successfully merging this pull request may close these issues.

add_cloud_metadata produces errors running in non-cloud elastic-agent install
6 participants