Intermittent NoCredentialsError('Unable to locate credentials') #2117

Closed
JoeZ99 opened this issue Aug 4, 2020 · 3 comments
Labels: closed-for-staleness, guidance (Question that needs advice or information.)

Comments


JoeZ99 commented Aug 4, 2020

Describe the bug

  • We've got a set of EC2 instances (2 to 10) that perform tasks related to S3 buckets. More precisely, they copy files from an external location to an S3 bucket and download them from that bucket afterwards (roughly speaking; see the sketch after this list).
  • Those EC2 instances have an AWS IAM role that gives them access to the S3 bucket. That role has Amazon's AmazonS3FullAccess managed policy attached.
  • From time to time (once or twice a day, while we perform hundreds of thousands of these operations daily), the error "Unable to locate credentials" appears.
  • The EC2 instance that reports the error has performed several operations of this kind flawlessly, and after the error is raised it carries on performing subsequent operations without problems.
  • The EC2 instances typically last for a few hours before being terminated and replaced.
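
For reference, here is a minimal sketch of the kind of flow described above, relying only on the instance role for credentials (no keys are passed explicitly). The bucket, key, and local paths are hypothetical placeholders, not the actual ones from our setup.

import boto3

s3 = boto3.client('s3')

# Put a file that was fetched from the external location into the bucket...
s3.upload_file('/tmp/incoming/example.dat', 'example-bucket', 'incoming/example.dat')

# ...and download it again later.
s3.download_file('example-bucket', 'incoming/example.dat', '/tmp/processed/example.dat')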

Steps to reproduce

  • Can't reproduce it, because it happens only from time to time

Expected behavior

  • I expect all the S3 operations to work, provided the EC2 instances have the right role, and 99.5% of the time that is the case.

Debug logs
Full stack trace by adding

import botocore.session
botocore.session.Session().set_debug_logger('')

to your code.

I'll update this issue when I get the logs.

@JoeZ99 JoeZ99 added the needs-triage This issue or PR still needs to be triaged. label Aug 4, 2020
@swetashre swetashre self-assigned this Aug 6, 2020
swetashre (Contributor) commented:

@JoeZ99 - Thank you for your post. Along with the debug logs, can you please also provide your EC2 instance type?

@swetashre swetashre added guidance Question that needs advice or information. response-requested Waiting on additional info and feedback. and removed needs-triage This issue or PR still needs to be triaged. labels Aug 12, 2020
JoeZ99 (Author) commented Aug 13, 2020

@swetashre, I'm having tremendous trouble getting the logs. Putting the logging in place in our system is not that easy, and once it is in place I have to hunt down the instance before AWS kills it itself :-) That's why it is taking some time to get the logs.

The instance type I can tell you now: it's a t3a.small.

I've also seen this error on another AWS service, with the PutMetricData operation, on a t2.medium instance. Again, it happens only from time to time, and most of the time the PutMetricData operation works perfectly.

This is the stack trace of the operation I just mentioned, as seen by Sentry:

  File "botocore/client.py", line 316, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "botocore/client.py", line 622, in _make_api_call
    operation_model, request_dict, request_context)
  File "botocore/client.py", line 641, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "botocore/endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "botocore/endpoint.py", line 132, in _send_request
    request = self.create_request(request_dict, operation_model)
  File "botocore/endpoint.py", line 116, in create_request
    operation_name=operation_model.name)
  File "botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "botocore/signers.py", line 90, in handler
    return self.sign(operation_name, request)
  File "botocore/signers.py", line 160, in sign
    auth.add_auth(request)
  File "botocore/auth.py", line 357, in add_auth
    raise NoCredentialsError

These are the arguments of the first call in botocore/client.py (line 316, in _api_call):


| Variable           | Value                                                  |
| ------------------ | ------------------------------------------------------ |
| args               | []                                                     |
| kwargs             | {MetricData: [{}], Namespace: 'Celery'}                |
| operation_name     | 'PutMetricData'                                        |
| py_operation_name  | 'put_metric_data'                                      |
| self               | <botocore.client.CloudWatch object at 0x7f21f806bf28>  |
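
For reference, a call that exercises this code path looks roughly like the sketch below; only the 'Celery' namespace comes from the dump above, and the metric name and value are hypothetical.

import boto3

cloudwatch = boto3.client('cloudwatch')

# Hypothetical metric; the Namespace matches the kwargs shown above.
cloudwatch.put_metric_data(
    Namespace='Celery',
    MetricData=[{'MetricName': 'TasksProcessed', 'Value': 1.0, 'Unit': 'Count'}],
)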

and these are the breadcrumbs as Sentry sees them:
[Sentry breadcrumbs screenshot, 2020-08-12 23:47]

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. label Aug 13, 2020
swetashre (Contributor) commented:

When attempting to retrieve credentials on an Amazon EC2 instance that has been configured with an IAM role, Boto3 will make only one attempt to retrieve credentials from the instance metadata service before giving up. You can increase this value so that Boto3 retries several times before giving up, using the AWS_METADATA_SERVICE_NUM_ATTEMPTS environment variable (see the sketch after the link below).

https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
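
A minimal sketch, assuming the single default attempt is the culprit; the value 5 is just an example, and the timeout variable is optional:

import os

# Option 1: environment variables, set in the instance's environment
# (e.g. the systemd unit or user data) or before any client is created.
os.environ['AWS_METADATA_SERVICE_NUM_ATTEMPTS'] = '5'
os.environ['AWS_METADATA_SERVICE_TIMEOUT'] = '5'

# Option 2: the equivalent botocore session config variables.
import botocore.session
import boto3

botocore_session = botocore.session.Session()
botocore_session.set_config_variable('metadata_service_num_attempts', 5)
botocore_session.set_config_variable('metadata_service_timeout', 5)

s3 = boto3.session.Session(botocore_session=botocore_session).client('s3')

Either way the setting has to be in place in every process that creates a client, so the environment-variable route is usually the simplest for a fleet of instances.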

For this issue I would recommend opening a ticket with AWS Support, as this could be an issue with the EC2 instance itself; that way the EC2 team can also look at it.
