Intermittent failure to connect to S3 with credentials issue #174

MarkRoss-Eviden · 2024-05-11T06:52:55Z

This issue occurs randomly and has done for a long time (to my knowledge it was not introduced by recent changes in 5.1 or 5.2). It does not happen often, which makes it difficult to troubleshoot. Training works fine for hours (perhaps days), and then robomaker exits.

For example, this one happened 960 episodes in; you can see it's working and then suddenly it's not. The instance uses an IAM instance profile with full access to S3, so if permissions were the issue it would fail immediately.

This does not seem to be limited to DRfC; other users doing unrelated things have reported it too:
boto/botocore#2117
rom1504/img2dataset#137

One suggestion is that increasing the environment variable 'AWS_METADATA_SERVICE_NUM_ATTEMPTS' could help, as we might be getting throttled by the instance metadata service:

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
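A minimal sketch of what raising that variable could look like from Python, assuming the affected process uses boto3/botocore, which reads `AWS_METADATA_SERVICE_NUM_ATTEMPTS` and `AWS_METADATA_SERVICE_TIMEOUT` from the environment when it fetches instance profile credentials. The helper name `apply_imds_retry_settings` and the values 5 and 2 are illustrative assumptions, not settings taken from this thread or from DRfC.

```python
import os

# Assumed values for illustration only; the thread does not state which
# values were tried. botocore makes a single attempt by default.
IMDS_ENV_DEFAULTS = {
    "AWS_METADATA_SERVICE_NUM_ATTEMPTS": "5",  # retries against the metadata service
    "AWS_METADATA_SERVICE_TIMEOUT": "2",       # seconds per attempt
}


def apply_imds_retry_settings(env=None, overrides=None):
    """Set the metadata-service variables unless they are already defined.

    `env` defaults to os.environ; a plain dict can be passed for testing.
    Returns the effective value of each variable after the call.
    """
    env = os.environ if env is None else env
    settings = {**IMDS_ENV_DEFAULTS, **(overrides or {})}
    for name, value in settings.items():
        # setdefault keeps any value the operator already exported.
        env.setdefault(name, str(value))
    return {name: env[name] for name in settings}


if __name__ == "__main__":
    # Call this before the first boto3 client or session resolves credentials.
    print(apply_imds_retry_settings())
```

Because the helper uses `setdefault`, a value already exported in the container environment takes precedence over the illustrative defaults.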

larsll · 2024-05-11T08:28:36Z

So there are two possible workarounds:

1. Add normal AWS IAM credentials
2. Add the environment variable into docker/docker-compose-training.yml

MarkRoss-Eviden · 2024-05-11T11:14:47Z

I'll work on 'Add the environment variable into docker/docker-compose-training.yml' and see what happens. As long as it doesn't introduce new issues it should be safe to merge. Adding static creds to instances isn't AWS best practice and is actively discouraged for security reasons.

MarkRoss-Eviden · 2024-05-11T11:15:57Z

Are there any commands in the containers that would be fetching the creds specifically, or is it just background activity on the instance?

MarkRoss-Eviden · 2024-05-13T06:16:56Z

Fixed by #178.

MarkRoss-Eviden closed this as completed Jun 21, 2024
