Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Attempt to recover/invoke function when docker connectivity is poor #764

Merged
merged 1 commit into from
Nov 15, 2018

Conversation

jfuss
Copy link
Contributor

@jfuss jfuss commented Nov 12, 2018

Issue #, if available:
#529 and #451

Description of changes:
When working completely offline, SAM CLI fails when trying to download the docker image. Instead of failing right away, we will check to see if we have an older image that we could use instead. If we do not have an image locally, we fail gracefully with a message describing we could not download the image.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@jfuss jfuss requested a review from sriram-mv November 12, 2018 21:14
Copy link
Contributor

@sanathkr sanathkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two questions. not blocking but want to clarify

@@ -98,6 +99,8 @@ def do_cli(ctx, function_identifier, template, event, no_event, env_vars, debug_
raise UserException("Function {} not found in template".format(function_identifier))
except (InvalidSamDocumentException, OverridesNotWellDefinedError) as ex:
raise UserException(str(ex))
except DockerImagePullFailedException as ex:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about start_api and start_lambda clis? we should be capturing this exception somewhere right? If not, we should certainly catch and print pretty error message instead of a stack trace

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I elected for the stack trace for these. Mainly because I didn't have a good way to surface this. Take start-api for example. When calling a localhost endpoint, the response from the API will be a service error. We could catch the error and 'pretty print' something but figured the stack trace is enough for now (since this is the current behavior).

I would like to think more through this case in general (how we behave offline) but not halt that on providing some benefit now. I have no problem adding this now, just explaining my thought process of why I initially didn't.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean. If you don't catch the exception, then it is going to crash the process and print a stack trace right?

It would be rather better to print a proper error message and terminate the CLI than a crash.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The process doesn't crash. We rely on Flask for this. Flask will see the exception and then wrap this in a service 500 error: https://github.com/awslabs/aws-sam-cli/blob/develop/samcli/local/lambda_service/local_lambda_invoke_service.py#L114.

This is the same behavior for start-api. We only catch the errors we care about and handle them specifically, everything else gets rolled up into a 500 Error. The output will still show the stack trace of the error in the terminal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I forgot about Flask. This makes sense and is consistent with a service-like behavior.

"""
stream = stream or sys.stderr
try:
result_itr = self.docker_client.api.pull(image_name, stream=True, decode=True)
except docker.errors.ImageNotFound as ex:
raise DockerImageNotFoundException(str(ex))
except docker.errors.APIError as ex:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] Looks like the default timeout on Docker Client is 60 seconds. Is it possible to add timeout to the docker_client.api.pull call as well that won't hog for 60 seconds if it can't find a network?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good callout. I don't see this as an option for in docker_client.api.pull but this is configurable in the docker_client. This will affect poor network instead of no network (I have been testing locally with wifi off without waiting for the timeout). I can update the docker_client we use but what would be a reasonable timeout for this?

This PR doesn't handle the poor network case and I didn't aim to solve this here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s a good practice to set timeouts anyway. In this case we can be more aggressive. If you want to fix poor network later, I am okay.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. I want to tackle the greater poor network connectivity problem separately from this PR.

@jfuss jfuss merged commit 1fb22b1 into aws:develop Nov 15, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants