aws-cdk-lib: Custom resource provider can't exceed 1 hour #24974
Comments
According to this blog post, I am afraid the default custom resource creation timeout is 1 hour. To extend the timeout, we would need to create a wait condition handle and have the custom resource handler call back to it, but I did not find any relevant implementation in CDK. I'll discuss this with the team internally.
@pahud the totalTimeout prop ends up being used in the WaitConditionHandler Step Functions state machine. We use totalTimeout to calculate maxAttempts based on the query interval. See aws-cdk/packages/aws-cdk-lib/custom-resources/lib/provider-framework/waiter-state-machine.ts, lines 52 to 62 in 43e681e.
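To make the arithmetic concrete, here is a minimal Python sketch of deriving a retry count from a total timeout and a polling interval. It is not the code in waiter-state-machine.ts: the function name, the returned keys, the fixed backoff rate of 1, and the rejection of timeouts that are not a whole number of intervals are all assumptions made for illustration.

```python
def calculate_retry_policy(total_timeout_s: int, query_interval_s: int) -> dict:
    """Derive a fixed-interval retry policy from a total timeout and a poll interval."""
    if query_interval_s <= 0:
        raise ValueError("query interval must be positive")
    max_attempts, remainder = divmod(total_timeout_s, query_interval_s)
    if remainder:
        # Assumption: reject timeouts that are not a whole number of intervals.
        raise ValueError("total timeout must be a multiple of the query interval")
    # Assumption: a backoff rate of 1 keeps the interval constant between attempts.
    return {
        "max_attempts": max_attempts,
        "interval_s": query_interval_s,
        "backoff_rate": 1,
    }


# Values from the reproduction in this thread: 2 hours total, polled every 15 minutes.
policy = calculate_retry_policy(2 * 60 * 60, 15 * 60)
print(policy["max_attempts"])  # 8
```

With these inputs the state machine would be allowed 8 polls, which says nothing about how long CloudFormation itself is willing to wait for the response.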
Thank you @peterwoodworth. Correct me if I'm wrong, but from what I've observed in the source code and documentation, the default timeout of CfnCustomResource is 1 hour according to this document, and the custom resource provider is essentially a provider for CfnCustomResource that is responsible for sending the cfn-response back to CloudFormation within this limit. In this framework, if isCompleteHandler is provided, a waiter state machine is created and started, with the onEvent handler as its entry point, followed by isCompleteHandler invoked with backoff retries. This means:
Now, let's look at the totalTimeout property mentioned above. According to this, it is essentially designed for isCompleteHandler and is calculated here in calculateRetryPolicy(). With that said, I think CfnWaitCondition and CfnWaitConditionHandle might be required, as described in this blog post and this GitHub sample code snippet, or completely auto-created by the provider framework when But I could still be wrong and may have missed something in the source code. Let me know if I missed anything.
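As a rough illustration of the wait condition idea, a long-running handler can report completion by sending a JSON document with an HTTP PUT to the presigned URL of a wait condition handle. The sketch below only builds that request. The function names and the example.com URL are hypothetical, and the empty Content-Type header is an assumption modelled on the curl example in the CloudFormation wait condition documentation.

```python
import json
import urllib.request


def build_wait_condition_signal(handle_url, unique_id, status="SUCCESS",
                                reason="Resource is ready", data=""):
    """Build (but do not send) the PUT request that signals a wait condition handle."""
    if status not in ("SUCCESS", "FAILURE"):
        raise ValueError("status must be SUCCESS or FAILURE")
    body = json.dumps({
        "Status": status,
        "Reason": reason,
        "UniqueId": unique_id,  # must differ for each signal sent to the same handle
        "Data": data,
    }).encode("utf-8")
    # Assumption: send an empty Content-Type, as curl -H 'Content-Type:' would.
    return urllib.request.Request(
        handle_url, data=body, method="PUT", headers={"Content-Type": ""}
    )


def send_signal(request):
    """Send the signal and return the HTTP status code (needs network access)."""
    with urllib.request.urlopen(request) as response:
        return response.status


# Hypothetical URL; in a real stack it is the value of the WaitConditionHandle.
request = build_wait_condition_signal(
    "https://example.com/presigned-handle-url", unique_id="attempt-1"
)
print(request.get_method(), request.data.decode("utf-8"))
```

Keeping the request construction separate from the network call makes the payload easy to inspect in a unit test before wiring it into a Lambda handler or a state machine task.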
I can confirm the 2 hours of

import * as path from 'path';
import { CustomResource, Duration, Stack, StackProps } from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as log from 'aws-cdk-lib/aws-logs';
import * as cr from 'aws-cdk-lib/custom-resources';
import { Construct } from 'constructs';

export class Demotack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Settings shared by both handlers; the code lives in ../lambda.d/index.py.
    const handlerCommonConfig = {
      runtime: lambda.Runtime.PYTHON_3_9,
      code: lambda.Code.fromAsset(path.join(__dirname, '../lambda.d')),
    };

    const onEventHandler = new lambda.Function(this, 'OnEvent', {
      ...handlerCommonConfig,
      handler: 'index.on_event',
    });

    const isCompleteHandler = new lambda.Function(this, 'IsComplete', {
      ...handlerCommonConfig,
      handler: 'index.is_complete',
    });

    // Poll every 15 minutes for up to 2 hours.
    const provider = new cr.Provider(this, 'Provider', {
      onEventHandler,
      isCompleteHandler,
      queryInterval: Duration.minutes(15),
      totalTimeout: Duration.hours(2),
      logRetention: log.RetentionDays.ONE_DAY,
    });

    new CustomResource(this, 'CR', {
      serviceToken: provider.serviceToken,
      properties: {
        // Synthesis time in epoch seconds; is_complete measures elapsed time from it.
        startedAt: Math.floor(Date.now() / 1000),
      },
    });
  }
}

index.py

import time


def on_event(event, context):
    print(event)
    request_type = event['RequestType']
    if request_type == 'Create': return on_create(event)
    if request_type == 'Update': return on_update(event)
    if request_type == 'Delete': return on_delete(event)
    raise Exception("Invalid request type: %s" % request_type)


def on_create(event):
    props = event["ResourceProperties"]
    print("create new resource with props %s" % props)
    return {}


def on_update(event):
    physical_id = event["PhysicalResourceId"]
    props = event["ResourceProperties"]
    print("update resource %s with props %s" % (physical_id, props))
    # ...


def on_delete(event):
    physical_id = event["PhysicalResourceId"]
    print("delete resource %s" % physical_id)
    # ...


def is_complete(event, context):
    print(event)
    resource_props = event["ResourceProperties"]
    started_at = int(resource_props["startedAt"])
    now_ts = int(time.time())
    delta = now_ts - started_at
    print(f'now: {now_ts} started: {started_at} delta: {delta}')
    # 2 hours + 5min
    is_ready = delta > 60*60*2+300
    return { 'IsComplete': is_ready }
@pahud If the CR Provider timeout can't exceed 1 hour, is it possible to implement a workaround by deploying my own state machine that signals a wait condition? For context, I am using CDK Pipelines and one of the stages takes ~6-7 hours. After it completes, I need to start the next stage of the pipeline. Based on this blog post and this article it should be possible, but I would appreciate feedback. Since this is P2, I'm not sure when this might be implemented or whether it will extend to the full 12 hour Sfn limit. Appreciate the help and guidance.
For anyone reading this, the above is possible. It's a bit non-trivial, but I was able to implement it to bypass the 1 hour timeout. Excited for CDK to support this out of the box.
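For readers wondering what such a workaround declares on the CloudFormation side, here is a hedged sketch that builds the two resources involved as a plain Python dict. It is not the commenter's implementation. The logical IDs and the function name are hypothetical, and the 7 hour value simply mirrors the stage duration mentioned above. The 12 hour cap is the documented maximum for the Timeout property of AWS::CloudFormation::WaitCondition.

```python
import json

MAX_WAIT_CONDITION_TIMEOUT_S = 12 * 60 * 60  # 43200 seconds


def wait_condition_resources(timeout_s, handle_id="LongRunningHandle",
                             condition_id="LongRunningWait"):
    """Return template resources for a wait condition handle and its wait condition."""
    if not 0 < timeout_s <= MAX_WAIT_CONDITION_TIMEOUT_S:
        raise ValueError("timeout must be between 1 and 43200 seconds")
    return {
        handle_id: {"Type": "AWS::CloudFormation::WaitConditionHandle"},
        condition_id: {
            "Type": "AWS::CloudFormation::WaitCondition",
            "Properties": {
                "Handle": {"Ref": handle_id},  # resolves to the presigned URL
                "Timeout": str(timeout_s),     # seconds, given as a string
                "Count": 1,                    # one success signal completes the wait
            },
        },
    }


# Hypothetical 7 hour budget for the long-running stage described in this thread.
resources = wait_condition_resources(7 * 60 * 60)
print(json.dumps(resources, indent=2))
```

The custom resource itself can then return quickly, while the stack waits on the wait condition until the long-running job signals the handle URL.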
Is there a plan for the fix? The wait condition seems reasonable for resources that take over 1 hour. I am currently developing a new construct that depends on the framework and would like to see the fix prioritized by the team.
Yes, that is exactly what I was thinking but unfortunately I didn't make it. I noticed this issue has been prioritized as p1 and the team will be looking at it shortly. Meanwhile, we appreciate and welcome any PR to address this issue from the community as well. I guess having a custom wait condition could be the way to go.
Thanks @pahud. I'm not sure I have time to submit a PR, but I'm happy to share my Pythonic solution for this. I do have a somewhat related question, however. With this workaround implemented, if I have an
We are going to fix the documentation to align with the AWS CloudFormation 1 hour timeout. For long-running deployments, please see https://aws.amazon.com/blogs/devops/implementing-long-running-deployments-with-aws-cloudformation-custom-resources-using-aws-step-functions/
@evgenyka to be clear, is CDK not going to support this for now?
Describe the bug
A custom resource provider can be configured to have a totalTimeout of up to 2 hours according to this documentation. However, in practice the custom resource times out in CloudFormation after 1 hour with this message:
Expected Behavior
I expect CloudFormation to allow up to 2 hours for the custom resource to be created.
Current Behavior
The step function which is created by the provider framework is successfully configured to run up to 2 hours, but CloudFormation fails after 1 hour. Even though CloudFormation fails to create/update the resource, the step function keeps running until the resource is successfully provisioned before the 2 hour timeout.
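The mismatch described here can be sketched as two independent deadlines, where the shorter one decides the outcome of the deployment. This is a simplified model, not CDK or CloudFormation code: the function name and result strings are hypothetical, the 1 hour limit is taken from this thread, and polling granularity is ignored.

```python
CLOUDFORMATION_LIMIT_S = 60 * 60  # 1 hour custom resource timeout, per this thread


def deployment_outcome(total_timeout_s, ready_after_s):
    """Say how a deployment ends for a resource that becomes ready after ready_after_s."""
    effective_timeout_s = min(total_timeout_s, CLOUDFORMATION_LIMIT_S)
    if ready_after_s <= effective_timeout_s:
        return "succeeds"
    if total_timeout_s > CLOUDFORMATION_LIMIT_S:
        # The state machine may still be polling, but CloudFormation has given up.
        return "fails: CloudFormation gave up first"
    return "fails: provider totalTimeout reached"


two_hours = 2 * 60 * 60
print(deployment_outcome(two_hours, 45 * 60))  # succeeds
print(deployment_outcome(two_hours, 90 * 60))  # fails: CloudFormation gave up first
```

Under this model, any totalTimeout above 1 hour has no effect on when CloudFormation marks the resource as failed.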
Reproduction Steps
I've created a custom resource provider which runs asynchronously. I've followed this documentation to set up the provider with an isComplete handler which polls the status of my custom resource: https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.custom_resources-readme.html#asynchronous-providers-iscomplete.
Possible Solution
Either update the custom resource provider to support a max totalTimeout of 1 hour or clarify the intention of the timeout value, if it's not meant to represent the timeout supported by CloudFormation.
Additional Information/Context
No response
CDK CLI Version
2.72.0
Framework Version
No response
Node.js Version
v14.20.0
OS
macOS 12.6.1
Language
TypeScript
Language Version
No response
Other information
No response