Changing timezone or engineVersion on a DatabaseInstance results in an Internal Failure #6439

Closed
jls-tschanzc opened this issue Feb 25, 2020 · 12 comments · Fixed by #6534
Labels
@aws-cdk/aws-rds Related to Amazon Relational Database bug This issue is a bug. needs-cfn This issue is waiting on changes to CloudFormation before it can be addressed.

Comments

@jls-tschanzc

jls-tschanzc commented Feb 25, 2020

When creating a Postgres RDS database as follows:

    const appDB = new DatabaseInstance(this, 'TestPostgresInstance', {
      engine: DatabaseInstanceEngine.POSTGRES,
      instanceClass: InstanceType.of(InstanceClass.BURSTABLE3, InstanceSize.MICRO),
      masterUsername: dbUserName,
      databaseName: dbName,
      vpc,
      allocatedStorage: 10,
      backupRetention: Duration.days(3),
    });

and then trying to add the timezone and engineVersion at a later time, like so:

    const appDB = new DatabaseInstance(this, 'TestPostgresInstance', {
      engine: DatabaseInstanceEngine.POSTGRES,
      instanceClass: InstanceType.of(InstanceClass.BURSTABLE3, InstanceSize.MICRO),
      masterUsername: dbUserName,
      databaseName: dbName,
      vpc,
      allocatedStorage: 10,
      backupRetention: Duration.days(3),
      timezone: 'Europe/Zurich',
      engineVersion: '11.5',
    });

the deployment always results in an Internal Failure and the stack change is rolled back.

I tried to determine the possible values for the engineVersion using:

aws rds describe-db-engine-versions --engine postgres --query "DBEngineVersions[].EngineVersion"

The chosen engineVersion matches the currently deployed one; the timezone, as far as I could tell, differs from the deployed one.
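
For readers who want to check a candidate version against that command's output before deploying, here is a minimal sketch. It assumes the JSON array printed by the command above has been captured as a string. The function name `isListedEngineVersion` and the sample versions are made up for illustration and are not part of the CDK or the AWS CLI.

```typescript
// Hypothetical helper: checks a candidate engineVersion against the JSON array
// printed by `aws rds describe-db-engine-versions ... --query "DBEngineVersions[].EngineVersion"`.
function isListedEngineVersion(cliJsonOutput: string, candidate: string): boolean {
  const parsed: unknown = JSON.parse(cliJsonOutput);
  if (!Array.isArray(parsed)) {
    throw new Error('Expected a JSON array of engine version strings');
  }
  // Exact string comparison: '11' does not match '11.5'.
  return parsed.some((version) => version === candidate);
}

// Sample output for illustration only; the real list depends on region and date.
const sampleCliOutput = '["10.11", "11.5", "11.6"]';
console.log(isListedEngineVersion(sampleCliOutput, '11.5'));
console.log(isListedEngineVersion(sampleCliOutput, '11'));
```

Being listed only means RDS knows the version. As this thread shows, an update can still fail for other reasons.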

Reproduction Steps

Create a Database Instance as follows:

    const vpc = new Vpc(this, 'TestVPC', {
      natGateways: 1,
      maxAzs: 2,
    });

    const appDB = new DatabaseInstance(this, 'TestPostgresInstance', {
      engine: DatabaseInstanceEngine.POSTGRES,
      instanceClass: InstanceType.of(InstanceClass.BURSTABLE3, InstanceSize.MICRO),
      masterUsername: 'test',
      databaseName: 'test',
      vpc,
      allocatedStorage: 10,
      backupRetention: Duration.days(3),
    });

Deploy the stack, then change the code to the following:

    const appDB = new DatabaseInstance(this, 'TestPostgresInstance', {
      engine: DatabaseInstanceEngine.POSTGRES,
      instanceClass: InstanceType.of(InstanceClass.BURSTABLE3, InstanceSize.MICRO),
      masterUsername: 'test',
      databaseName: 'test',
      vpc,
      allocatedStorage: 10,
      backupRetention: Duration.days(3),
      timezone: 'Europe/Zurich',
      engineVersion: '11.5',
    });

Try to deploy the change.

Error Log

Stack Event Log:

  1. StackName UPDATE_IN_PROGRESS: User Initiated
  2. DatabaseInstance UPDATE_FAILED: Internal Failure
  3. StackName UPDATE_ROLLBACK_IN_PROGRESS: The following resource(s) failed to update: [DatabaseInstance].
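
When reading an event log like this one, the earliest `*_FAILED` entry usually points at the resource that triggered the rollback. A minimal sketch of that filtering step, assuming the events have already been loaded into plain objects ordered oldest first. The `StackEventSketch` shape and the `firstFailure` name are hypothetical and are not AWS SDK types.

```typescript
// Hypothetical, simplified event shape; real CloudFormation events carry more fields.
interface StackEventSketch {
  logicalId: string;
  status: string;
  reason?: string;
}

// Returns the earliest event whose status ends in _FAILED, or undefined if none failed.
// Assumes `events` is ordered oldest first, as in the log above.
function firstFailure(events: StackEventSketch[]): StackEventSketch | undefined {
  for (const event of events) {
    if (/_FAILED$/.test(event.status)) {
      return event;
    }
  }
  return undefined;
}

// The three events from the log above.
const sampleEvents: StackEventSketch[] = [
  { logicalId: 'StackName', status: 'UPDATE_IN_PROGRESS', reason: 'User Initiated' },
  { logicalId: 'DatabaseInstance', status: 'UPDATE_FAILED', reason: 'Internal Failure' },
  { logicalId: 'StackName', status: 'UPDATE_ROLLBACK_IN_PROGRESS' },
];

const rootCause = firstFailure(sampleEvents);
console.log(rootCause ? `${rootCause.logicalId}: ${rootCause.reason}` : 'no failures');
```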

Environment

  • CLI Version: 1.25.0
  • Framework Version: 1.25.0
  • OS: MacOS
  • Language: English

Other

The documentation regarding the possible values for engineVersion could be extended to at least document the necessary commands to determine allowed values. The same goes for the timezone documentation.


This is 🐛 Bug Report

@jls-tschanzc jls-tschanzc added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Feb 25, 2020
@SomayaB SomayaB added the @aws-cdk/aws-rds Related to Amazon Relational Database label Feb 27, 2020
@robertd
Contributor

robertd commented Mar 1, 2020

@jls-tschanzc The same thing happens if you try to update the MultiAZ property or change the instance class or size. The stack update fails and you're greeted with an Internal Failure message.
I've opened a support ticket with AWS and I'll keep you updated on their findings.

MultiAZ example:

Stack VectorRdsStack
 Resources
 [~] AWS::RDS::DBInstance VectorRds VectorRds92A77672 
  └─ [+] MultiAZ
      └─ true
 $ cdk deploy '*' --require-approval 'never'
 VectorRdsStack: deploying...
 VectorRdsStack: creating CloudFormation changeset...
  0/4 | 5:06:28 AM | UPDATE_IN_PROGRESS   | AWS::RDS::DBInstance                        | VectorRds (VectorRds92A77672) 
  0/4 | 5:06:28 AM | UPDATE_IN_PROGRESS   | AWS::CDK::Metadata                          | CDKMetadata 
  1/4 | 5:06:29 AM | UPDATE_FAILED        | AWS::RDS::DBInstance                        | VectorRds (VectorRds92A77672) Internal Failure
 	new DatabaseInstance (/builds/ngtoc-devops/rds/dm-vector-rds/node_modules/@aws-cdk/aws-rds/lib/instance.ts:795:22)
 	\_ new VectorRdsStack (/builds/ngtoc-devops/rds/dm-vector-rds/lib/vector-rds-stack.ts:37:24)
 	\_ Object.<anonymous> (/builds/ngtoc-devops/rds/dm-vector-rds/bin/vector-rds.ts:11:21)
 	\_ Module._compile (internal/modules/cjs/loader.js:1151:30)
 	\_ Module.m._compile (/builds/ngtoc-devops/rds/dm-vector-rds/node_modules/ts-node/src/index.ts:814:23)
 	\_ Module._extensions..js (internal/modules/cjs/loader.js:1171:10)
 	\_ Object.require.extensions.<computed> [as .ts] (/builds/ngtoc-devops/rds/dm-vector-rds/node_modules/ts-node/src/index.ts:817:12)

Instance size change:

 Stack DMVectorRdsStack
 Resources
 [~] AWS::RDS::DBInstance VectorRds VectorRds92A77672 
  └─ [~] DBInstanceClass
      ├─ [-] db.t3.small
      └─ [+] db.t3.medium
 $ cdk deploy '*' --require-approval 'never'
 DMVectorRdsSgStack
 DMVectorRdsSgStack: deploying...
  ✅  DMVectorRdsSgStack (no changes)
 Outputs:
 DMVectorRdsSgStack.dmvectorrdssg = sg-048238afe160cd7a7
 Stack ARN:
 arn:aws:cloudformation:us-west-2:746822052750:stack/DMVectorRdsSgStack/9dcf41b0-5b6d-11ea-b5ac-022a2754311a
 DMVectorRdsStack
 DMVectorRdsStack: deploying...
 DMVectorRdsStack: creating CloudFormation changeset...
  0/2 | 4:42:26 AM | UPDATE_IN_PROGRESS   | AWS::RDS::DBInstance                        | VectorRds (VectorRds92A77672) 
  1/2 | 4:42:26 AM | UPDATE_FAILED        | AWS::RDS::DBInstance                        | VectorRds (VectorRds92A77672) Internal Failure
 	new DatabaseInstance (/builds/ngtoc-devops/rds/dm-vector-rds/node_modules/@aws-cdk/aws-rds/lib/instance.ts:795:22)
 	\_ new VectorRdsStack (/builds/ngtoc-devops/rds/dm-vector-rds/lib/vector-rds-stack.ts:25:24)
 	\_ Object.<anonymous> (/builds/ngtoc-devops/rds/dm-vector-rds/bin/vector-rds.ts:13:24)
 	\_ Module._compile (internal/modules/cjs/loader.js:1151:30)
 	\_ Module.m._compile (/builds/ngtoc-devops/rds/dm-vector-rds/node_modules/ts-node/src/index.ts:814:23)
 	\_ Module._extensions..js (internal/modules/cjs/loader.js:1171:10)
 	\_ Object.require.extensions.<computed> [as .ts] (/builds/ngtoc-devops/rds/dm-vector-rds/node_modules/ts-node/src/index.ts:817:12)

@robertd
Contributor

robertd commented Mar 2, 2020

Case ID 6844571841 with AWS support.... still waiting for response.

edit: Got response and they're investigating on their end.

@nija-at
Contributor

nija-at commented Mar 2, 2020

As far as I can tell, the CloudFormation template generated by the CDK is consistent with what CloudFormation and RDS expect it to be. The problem is coming from the underlying call that CloudFormation makes to RDS.

On investigating the timezone error: per the documentation, the timezone property is supported only by Microsoft SQL Server. I believe this is what's causing the error there. The error is reported correctly when deploying a new database instance, but an Internal Failure is generated when modifying an existing database instance.

The multiAZ failure might be caused by something similar, but I have not investigated it.

@nija-at
Contributor

nija-at commented Mar 2, 2020

Looked into multiAZ - I was not able to replicate this error.

I was able both to deploy a new database instance with multiAZ set to true and to start with the property undefined, then update the stack after setting it.

My CDK app -

#!/usr/bin/env node
import { App, Duration, Stack, CfnOutput } from '@aws-cdk/core';
import { DatabaseInstance, DatabaseInstanceEngine } from '@aws-cdk/aws-rds';
import { InstanceClass, InstanceSize, InstanceType, Vpc } from '@aws-cdk/aws-ec2';

const app = new App();
const stack = new Stack(app, 'mystack-2', {
  env: { account: '664773442901' }
});

const vpc = new Vpc(stack, 'db-vpc');

const appDB = new DatabaseInstance(stack, 'TestPostgresInstance', {
  engine: DatabaseInstanceEngine.POSTGRES,
  instanceClass: InstanceType.of(InstanceClass.BURSTABLE3, InstanceSize.MICRO),
  masterUsername: 'master',
  databaseName: 'dbname',
  vpc,
  allocatedStorage: 10,
  backupRetention: Duration.days(3),
  engineVersion: '11.5',
  multiAz: true
});

nija-at pushed a commit that referenced this issue Mar 2, 2020
Per documentation[1], 'Timezone' property is only supported on Microsoft
SQL Server. Setting this property on a DatabaseInstance with a different
database engine causes deployment to fail (1) with a validation error
for a new instance of `AWS::RDS::DBInstance` and (2) internal failure
when modifying an existing instance of `AWS::RDS::DBInstance`.

[1]:https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-rds-database-instance.html#cfn-rds-dbinstance-timezone

fixes #6439
@nija-at nija-at removed the needs-triage This issue or PR still needs to be triaged. label Mar 2, 2020
@robertd
Contributor

robertd commented Mar 2, 2020 via email

@nija-at
Contributor

nija-at commented Mar 2, 2020

Initial deployment is fine, but introducing changes to the existing stack like instance size, type, multiAZ, etc is where we are seeing Internal Failure message.

This further confirms that this is not an issue with the CDK. As far as the CDK is concerned, we generate the CF template based on the app configured. It is CloudFormation that then compares the currently deployed stack with the new template being applied, and follows up by performing create, delete or modify operations to match the stack with the template being deployed.
My suspicion is that one of these APIs is the one throwing the 'Internal Failure' that CF then exposes.
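
The compare step described above can be pictured with a small sketch. This is not how CloudFormation is implemented; it is a simplified illustration that diffs two flat property maps and reports what would have to be added, updated or removed. The `PropsSketch` and `ChangeSketch` types and the `diffProps` function are hypothetical names.

```typescript
// Hypothetical, flat view of a resource's properties.
type PropValue = string | number | boolean;
type PropsSketch = { [key: string]: PropValue };

type ChangeSketch =
  | { kind: 'add'; key: string; newValue: PropValue }
  | { kind: 'update'; key: string; oldValue: PropValue; newValue: PropValue }
  | { kind: 'remove'; key: string; oldValue: PropValue };

// Compares the deployed properties with the desired ones from the new template.
function diffProps(deployed: PropsSketch, desired: PropsSketch): ChangeSketch[] {
  const changes: ChangeSketch[] = [];
  for (const key of Object.keys(desired)) {
    if (!(key in deployed)) {
      changes.push({ kind: 'add', key, newValue: desired[key] });
    } else if (deployed[key] !== desired[key]) {
      changes.push({ kind: 'update', key, oldValue: deployed[key], newValue: desired[key] });
    }
  }
  for (const key of Object.keys(deployed)) {
    if (!(key in desired)) {
      changes.push({ kind: 'remove', key, oldValue: deployed[key] });
    }
  }
  return changes;
}

// Mirrors the two changes reported earlier in this thread.
const deployedProps: PropsSketch = { DBInstanceClass: 'db.t3.small' };
const desiredProps: PropsSketch = { DBInstanceClass: 'db.t3.medium', MultiAZ: true };
console.log(JSON.stringify(diffProps(deployedProps, desiredProps), null, 2));
```

Each resulting change would then be translated into a service API call, which is where a failure such as the one in this issue could surface.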

In the case of timezone, it should have been an error message that the timezone property is not supported on the Postgres database engine, but instead it was an Internal Failure.
The best the CDK can do is identify such situations and provide guardrails so that the user does not have to deploy to CF to find this error, like this - #6534.
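
A guardrail of this kind could look roughly like the sketch below. It is not the code merged in #6534; the `InstancePropsSketch` interface and `assertTimezoneSupported` function are hypothetical, and the check assumes that SQL Server engine names start with `sqlserver` (for example `sqlserver-se`).

```typescript
// Hypothetical, simplified props; not the CDK's actual DatabaseInstanceProps.
interface InstancePropsSketch {
  engineName: string; // e.g. 'postgres' or 'sqlserver-se'
  timezone?: string;
}

// Fails at synthesis time instead of letting the deployment fail later.
function assertTimezoneSupported(props: InstancePropsSketch): void {
  const isSqlServer = props.engineName.indexOf('sqlserver') === 0;
  if (props.timezone !== undefined && !isSqlServer) {
    throw new Error(
      `timezone is only supported for Microsoft SQL Server, but the engine is '${props.engineName}'`
    );
  }
}

try {
  assertTimezoneSupported({ engineName: 'postgres', timezone: 'Europe/Zurich' });
} catch (e) {
  console.log(e instanceof Error ? e.message : String(e));
}
```

Throwing in the construct means the user sees a clear message when running `cdk synth` or `cdk deploy`, before any template reaches CloudFormation.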

@robertd
Contributor

robertd commented Mar 2, 2020

@nija-at That is correct. I don't know if you can see the correspondence between me and the AWS support team, but when they did a deep dive on my existing stack they reported that there is no ModifyDBInstance API call against RDS coming from CloudFormation. This is definitely something on the CF side, as I confirmed with @MrArnoldPalmer on Gitter over the weekend. Also, AWS support tested this using v1.18.0. Not sure if there has been a regression in the meantime.

I'm piggy-backing on this issue, but I should probably create a new one since #6534 is targeted for Microsoft SQL Server.

Response from AWS Support:

Hi Robert,

  Thanks for contacting AWS Support.
  This is Maicon and I'll assist you in this case.


  I understand you are facing 'internal failure' when updating MultiAZ property or change/update instance class or size of an RDS Instance using CDK.

  I saw your comment on github(https://github.com/aws/aws-cdk/issues/6439 ).

  I have tried to reproduce the issue in my environment, but it works fine for me.

  I'm using the python CDK and RDS Sample from the link https://github.com/aws-samples/aws-cdk-examples/tree/master/python/rds 

  I deployed the stack successfully, and then update the instance to MultiAZ without any issues:


cdk deploy
RDSStack: deploying...
RDSStack: creating CloudFormation changeset...
 0/2 | 12:50:33 PM | UPDATE_IN_PROGRESS   | AWS::RDS::DBInstance                  | RDS (RDSE0E96D00)
0/2 Currently in progress: RDSE0E96D00

 ✅  RDSStack

Stack ARN:
arn:aws:cloudformation:ap-southeast-2:XXXXXXXXXXXXXX:stack/RDSStack/b108ad70-5c26-11ea-9603-0a97b58f1090


Also, I scaled up the instance to r4.xlarge without any issues:


cdk deploy
RDSStack: deploying...
RDSStack: creating CloudFormation changeset...
 0/2 | 1:10:09 PM | UPDATE_IN_PROGRESS   | AWS::RDS::DBInstance                  | RDS (RDSE0E96D00)
0/2 Currently in progress: RDSE0E96D00
 1/2 | 1:25:52 PM | UPDATE_COMPLETE      | AWS::RDS::DBInstance                  | RDS (RDSE0E96D00)
 1/2 | 1:25:54 PM | UPDATE_COMPLETE_CLEA | AWS::CloudFormation::Stack            | RDSStack
 2/2 | 1:25:55 PM | UPDATE_COMPLETE      | AWS::CloudFormation::Stack            | RDSStack

 ✅  RDSStack

Stack ARN:
arn:aws:cloudformation:ap-southeast-2:XXXXXXXXXXXXXX:stack:stack/RDSStack/b108ad70-5c26-11ea-9603-0a97b58f1090


My CDK Version is 1.18.0 (build bc924bc)

From the examples you provided, I investigated and there is no ModifyDBInstance API call against RDS coming from the Cloudformation.
For example for stack: arn:aws:cloudformation:us-west-2:xxxxxxxxxxxxxxx:stack/VectorRdsStack/adc86b20-5aa8-11ea-95e9-0614748bfb9c

2020-03-01T07:53:56.308Z 	AWS::RDS::DBInstance	VectorRds92A77672	 UPDATE_FAILED	Internal Failure

Also, looking on stack arn:aws:cloudformation:us-west-2:xxxxxxxxxxxxxxx:stack/DMVectorRdsSgStack/9dcf41b0-5b6d-11ea-b5ac-022a2754311a
the only resources created were AWS::CDK::Metadata and AWS::EC2::SecurityGroup, there is no RDS Instance created as part of this template.

I'm wondering if you can provide the steps to reproduce the issue, including:

1) CDK Version
2) TypeScript Version
3) CDK files

Once I have the proper steps to reproduce the issue, I can engage the CDK team internally to investigate further.
However, in my experience, they tend to respond faster through the GitHub issue, so please keep an eye on GitHub as well (https://github.com/aws/aws-cdk/issues/6439).

I look forward to hearing from you. Please let me know if you have any questions or need clarification on the above.

@mergify mergify bot closed this as completed in #6534 Mar 2, 2020
mergify bot pushed a commit that referenced this issue Mar 2, 2020
#6534)

* fix(rds): setting timezone on DatabaseInstance causes internal failure

Per documentation[1], 'Timezone' property is only supported on Microsoft
SQL Server. Setting this property on a DatabaseInstance with a different
database engine causes deployment to fail (1) with a validation error
for a new instance of `AWS::RDS::DBInstance` and (2) internal failure
when modifying an existing instance of `AWS::RDS::DBInstance`.

[1]:https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-rds-database-instance.html#cfn-rds-dbinstance-timezone

fixes #6439

* PR feedback
@jls-tschanzc
Author

@robertd
Changing the Instance Class from a BURSTABLE2 to a BURSTABLE3 worked for me.

@nija-at
Thanks for the TZ fix & documentation.

From your posts I assume CF has a bug (or changes it does not support) when changing existing RDS instances.

I assume the issue with changing the engineVersion is related to this? I ask because your pull request closed this issue without directly addressing the engineVersion problem. As it stands, this issue is only 50% resolved for me.

I can open a related issue for CF in relation to this if requested, just tell me where the best place to create that issue would be.

@robertd
Contributor

robertd commented Mar 2, 2020

@jls-tschanzc Which region are you running this in? We're in us-west-2, but we're still getting errors.

edit: I assume based on your timezone parameter that you're in eu-* region.

@jls-tschanzc
Author

@robertd
eu-central-1. Changing the BURSTABLE version was actually the only change I tried that did not result in an Internal Failure.

@nija-at
Contributor

nija-at commented Mar 3, 2020

Ah, sorry this issue got closed @jls-tschanzc . Re-opening to look at the engineVersion change issue.

@nija-at nija-at reopened this Mar 3, 2020
@nija-at nija-at added the needs-cfn This issue is waiting on changes to CloudFormation before it can be addressed. label Mar 3, 2020
@nija-at
Contributor

nija-at commented Mar 3, 2020

Let's continue tracking this here, since this seems to be affecting multiple properties - #6542

@nija-at nija-at closed this as completed Mar 3, 2020
eladb pushed a commit that referenced this issue Mar 9, 2020