Cloudwatch Log Group created with options does not get auto destroyed #920
Comments
We're seeing this issue too. It seems that Terraform does correctly destroy the log group, but logs are still shipped to CloudWatch for some time after AWS says an EKS cluster has been deleted, causing CloudWatch to recreate the log group. |
We're doing We've seen it take ~3 minutes to stop being recreated.
|
This is happening because the EKS cluster gets destroyed after Terraform deletes the CloudWatch log group. The AmazonEKSServicePolicy IAM policy (which this module assigns to the EKS cluster role by default) has permission to CreateLogGroup and anything else needed to continue logging. When Terraform destroys the CloudWatch log group, the still-running EKS cluster creates it again. Then, when you run terraform apply again, the CloudWatch log group no longer exists in your state (because Terraform actually destroyed it), and Terraform doesn't know about the resource that was created outside of it. |
Could the Terraform config be changed to make sure that the EKS cluster has been destroyed before the CloudWatch log group is destroyed? |
Hmm, are you sure about that, @maiconbaumx? There is an explicit dependency between the cluster resource and the CloudWatch log group. Technically, Terraform should destroy |
@barryib yep... that seems to be the case as in terraform-aws-modules/terraform-aws-vpc#435 and hashicorp/terraform#14750 too |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Is there any news on this? |
I am facing the same issue. Any updates on this? I'm expecting automatic deletion of the log group once the VPC is destroyed. |
Fixed this by adding a depends_on to the cloudwatch log resource |
@ulfox can you please share your code snippet? |
Sure
|
We already have that depends_on in this module. See https://github.com/terraform-aws-modules/terraform-aws-eks/blob/v13.1.0/cluster.tf#L47 |
Apologies, I missed the module part. I thought the comment was directly about the resource. I switched my code to use the module, cheers. |
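For readers who want to see the shape of the `depends_on` arrangement being discussed here, a minimal sketch follows. It is not the module's actual code: the variables (`cluster_name`, `cluster_role_arn`, `subnet_ids`), the retention value, and the enabled log types are assumptions for illustration. Only the log group naming pattern `/aws/eks/<cluster>/cluster` is taken from the error message in this issue.

```hcl
# Hypothetical inputs, named here for illustration only.
variable "cluster_name" {
  type = string
}

variable "cluster_role_arn" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

# Log group that EKS control plane logging writes to.
resource "aws_cloudwatch_log_group" "this" {
  name              = "/aws/eks/${var.cluster_name}/cluster"
  retention_in_days = 90
}

resource "aws_eks_cluster" "this" {
  name                      = var.cluster_name
  role_arn                  = var.cluster_role_arn
  enabled_cluster_log_types = ["api", "audit"]

  vpc_config {
    subnet_ids = var.subnet_ids
  }

  # Create the log group before the cluster; on destroy, remove the
  # cluster before the log group.
  depends_on = [aws_cloudwatch_log_group.this]
}
```

This only controls the order in which Terraform acts. It does nothing about logs that AWS delivers after the cluster is reported as deleted, which is the scenario described elsewhere in this thread.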
FWIW I think that this issue is not related to Terraform or this module, but is more about how AWS buffers logs and flushes them to the CloudWatch log group. See also: |
I was wondering if we shouldn't deny the @max-rocket-internet @antonbabenko Any thoughts? I think this is also true for the VPC module log group for flow logs. |
Yes, I think you are right. If I understand correctly now the VPC flow logs will create such There is related discussion (hashicorp/terraform-provider-aws#902) and a note in terraform-aws-provider/ROADMAP.md. |
But will reducing the resource from Unless you want to say that we should deny |
Probably it will be working fine if permissions are scoped to the more precise resources instead of |
One thing that may be considered here is to allow the creation of the log group externally. For my use case, I have a cluster that I want to destroy every night and recreate every morning (development environment); however, keeping the logs from previous days could be useful. Perhaps using 0.13's module-level depends_on, users could make sure the log group is at the top of the food chain? |
@ostmike Yes, we can add an option to let users create and manage their own I'll be happy to review any PR for that. But for sure, that wouldn't make any difference in a busy EKS environment, because EKS will recreate the CloudWatch log group just after Terraform destroys it, if it has logs left. |
That's still an issue. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Wowowow, hol’up Mr. Bot, don’t stale, this is still an issue ! So yes as @barryib said the solution, same as for the VPC module which had the same problem, is to deny the logs:CreateLogGroup permission in the service role: it is not needed since we create the log group ourselves with Terraform, and it is causing trouble. See the implementation for the VPC module: terraform-aws-modules/terraform-aws-vpc#550. @antonbabenko you looked at it the other day so you may remember the details. Now the situation here is a tad more complex because we don’t create the policy ourselves, we use the AWS provided one (EDIT: note for later, this is where the magic happens https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/cluster.tf#L119 and the three resources below that) |
We tested it with #1594, denying the create log group permission, and managed to get the auto destroy working. Please have a look.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
@haarchri Oh yeah! I knew something was possible, I didn’t know what, and this is it! Overlaying a "Deny" policy over the existing Amazon policy so as not to diverge from it, that’s very good. And you did all the docs as well. I’ll leave a comment on your PR, but I’m not a maintainer so I can’t approve it officially. @ Mr stale bot: not stale :) |
This issue has been resolved in version 17.24.0 🎉 |
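A rough sketch of the deny overlay discussed above, for anyone who wants to apply it in their own configuration. This is an illustration under stated assumptions, not the code from #1594: the resource names and the `cluster_role_name` variable are hypothetical, and the wildcard resource is a deliberately broad choice.

```hcl
# Hypothetical input: name of the IAM role used by the EKS cluster.
variable "cluster_role_name" {
  type = string
}

# An explicit Deny takes precedence over the Allow granted by the
# AWS-managed policy, so the managed policy itself stays untouched.
data "aws_iam_policy_document" "deny_log_group_creation" {
  statement {
    effect    = "Deny"
    actions   = ["logs:CreateLogGroup"]
    resources = ["*"]
  }
}

# Inline policy attached to the cluster role (names are illustrative).
resource "aws_iam_role_policy" "deny_log_group_creation" {
  name   = "deny-log-group-creation"
  role   = var.cluster_role_name
  policy = data.aws_iam_policy_document.deny_log_group_creation.json
}
```

With this in place the cluster role can no longer create the log group itself, so the group has to be created by Terraform before control plane logging is enabled.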
Having the same symptoms when destroying and recreating the cluster on |
This was tested and deemed no longer required, and was therefore removed in v18. I test this module quite often throughout the week for various PRs and do not see any issues with CloudWatch logs not deleting. Please feel free to open a new issue with reproduction steps if necessary. |
@bryantbiggs you cannot replicate this because you don't have a workload on the cluster. This is caused by the general latency of CloudWatch log processing under heavy load. The EKS control plane produces logs and is destroyed, the CloudWatch log group is also destroyed, and then CloudWatch's internal AWS queue recreates the log group (after it was already removed by Terraform). |
If that's the case, it's still not the concern of the module. It sounds like users need to delete/stop workloads before destroying the cluster |
I don't think it makes sense to put such a requirement on the destroy operation. Another point is that AWS doesn't guarantee how fast logs will be written to the CloudWatch log group. So if I enable audit logging on the cluster, then even when I don't have any logs of my own, a lot of API requests are still logged to the controller. As a result, the EKS control plane is removed again, logs are still flowing, but Terraform has removed the log group, so it gets recreated. |
@bryantbiggs why was the overlay deny removed in v18, then, when the only option now is to say that we need to delete/stop workloads beforehand? We delivered a fix that addresses this. |
I believe I answered this above. Users can attach additional policies as needed so feel free to do so with your configuration if that is what you need |
I hit this recently on a simple eks cluster with virtually no workload running when I |
@timblaktu, why is the explicit deny "hack" not working for you? |
I don't know. We're using v18.x so we should have the deny hack (you mean this, right?) in place, since it was introduced in 17.24.0. In my case, this is not easily reproducible, and I'm imagining it happens in extraordinary circumstances, like maybe iff a terraform process (apply or destroy) aborted ungracefully or something.. |
@bryantbiggs I upgraded our module to 18.17.1 and will start a tight loop creating/destroying our infra. We don't run many workloads yet either, and that might be why this is so rare. Ran into it again today, and suspect it happens in cases where we destroy immediately after create finishes. @bryantbiggs with module v18.17.1 (and tf v1.1.7), we're still seeing the issue with the CloudWatch log group not getting destroyed. It's quite reproducible with my script, and now appears to happen on the very second iteration, so: create-eks I added the following workaround following the destroy operation in my Makefile:
and now, still using module v18.17.1 (and tf v1.1.7), I'm STILL SEEING THE ERROR when running my "create/destroy in a loop" script, even in cases where I see in my test output that it detected the log group still exists and deleted it. Is there a "grace period" after CloudWatch log groups are deleted that they cannot be re-created? (kind of like how secrets manager secrets have a recovery_window?) EDIT: when looking at the log groups in the AWS Console, I see that the problem ones have their "Retention" setting set to "Never Expire", while others are set to "3 months". The latter comes from the fact that the cloudwatch_log_group_retention_in_days input defaults to 90 days, but I have no idea how/why the others are set to "Never Expire". |
Still facing this issue using terraform v1.2.9 and eks module "18.26.6". |
@bryantbiggs just to confirm, I'm also seeing this issue on the latest version. Could we get the code from #1594 re-added? |
Lines 266 to 287 in 7f90184
|
perhaps the ARN for the CloudWatch log group is not sufficient for this type of denial policy. This has been changed back to what the original PR had with a wildcard |
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further. |
I have issues
I'm submitting a...
What is the current behavior?
On Creating with the following options
cluster gets created nicely but on destroy it throws
**Error: Creating CloudWatch Log Group failed: ResourceAlreadyExistsException: The specified log group already exists: The CloudWatch Log Group '/aws/eks/test-eks/cluster' already exists.
on .terraform/modules/eks-cluster/terraform-aws-eks-12.1.0/cluster.tf line 1, in resource "aws_cloudwatch_log_group" "this":
1: resource "aws_cloudwatch_log_group" "this" {**
If this is a bug, how to reproduce? Please include a code sample if relevant.
Create with the above options and destroy
What's the expected behavior?
The Destroy should take care of the Cloudwatch Log Group created with the options.
Are you able to fix this problem and submit a PR? Link here if you have already.
NA
Environment details
Terraform v0.12.24
provider.aws v2.66.0
provider.kubernetes v1.11.3
provider.local v1.4.0
provider.null v2.1.2
provider.random v2.2.1
provider.template v2.1.2
Any other relevant info