CloudWatch Events execution mode with target RunInstances #2321

Closed
davidclin opened this issue May 2, 2018 · 7 comments


@davidclin

This is a follow-up to #2257.

I'm unable to get the following policy to stop newly launched EC2 instances in subnets tagged Location:Internet.

The policy runs in CloudWatch Events mode:

policies:
  - name: subnet-audit
    resource: ec2
    mode:
      type: cloudtrail
      role: arn:aws:iam::xxxxxxxxxxxx:role/CloudCustodianRole
      events:
        - source: ec2.amazonaws.com
          event: RunInstances
          ids: "responseElements.instanceSet.items[].instanceId"
    filters:
      - type: subnet
        key: "tag:Location"
        value: "Internet"
    actions:
      - stop
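
For anyone unsure what the `ids` expression in the policy above does: it tells Custodian where to find the instance IDs inside the CloudTrail record. Custodian evaluates that expression with JMESPath; the plain-Python sketch below only illustrates the same lookup and is not Custodian's code. The function name and the trimmed sample event are hypothetical.

```python
def extract_instance_ids(event):
    """Walk detail.responseElements.instanceSet.items[].instanceId."""
    detail = event.get("detail", {})
    # responseElements can be null when the API call failed.
    response = detail.get("responseElements") or {}
    items = (response.get("instanceSet") or {}).get("items") or []
    return [item["instanceId"] for item in items if "instanceId" in item]


# Hypothetical, heavily trimmed RunInstances event (illustration only).
sample_event = {
    "detail": {
        "eventSource": "ec2.amazonaws.com",
        "eventName": "RunInstances",
        "responseElements": {
            "instanceSet": {"items": [{"instanceId": "i-0123456789abcdef0"}]}
        },
    }
}

print(extract_instance_ids(sample_event))
```

If this lookup returns an empty list for a real event, the policy has no resources to filter, which is one thing worth checking when the Lambda appears to do nothing.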

Execution run

(custodian) $ custodian run -s . public-subnet-instance-audit-lambda.yml
2018-05-02 21:19:52,473: custodian.policy:INFO Provisioning policy lambda subnet-audit
2018-05-02 21:19:52,704: custodian.lambda:INFO Publishing custodian policy lambda function custodian-subnet-audit

The Lambda is created successfully and is visible in the AWS Management Console.

Need tips/guidance on how to troubleshoot the Lambda. This is all new to me.

@davidclin
Author

I also tried having the Lambda receive EC2 instance state events, without success:

policies:
  - name: subnet-audit
    resource: ec2
    mode:
      type: ec2-instance-state
      role: arn:aws:iam::xxxxxxxxxxxx:role/CloudCustodianRole
      events:
        - pending
    filters:
      - type: subnet
        key: "tag:Location"
        value: "Internet"
    actions:
      - stop

I am launching the test EC2 instance from the management console and making sure I'm selecting a subnet with the tag Location:Internet.
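
To make the intent of the `subnet` filter concrete, here is a minimal sketch of the check it is meant to perform: look up the instance's subnet and compare its `Location` tag to `Internet`. This is an illustration in plain Python, not Custodian's implementation; the function name and the sample records are hypothetical, shaped like EC2 describe output.

```python
def in_tagged_subnet(instance, subnets_by_id, key="Location", value="Internet"):
    """Return True if the instance's subnet carries the tag key=value."""
    subnet = subnets_by_id.get(instance.get("SubnetId"), {})
    tags = {tag["Key"]: tag["Value"] for tag in subnet.get("Tags", [])}
    return tags.get(key) == value


# Hypothetical sample data (illustration only).
subnets_by_id = {
    "subnet-aaaa1111": {"Tags": [{"Key": "Location", "Value": "Internet"}]},
    "subnet-bbbb2222": {"Tags": [{"Key": "Location", "Value": "Private"}]},
}
public_instance = {"InstanceId": "i-0aaa", "SubnetId": "subnet-aaaa1111"}
private_instance = {"InstanceId": "i-0bbb", "SubnetId": "subnet-bbbb2222"}

print(in_tagged_subnet(public_instance, subnets_by_id))
print(in_tagged_subnet(private_instance, subnets_by_id))
```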

@davidclin
Author

I was able to get the Lambda working by:

(1) using ec2-instance-state
(2) adding EC2 permissions for the role (namely, to stop instances)
(3) changing the events from 'pending' to 'running'

It's not entirely clear to me why the lambda will not trigger when a new EC2 instance is launched and is in 'pending' state.

@kapilt
Collaborator

kapilt commented May 3, 2018

Instances in the pending state can't be stopped, only terminated. See https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-lifecycle.html for more details.

@kapilt
Collaborator

kapilt commented May 3, 2018

There should also have been a note in the policy Lambda logs about implicitly filtering those instances out for the stop action.
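
A rough sketch of what that implicit filtering amounts to: before acting, drop any instance whose current state does not allow the action. The state table below is an assumption drawn from the EC2 lifecycle documentation linked in this thread, not copied from Custodian's source, and the names are hypothetical.

```python
# Assumed mapping of action -> instance states it can be applied from.
VALID_ORIGIN_STATES = {
    "stop": {"running"},
    "terminate": {"pending", "running", "stopping", "stopped"},
}


def filter_actionable(instances, action):
    """Keep only instances whose state allows the given action."""
    allowed = VALID_ORIGIN_STATES[action]
    return [i for i in instances if i["State"]["Name"] in allowed]


# Hypothetical instance record as it would look right after launch.
just_launched = [{"InstanceId": "i-0123456789abcdef0", "State": {"Name": "pending"}}]

print(len(filter_actionable(just_launched, "stop")))
print(len(filter_actionable(just_launched, "terminate")))
```

Under these assumptions, a policy triggered on `pending` with a `stop` action ends up with nothing to act on, while `terminate` still applies.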

@davidclin
Author

davidclin commented May 3, 2018

I modified the action from 'stop' to 'terminate' and it does exactly as explained in the document. 👍

My final working policy now looks like this:

policies:
  - name: subnet-audit
    resource: ec2
    description: |
      This policy runs in ec2-instance-state mode: the Lambda receives EC2 instance state events
      and is triggered when an EC2 instance enters the 'pending' state. The Lambda then terminates
      the instance based on the attributes of the subnet it is attached to. For example, instances
      in subnets tagged 'Location' with a value of 'Internet' are terminated. Note that instances
      in the 'pending' state cannot be stopped.
    mode:
      type: ec2-instance-state
      role: arn:aws:iam::xxxxxxxxxxxx:role/CloudCustodianRole
      events:
        - pending
    filters:
      - type: subnet
        key: "tag:Location"
        value: "Internet"
    actions:
      - terminate

As for the note in the policy Lambda logs about implicitly filtering those instances out for the stop action, I'm having trouble locating it.

I followed the instructions provided in https://docs.aws.amazon.com/lambda/latest/dg/monitoring-functions-logs.html and get the following message when I work my way to the Monitoring tab and click "Jump to logs" for the Errors widget:

Log group not found
The log group /aws/lambda/custodian-subnet-audit could not be found. Check if it was correctly created and retry.

Is there anything I need to add to my policy to create the log group mentioned above? Or maybe an additional permission for the Lambda role to write to CloudWatch? Maybe I answered my own question. :) I'll go try that...

Appreciate the pointers and help!

@davidclin
Author

davidclin commented May 3, 2018

For the benefit of fellow Cloud Custodian users who are getting their feet wet and following this thread as part of the troubleshooting process, I was able to view the Lambda error logs in CloudWatch by adding permissions to the Lambda role.

Namely, I added these CloudWatch Logs permissions:

"logs:CreateLogGroup","logs:CreateLogStream","logs:PutLogEvents"

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogStream",
                "ec2:*",
                "logs:CreateLogGroup",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        }
    ]
}
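
The policy above grants `ec2:*`. For anyone who wants a narrower grant, here is a hedged sketch that builds an equivalent document with the logging permissions and an explicit list of EC2 actions. The EC2 action list is an assumption for a policy that describes instances and subnets and then terminates; it is untested against Custodian and may well need additional describe permissions. The function and constant names are hypothetical.

```python
import json

LOG_ACTIONS = ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"]

# Assumed minimum for this policy; adjust after checking the Lambda logs.
ASSUMED_EC2_ACTIONS = [
    "ec2:DescribeInstances",
    "ec2:DescribeSubnets",
    "ec2:TerminateInstances",
]


def build_role_policy(ec2_actions):
    """Return an IAM policy document as a plain dict."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": LOG_ACTIONS, "Resource": "*"},
            {"Effect": "Allow", "Action": sorted(ec2_actions), "Resource": "*"},
        ],
    }


print(json.dumps(build_role_policy(ASSUMED_EC2_ACTIONS), indent=4))
```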

While I still can't find anything in the logs about implicitly filtering instances out for the stop action, I can at least see the logs for the entire workflow and what the Lambda is doing, which is still helpful.

Snippet from the Lambda logs:

[DEBUG]	2018-05-03T19:44:52.127Z	708b88c8-4f0a-11e8-b5a1-85c41233ee4e	metric:ResourceCount Count:1 policy:subnet-audit restype:ec2 scope:policy
[INFO]	2018-05-03T19:44:52.127Z	708b88c8-4f0a-11e8-b5a1-85c41233ee4e	**Invoking actions** []
[INFO]	2018-05-03T19:44:52.127Z	708b88c8-4f0a-11e8-b5a1-85c41233ee4e	policy: subnet-audit **invoking action: stop** resources: 1
[INFO]	2018-05-03T19:44:52.128Z	708b88c8-4f0a-11e8-b5a1-85c41233ee4e	**Stop** 0 of 1 instances
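
The last line of that snippet, `Stop 0 of 1 instances`, appears to be the trace of the implicit filtering: one instance matched the policy, and none were acted on. A small sketch for spotting such lines in exported logs follows; the regular expression is an assumption based only on the line shown above, and the function name is hypothetical.

```python
import re

# Matches lines such as "Stop 0 of 1 instances".
ACTION_COUNT = re.compile(r"(\w+) (\d+) of (\d+) instances")


def skipped_resources(log_lines):
    """Return (action, skipped_count) for lines where fewer resources were acted on than matched."""
    findings = []
    for line in log_lines:
        # Drop the asterisks used for emphasis in the pasted logs.
        match = ACTION_COUNT.search(line.replace("*", ""))
        if match:
            action = match.group(1).lower()
            acted, matched = int(match.group(2)), int(match.group(3))
            if acted < matched:
                findings.append((action, matched - acted))
    return findings


sample_lines = [
    "[INFO]\t2018-05-03T19:44:52.128Z\t708b88c8-4f0a-11e8-b5a1-85c41233ee4e\t**Stop** 0 of 1 instances",
]

print(skipped_resources(sample_lines))
```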

@davidclin
Author

I think we can close this out. I have a working policy now. 👍
