Setup CloudWatch alarm for ClamAV notifications (#3895) #5943

Merged
dsotirho-ucsc merged 2 commits into develop from issues/dsotirho-ucsc/3895-clamav-alarm
Mar 1, 2024

Conversation

Contributor

@dsotirho-ucsc dsotirho-ucsc commented Feb 8, 2024

Connected issues: #3895

Checklist

Author

  • PR is a draft
  • Target branch is develop
  • Name of PR branch matches issues/<GitHub handle of author>/<issue#>-<slug>
  • On ZenHub, PR is connected to all issues it (partially) resolves
  • PR description links to connected issues
  • PR title matches¹ that of a connected issue or comment in PR explains why they're different
  • PR title references all connected issues
  • For each connected issue, there is at least one commit whose title references that issue

Author (partiality)

  • Added p tag to titles of partial commits
  • Added partial label to PR or this PR completely resolves all connected issues
  • All connected issues are resolved partially or this PR does not have the partial label

¹ When the issue title describes a problem, the corresponding PR title is Fix: followed by the issue title

Author (reindex, API changes)

  • Added r tag to commit title or this PR does not require reindexing
  • Added reindex label to PR or this PR does not require reindexing
  • PR and connected issue are labeled API or this PR does not modify a REST API
  • Added a (A) tag to commit title for backwards (in)compatible changes or this PR does not modify a REST API
  • Updated REST API version number in app.py or this PR does not modify a REST API

Author (chains)

  • This PR is blocked by previous PR in the chain or this PR is not chained to another PR
  • Added base label to the blocking PR or this PR is not chained to another PR
  • Added chained label to this PR or this PR is not chained to another PR

Author (upgrading deployments)

  • Documented upgrading of deployments in UPGRADING.rst or this PR does not require upgrading deployments
  • Added u tag to commit title or this PR does not require upgrading deployments
  • Added upgrade label to PR or this PR does not require upgrading deployments

Author (operator tasks)

  • Added checklist items for additional operator tasks or this PR does not require additional tasks

Author (hotfixes)

  • Added F tag to main commit title or this PR does not include permanent fix for a temporary hotfix
  • Reverted the temporary hotfixes for any connected issues or the prod branch has no temporary hotfixes for any connected issues

Author (before every review)

  • Rebased PR branch on develop, squashed old fixups
  • Ran make requirements_update or this PR does not touch requirements*.txt, common.mk, Makefile and Dockerfile
  • Added R tag to commit title or this PR does not touch requirements*.txt
  • Added reqs label to PR or this PR does not touch requirements*.txt
  • make integration_test passes in personal deployment or this PR does not touch functionality that could break the IT

Peer reviewer (after requesting changes)

Uncheck the Author (before every review) checklists.

Peer reviewer (after approval)

  • PR is not a draft
  • Ticket is in Review requested column
  • Requested review from system administrator
  • PR is assigned to system administrator

System administrator (after requesting changes)

Uncheck the before every review checklists. Update the N reviews label.

System administrator (after approval)

  • Actually approved the PR
  • Labeled connected issues as demo or no demo
  • Commented on connected issues about demo expectations or all connected issues are labeled no demo
  • Decided if PR can be labeled no sandbox
  • PR title is appropriate as title of merge commit
  • N reviews label is accurate
  • Moved ticket to Approved column
  • PR is assigned to current operator

Operator (before pushing the merge commit)

  • Checked reindex label and r commit title tag
  • Checked that demo expectations are clear or all connected issues are labeled no demo
  • PR has checklist items for upgrading instructions or PR is not labeled upgrade
  • Squashed PR branch and rebased onto develop
  • Sanity-checked history
  • Pushed PR branch to GitHub
  • Added sandbox label or PR is labeled no sandbox
  • Pushed PR branch to GitLab dev or PR is labeled no sandbox
  • Pushed PR branch to GitLab anvildev or PR is labeled no sandbox
  • Pushed PR branch to GitLab anvilprod or PR is labeled no sandbox
  • Build passes in sandbox deployment or PR is labeled no sandbox
  • Build passes in anvilbox deployment or PR is labeled no sandbox
  • Build passes in hammerbox deployment or PR is labeled no sandbox
  • Reviewed build logs for anomalies in sandbox deployment or PR is labeled no sandbox
  • Reviewed build logs for anomalies in anvilbox deployment or PR is labeled no sandbox
  • Reviewed build logs for anomalies in hammerbox deployment or PR is labeled no sandbox
  • Deleted unreferenced indices in sandbox or this PR does not remove catalogs or otherwise cause unreferenced indices in dev
  • Deleted unreferenced indices in anvilbox or this PR does not remove catalogs or otherwise cause unreferenced indices in anvildev
  • Deleted unreferenced indices in hammerbox or this PR does not remove catalogs or otherwise cause unreferenced indices in anvilprod
  • Started reindex in sandbox or this PR does not require reindexing dev
  • Started reindex in anvilbox or this PR does not require reindexing anvildev
  • Started reindex in hammerbox or this PR does not require reindexing anvilprod
  • Checked for failures in sandbox or this PR does not require reindexing dev
  • Checked for failures in anvilbox or this PR does not require reindexing anvildev
  • Checked for failures in hammerbox or this PR does not require reindexing anvilprod
  • Title of merge commit starts with title from this PR
  • Added PR reference to merge commit title
  • Collected commit title tags in merge commit title but only include p if the PR is labeled partial
  • Moved connected issues to Merged column in ZenHub
  • Pushed merge commit to GitHub

Operator (chain shortening)

  • Changed the target branch of the blocked PR to develop or this PR is not labeled base
  • Removed the chained label from the blocked PR or this PR is not labeled base
  • Removed the blocking relationship from the blocked PR or this PR is not labeled base
  • Removed the base label from this PR or this PR is not labeled base

Operator (after pushing the merge commit)

  • Deployed dev.shared
  • Deployed dev.gitlab
  • Pushed merge commit to GitLab dev or PR is labeled no sandbox
  • Deployed anvildev.shared
  • Deployed anvildev.gitlab
  • Pushed merge commit to GitLab anvildev or PR is labeled no sandbox
  • Deployed anvilprod.shared
  • Deployed anvilprod.gitlab
  • Pushed merge commit to GitLab anvilprod or PR is labeled no sandbox
  • Build passes on GitLab dev¹
  • Reviewed build logs for anomalies on GitLab dev¹
  • Build passes on GitLab anvildev¹
  • Reviewed build logs for anomalies on GitLab anvildev¹
  • Build passes on GitLab anvilprod¹
  • Reviewed build logs for anomalies on GitLab anvilprod¹
  • Deleted PR branch from GitHub
  • Deleted PR branch from GitLab dev
  • Deleted PR branch from GitLab anvildev
  • Deleted PR branch from GitLab anvilprod

¹ When pushing the merge commit is skipped due to the PR being labelled no sandbox, the next build triggered by a PR whose merge commit is pushed determines this checklist item.

Operator (reindex)

  • Deleted unreferenced indices in dev or this PR does not remove catalogs or otherwise cause unreferenced indices in dev
  • Deleted unreferenced indices in anvildev or this PR does not remove catalogs or otherwise cause unreferenced indices in anvildev
  • Deleted unreferenced indices in anvilprod or this PR does not remove catalogs or otherwise cause unreferenced indices in anvilprod
  • Considered deindexing individual sources in dev or this PR does not merely remove sources from existing catalogs in dev
  • Considered deindexing individual sources in anvildev or this PR does not merely remove sources from existing catalogs in anvildev
  • Considered deindexing individual sources in anvilprod or this PR does not merely remove sources from existing catalogs in anvilprod
  • Considered indexing individual sources in dev or this PR does not merely add sources to existing catalogs in dev
  • Considered indexing individual sources in anvildev or this PR does not merely add sources to existing catalogs in anvildev
  • Considered indexing individual sources in anvilprod or this PR does not merely add sources to existing catalogs in anvilprod
  • Started reindex in dev or this PR does not require reindexing dev
  • Started reindex in anvildev or this PR does not require reindexing anvildev
  • Started reindex in anvilprod or this PR does not require reindexing anvilprod
  • Checked for and triaged indexing failures in dev or this PR does not require reindexing dev
  • Checked for and triaged indexing failures in anvildev or this PR does not require reindexing anvildev
  • Checked for and triaged indexing failures in anvilprod or this PR does not require reindexing anvilprod
  • Emptied fail queues in dev deployment or this PR does not require reindexing dev
  • Emptied fail queues in anvildev deployment or this PR does not require reindexing anvildev
  • Emptied fail queues in anvilprod deployment or this PR does not require reindexing anvilprod

Operator

  • Added CL Items to prod promotion PR:
  • Deployed prod.shared
  • Deployed prod.gitlab
  • PR is assigned to no one

Shorthand for review comments

  • L line is too long
  • W line wrapping is wrong
  • Q bad quotes
  • F other formatting problem

@github-actions github-actions bot added the orange [process] Done by the Azul team label Feb 8, 2024
@dsotirho-ucsc dsotirho-ucsc added the upgrade [process] PR includes commit requiring manual upgrade label Feb 8, 2024
coveralls commented Feb 8, 2024

Coverage Status

Coverage: 85.201% (remained the same) when pulling 2d2d6c6 on issues/dsotirho-ucsc/3895-clamav-alarm into f1c2c91 on develop.

codecov bot commented Feb 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.18%. Comparing base (f1c2c91) to head (2d2d6c6).

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #5943   +/-   ##
========================================
  Coverage    85.18%   85.18%           
========================================
  Files          154      154           
  Lines        19896    19896           
========================================
  Hits         16948    16948           
  Misses        2948     2948           


@dsotirho-ucsc dsotirho-ucsc force-pushed the issues/dsotirho-ucsc/3895-clamav-alarm branch 2 times, most recently from affed71 to 6639266 Compare February 15, 2024 17:47
@dsotirho-ucsc
Contributor Author

Alarm from test on personal deployment

https://groups.google.com/a/ucsc.edu/g/azul-group/c/Ulyl5S4vobQ/m/hcRDxHvHAAAJ

@achave11-ucsc achave11-ucsc force-pushed the develop branch 2 times, most recently from ed4b5e1 to 858f6fb Compare February 15, 2024 21:38
@dsotirho-ucsc dsotirho-ucsc force-pushed the issues/dsotirho-ucsc/3895-clamav-alarm branch 3 times, most recently from 56d1657 to 877ac0d Compare February 22, 2024 00:22
@dsotirho-ucsc
Contributor Author

dsotirho-ucsc commented Feb 22, 2024

Tested alarm on dev.
When the alarm was created, it entered the alarm state because the clamscan service had last run 3 days earlier. The alarm is configured to fire if no "docker: clamscan" log messages are found in the cwagent log group over the preceding 18 hours.

After manually starting clamscan.service on GitLab dev, the alarm entered the OK state.
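
The behaviour described above can be pictured with a small model. This is only a sketch, not the Terraform configuration from this PR: the function name `alarm_state` and the idea of feeding it a list of timestamps are assumptions made for illustration, as is the assumption that a matching message is logged when the service starts. The 18-hour window and the two states come from the comment itself.

```python
from datetime import datetime, timedelta, timezone

# 18 hours, the window mentioned in the comment above
WINDOW = timedelta(hours=18)


def alarm_state(matching_log_times, now, window=WINDOW):
    """
    Hypothetical model: return 'ALARM' if no matching log message was
    seen within `window` before `now`, otherwise 'OK'.
    """
    cutoff = now - window
    seen_recently = any(cutoff <= t <= now for t in matching_log_times)
    return 'OK' if seen_recently else 'ALARM'


# Start times taken from the systemctl output in this comment. Treating
# them as the times of matching log messages is an assumption.
last_run = datetime(2024, 2, 18, 8, 0, 1, tzinfo=timezone.utc)
manual_start = datetime(2024, 2, 22, 1, 6, 34, tzinfo=timezone.utc)

# Just before the manual start, the most recent run is days old
before = alarm_state([last_run], now=manual_start - timedelta(minutes=1))

# Shortly after the manual start, a fresh message is inside the window
after = alarm_state([last_run, manual_start],
                    now=manual_start + timedelta(minutes=5))

print(before, after)
```

In this model the state depends only on whether at least one matching message falls inside the window, which is consistent with the alarm recovering as soon as the service was started manually.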

Screenshot 2024-02-21 at 5 17 46 PM

Screenshot 2024-02-21 at 5 18 10 PM

Screenshot 2024-02-21 at 5 20 16 PM

[ec2-user@ip-172-71-0-215 ~]$ sudo systemctl status clamscan.service
● clamscan.service - ClamAV malware scan of entire file system
   Loaded: loaded (/etc/systemd/system/clamscan.service; static; vendor preset: disabled)
   Active: inactive (dead) since Sun 2024-02-18 21:38:33 UTC; 3 days ago
  Process: 30969 ExecStart=/usr/bin/docker run --name clamscan --rm --volume /var/run/docker.sock:/var/run/docker.sock --volume /:/scan:ro --volume /mnt/gitlab/clamav:/var/lib/clamav:rw 122796619775.dkr.ecr.us-east-1.amazonaws.com/docker.io/clamav/clamav:1.2.1-27 /bin/sh -c freshclam && echo freshclam succeeded || (echo freshclam failed; false) && clamscan --recursive --infected --allmatch=yes --exclude-dir=^/scan/var/lib/docker/overlay2/.*/merged/sys --exclude-dir=^/scan/var/lib/docker/overlay2/.*/merged/proc --exclude-dir=^/scan/var/lib/docker/overlay2/.*/merged/dev --exclude-dir=^/scan/sys --exclude-dir=^/scan/proc --exclude-dir=^/scan/dev /scan && echo clamscan succeeded || (echo clamscan failed; false) (code=exited, status=0/SUCCESS)
  Process: 30790 ExecStartPre=/usr/bin/docker pull 122796619775.dkr.ecr.us-east-1.amazonaws.com/docker.io/clamav/clamav:1.2.1-27 (code=exited, status=0/SUCCESS)
  Process: 30772 ExecStartPre=/usr/bin/docker rm clamscan (code=exited, status=1/FAILURE)
  Process: 30739 ExecStartPre=/usr/bin/docker stop clamscan (code=exited, status=1/FAILURE)
 Main PID: 30969 (code=exited, status=0/SUCCESS)

Feb 18 08:00:01 ip-172-71-0-215.ec2.internal systemd[1]: Starting ClamAV malware scan of entire file system...
Feb 18 08:00:11 ip-172-71-0-215.ec2.internal systemd[1]: Started ClamAV malware scan of entire file system.
[ec2-user@ip-172-71-0-215 ~]$
[ec2-user@ip-172-71-0-215 ~]$ sudo systemctl start clamscan.service
[ec2-user@ip-172-71-0-215 ~]$
[ec2-user@ip-172-71-0-215 ~]$ sudo systemctl status clamscan.service
● clamscan.service - ClamAV malware scan of entire file system
   Loaded: loaded (/etc/systemd/system/clamscan.service; static; vendor preset: disabled)
   Active: active (running) since Thu 2024-02-22 01:06:34 UTC; 2s ago
  Process: 20801 ExecStartPre=/usr/bin/docker pull 122796619775.dkr.ecr.us-east-1.amazonaws.com/docker.io/clamav/clamav:1.2.1-27 (code=exited, status=0/SUCCESS)
  Process: 20792 ExecStartPre=/usr/bin/docker rm clamscan (code=exited, status=1/FAILURE)
  Process: 20763 ExecStartPre=/usr/bin/docker stop clamscan (code=exited, status=1/FAILURE)
 Main PID: 20822 (docker)
    Tasks: 9
   Memory: 24.8M
   CGroup: /system.slice/clamscan.service
           └─20822 /usr/bin/docker run --name clamscan --rm --volume /var/run/docker.sock:/var/run/docker.sock --volume /:/scan:ro --volume /mnt/gitlab/clamav:/var/lib/clamav:rw 1227966197...

@dsotirho-ucsc
Contributor Author

5943-IT_2024-02-21.txt

Contributor

@nadove-ucsc nadove-ucsc left a comment

Just one minor suggestion

    }
} for resource_name, period in [
    ('trail_logs', 10 * 60),
    ('clamscan', 18 * 60 * 60)
Contributor

I think this deserves a comment explaining the choice of evaluation period.
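
To make the two periods easier to compare, the arithmetic can be spelled out. This sketch only evaluates the expressions from the snippet under review; the helper `describe` is hypothetical, and the sketch does not attempt to explain why these values were chosen.

```python
def describe(seconds):
    """Render a number of seconds as hours and minutes (hypothetical helper)."""
    hours, remainder = divmod(seconds, 60 * 60)
    minutes = remainder // 60
    return f'{seconds} s = {hours} h {minutes} min'


# Period expressions copied from the snippet under review
periods = {
    'trail_logs': 10 * 60,
    'clamscan': 18 * 60 * 60,
}

for resource_name, period in periods.items():
    print(resource_name, describe(period))
```

Writing the periods as products of 60 keeps the unit conversion visible in the code, but it leaves the rationale for 18 hours unstated, which is what the requested comment would cover.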

nadove-ucsc previously approved these changes Feb 22, 2024
@nadove-ucsc nadove-ucsc marked this pull request as ready for review February 22, 2024 19:31
Comment on lines 528 to 546
'clamscan': {
    'name': config.qualified_resource_name('clamscan', suffix='.filter'),
    # Patterns that include non-alphanumeric characters must be
    # wrapped in double quotation marks ("")
    'pattern': '"docker: clamscan"',
    'log_group_name': '/aws/cwagent/azul-gitlab',
    'metric_transformation': {
        'name': config.qualified_resource_name('clamscan'),
        'namespace': 'LogMetrics',
        'value': 1,
        'default_value': 0,
    }
}
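
As a rough mental model of the filter definition above: each log event containing the quoted phrase contributes the configured value of 1, and the default value of 0 is reported when log events arrive but none match. The sketch below is a simplification written for illustration, not how CloudWatch is implemented. `metric_for_period` is a hypothetical name, the sample log lines are made up, and treating the pattern as a case-sensitive substring match is an assumption that is only meant to cover a single quoted phrase.

```python
PATTERN = '"docker: clamscan"'   # 'pattern' from the filter definition above
VALUE = 1                        # 'value' in metric_transformation
DEFAULT_VALUE = 0                # 'default_value' in metric_transformation


def metric_for_period(messages, pattern=PATTERN):
    """
    Hypothetical model of a single period. Returns None when no log
    events arrived at all, DEFAULT_VALUE when events arrived but none
    matched, and otherwise VALUE summed over the matching events.
    """
    if not messages:
        return None  # nothing ingested: no datapoint in this model
    phrase = pattern.strip('"')  # drop the surrounding double quotes
    match_count = sum(1 for message in messages if phrase in message)
    return VALUE * match_count if match_count else DEFAULT_VALUE


# Made-up log lines, for illustration only
sample = [
    'docker: clamscan example message',
    'sshd: example message',
]

print(metric_for_period(sample))
```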
Member

We may have lost the thread in the years since the ticket was filed:

Does this PR exhibit the behavior I clarified on the ticket just now?

@hannes-ucsc hannes-ucsc added the 0 reviews [process] Lead didn't request any changes label Feb 23, 2024
@hannes-ucsc hannes-ucsc removed their assignment Feb 23, 2024
@dsotirho-ucsc dsotirho-ucsc force-pushed the issues/dsotirho-ucsc/3895-clamav-alarm branch from 14261c0 to 5f04a9f Compare February 27, 2024 21:43
@dsotirho-ucsc dsotirho-ucsc force-pushed the issues/dsotirho-ucsc/3895-clamav-alarm branch from 5f04a9f to 4d3d54f Compare February 28, 2024 17:30
@dsotirho-ucsc
Contributor Author

Tested alarms on dev.

Freshclam alarm

  • Upon creation, the alarm entered the "alarm" state because there were no "freshclam succeeded" log messages within the preceding 18 hours.
  • After manually starting the clamscan service on GitLab dev, the alarm entered the "OK" state.

Screenshot 2024-02-28 at 10 04 50 AM

Clamscan alarm

Not tested, because the configuration is similar to that of the freshclam alarm and it would take 10+ hours for the "clamscan succeeded" log message to appear.

Clam_fail alarm

This alarm changes from the "OK" state to "alarm" if a "clamscan failed" or "freshclam failed" message is logged. Because this event is not easily reproducible, the filter pattern syntax used by this alarm was instead verified in the AWS Console.

The actual filter pattern used by this alarm is ?"clamscan failed" ?"freshclam failed"
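
In CloudWatch Logs filter pattern syntax, terms prefixed with a question mark are combined with OR, so this pattern matches a log event containing either phrase. A minimal sketch of that semantics follows. `parse_or_pattern` and `matches` are hypothetical helpers that handle only this specific form of pattern (double-quoted phrases, each prefixed with ?), not the full syntax, and the sample messages in the usage lines are made up.

```python
import re


def parse_or_pattern(pattern):
    """Extract the quoted phrases from a pattern of the form ?"a" ?"b"."""
    return re.findall(r'\?"([^"]*)"', pattern)


def matches(pattern, message):
    """Return True if the message contains at least one of the phrases."""
    return any(phrase in message for phrase in parse_or_pattern(pattern))


# Pattern quoted in the comment above
PATTERN = '?"clamscan failed" ?"freshclam failed"'

# Made-up messages, for illustration only
print(matches(PATTERN, 'docker: freshclam failed'))
print(matches(PATTERN, 'docker: clamscan succeeded'))
```

Checking the pattern against sample messages in this way mirrors what the console test does, without having to provoke a real scan failure.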

Screenshot 2024-02-28 at 9 49 10 AM

@dsotirho-ucsc
Contributor Author

5943_IT_2024-02-28.txt

@dsotirho-ucsc dsotirho-ucsc force-pushed the issues/dsotirho-ucsc/3895-clamav-alarm branch from 4d3d54f to 2d2d6c6 Compare March 1, 2024 04:51
@dsotirho-ucsc dsotirho-ucsc added the deploy:gitlab [process] PR requires deploying `gitlab` component, deploy:shared [process] PR requires deploying `shared` component, and sandbox [process] Resolution is being verified in sandbox deployment labels Mar 1, 2024
@dsotirho-ucsc dsotirho-ucsc merged commit 38f32a2 into develop Mar 1, 2024
12 checks passed
@dsotirho-ucsc dsotirho-ucsc deleted the issues/dsotirho-ucsc/3895-clamav-alarm branch March 1, 2024 16:40
@dsotirho-ucsc dsotirho-ucsc removed their assignment Mar 15, 2024
Labels
0 reviews [process] Lead didn't request any changes
deploy:gitlab [process] PR requires deploying `gitlab` component
deploy:shared [process] PR requires deploying `shared` component
orange [process] Done by the Azul team
sandbox [process] Resolution is being verified in sandbox deployment
upgrade [process] PR includes commit requiring manual upgrade
4 participants