This repository contains the code to build a pipeline that scans objects uploaded to GCS for malware, moving the documents to a clean or quarantined bucket depending on the malware scan status.
It illustrates how to use Cloud Run and Eventarc to build such a pipeline.
Use the tutorial to understand how to configure your Google Cloud Platform project to use Cloud Run and Eventarc.
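The clean/quarantined routing described above can be pictured with a minimal TypeScript sketch. This is an illustration only, not the scanner's own code. BucketDefs, ScanVerdict and destinationBucket are hypothetical names. The sketch assumes that each entry of the "buckets" array in the configuration described below maps one unscanned bucket to its own clean and quarantined buckets.

```typescript
// Hypothetical shape mirroring one entry of the "buckets" config array.
interface BucketDefs {
  unscanned: string;
  clean: string;
  quarantined: string;
}

// Hypothetical scan outcome; the real scanner's result type may differ.
type ScanVerdict = "clean" | "infected";

// Picks the destination bucket for a scanned object, or undefined when the
// object did not arrive in any configured "unscanned" bucket.
function destinationBucket(
  sourceBucket: string,
  verdict: ScanVerdict,
  buckets: BucketDefs[],
): string | undefined {
  const defs = buckets.find((b) => b.unscanned === sourceBucket);
  if (defs === undefined) return undefined;
  return verdict === "clean" ? defs.clean : defs.quarantined;
}

const buckets: BucketDefs[] = [
  {
    unscanned: "unscanned-bucket-name",
    clean: "clean-bucket-name",
    quarantined: "quarantined-bucket-name",
  },
];

console.log(destinationBucket("unscanned-bucket-name", "infected", buckets));
// quarantined-bucket-name
```

Looking the entry up by source bucket is what lets one deployment serve several unscanned/clean/quarantined sets.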
The tutorial above uses a configuration file, config.json, built into the Docker container to configure the unscanned, clean, quarantined and CVD updater Cloud Storage buckets.
Environment variables can be used to vary the deployment in two ways:
Any environment variables referenced in shell format (such as $VAR or ${VAR}) within the config.json file will be expanded using envsubst.
Alternatively, instead of building the configuration file into the container, the service configuration can be held in an environment variable. This lets multiple deployments use the same container, and configuration updates do not need a container rebuild.
To do this, set the environment variable CONFIG_JSON to the JSON configuration. This overrides any configuration in the config.json file.
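The two mechanisms above can be sketched in TypeScript as follows. Both functions are hypothetical illustrations, not the scanner's implementation. expandShellFormat approximates GNU envsubst, which replaces $NAME and ${NAME} and substitutes an empty string for unset variables. selectConfigText shows the documented precedence of CONFIG_JSON over config.json. Treating an empty CONFIG_JSON as unset is an assumption. So is the DEPLOY_ENV variable used in the usage example.

```typescript
type Env = Record<string, string | undefined>;

// Approximates envsubst: replaces $NAME and ${NAME} with values from env,
// using an empty string for unset variables. Shell defaults such as
// ${NAME:-default} are not supported.
function expandShellFormat(text: string, env: Env): string {
  return text.replace(
    /\$(?:\{([A-Za-z_][A-Za-z0-9_]*)\}|([A-Za-z_][A-Za-z0-9_]*))/g,
    (_match: string, braced: string | undefined, bare: string | undefined) =>
      env[braced ?? bare ?? ""] ?? "",
  );
}

// Precedence: a CONFIG_JSON value, when present, replaces the contents of
// config.json entirely. Treating an empty value as unset is an assumption.
function selectConfigText(env: Env, configFileText: string): string {
  const fromEnv = env["CONFIG_JSON"];
  return fromEnv !== undefined && fromEnv.trim() !== "" ? fromEnv : configFileText;
}

// DEPLOY_ENV is a hypothetical variable used only for this example.
const fileText = '{ "ClamCvdMirrorBucket": "cvd-mirror-${DEPLOY_ENV}" }';
const env: Env = { DEPLOY_ENV: "prod" };

console.log(expandShellFormat(fileText, env));
// { "ClamCvdMirrorBucket": "cvd-mirror-prod" }

console.log(selectConfigText({ CONFIG_JSON: '{ "ClamCvdMirrorBucket": "other" }' }, fileText));
// { "ClamCvdMirrorBucket": "other" }
```

The sketch deliberately does not say whether envsubst expansion is also applied to CONFIG_JSON. This README does not state it.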
If you use the gcloud run deploy command line, this environment variable must be set with the --env-vars-file argument, which specifies a YAML file containing the environment variable definitions. (This is because the commas in the JSON would break the parsing of --set-env-vars.)
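To see why commas are a problem, consider this simplified TypeScript model of a comma-separated KEY=VALUE flag. It is an illustration of the general parsing approach only, not gcloud's actual parser, and parseKeyValueList is a hypothetical name. The whole argument is split on every comma before each pair is read, so a JSON value gets cut into fragments.

```typescript
// Simplified model of a comma-separated KEY=VALUE flag such as
// --set-env-vars. Not gcloud's real implementation.
function parseKeyValueList(arg: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const pair of arg.split(",")) {
    const eq = pair.indexOf("=");
    if (eq === -1) {
      throw new Error(`Malformed entry (no "="): ${pair}`);
    }
    result[pair.slice(0, eq)] = pair.slice(eq + 1);
  }
  return result;
}

// Simple values parse as expected.
console.log(parseKeyValueList("A=1,B=2"));

// A JSON value containing commas is split into fragments.
try {
  parseKeyValueList('CONFIG_JSON={"clean":"c","quarantined":"q"}');
} catch (e) {
  console.log((e as Error).message);
  // Malformed entry (no "="): "quarantined":"q"}
}
```

A YAML file passed with --env-vars-file avoids this, because each value is delimited by YAML syntax rather than by commas.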
Take care when embedding JSON in YAML. It is recommended to use the literal block scalar style (|), as this preserves newlines and quotes.
For example, the CONFIG_JSON environment variable could be defined in a file config-env.yaml as follows:
CONFIG_JSON: |
  {
    "buckets": [
      {
        "unscanned": "unscanned-bucket-name",
        "clean": "clean-bucket-name",
        "quarantined": "quarantined-bucket-name"
      }
    ],
    "ClamCvdMirrorBucket": "cvd-mirror-bucket-name",
    "fileExclusionPatterns": [],
    "ignoreZeroLengthFiles": false
  }
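A quick way to catch mistakes in the embedded JSON before deploying is to parse and check it locally. For example, JSON.parse rejects an unquoted key. The following TypeScript is a minimal sketch, not the scanner's own validation. The types, the checks and the defaults for fileExclusionPatterns and ignoreZeroLengthFiles are assumptions based only on the keys shown in the example.

```typescript
// Hypothetical types mirroring the keys in the example configuration.
interface BucketDefs {
  unscanned: string;
  clean: string;
  quarantined: string;
}

type ExclusionPattern = string | [string, string];

interface ScannerConfig {
  buckets: BucketDefs[];
  ClamCvdMirrorBucket: string;
  fileExclusionPatterns: ExclusionPattern[];
  ignoreZeroLengthFiles: boolean;
}

// Parses CONFIG_JSON text and performs basic shape checks.
// JSON.parse throws a SyntaxError for invalid JSON such as unquoted keys.
function parseScannerConfig(text: string): ScannerConfig {
  const raw = JSON.parse(text) as Partial<ScannerConfig>;
  if (!Array.isArray(raw.buckets) || raw.buckets.length === 0) {
    throw new Error("buckets must be a non-empty array");
  }
  for (const entry of raw.buckets) {
    for (const key of ["unscanned", "clean", "quarantined"] as const) {
      if (typeof entry?.[key] !== "string" || entry[key] === "") {
        throw new Error(`every buckets entry needs a non-empty "${key}"`);
      }
    }
  }
  if (typeof raw.ClamCvdMirrorBucket !== "string" || raw.ClamCvdMirrorBucket === "") {
    throw new Error("ClamCvdMirrorBucket must be a non-empty string");
  }
  return {
    buckets: raw.buckets,
    ClamCvdMirrorBucket: raw.ClamCvdMirrorBucket,
    // These defaults are assumptions made for this sketch.
    fileExclusionPatterns: raw.fileExclusionPatterns ?? [],
    ignoreZeroLengthFiles: raw.ignoreZeroLengthFiles ?? false,
  };
}

const cfg = parseScannerConfig(
  '{"buckets":[{"unscanned":"u","clean":"c","quarantined":"q"}],"ClamCvdMirrorBucket":"m"}',
);
console.log(cfg.ignoreZeroLengthFiles); // false
```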
An example command line that uses this file to specify the environment:
gcloud beta run deploy "${SERVICE_NAME}" \
  --source . \
  --region "${REGION}" \
  --no-allow-unauthenticated \
  --memory 4Gi \
  --cpu 1 \
  --concurrency 20 \
  --min-instances 1 \
  --max-instances 5 \
  --no-cpu-throttling \
  --cpu-boost \
  --service-account="${SERVICE_ACCOUNT}" \
  --env-vars-file=config-env.yaml
If you are using Terraform to deploy, the equivalent way to specify the environment variable in the google_cloud_run_v2_service resource is to use the env block and jsonencode:
resource "google_cloud_run_v2_service" "malware-scanner" {
  name = "malware-scanner"
  // other service parameters...
  template {
    // other template parameters...
    containers {
      // other container parameters...
      env {
        name = "CONFIG_JSON"
        value = jsonencode({
          buckets = [
            {
              unscanned   = "unscanned-bucket-name",
              clean       = "clean-bucket-name",
              quarantined = "quarantined-bucket-name"
            }
          ]
          ClamCvdMirrorBucket   = "cvd-mirror-bucket-name",
          fileExclusionPatterns = [],
          ignoreZeroLengthFiles = false
        })
      }
    }
  }
}
The fileExclusionPatterns array in the config file can be used to ignore any uploaded files whose names match a regular expression.
This is useful, for example, if you have an upload system that creates temporary files and then renames them once they are fully uploaded.
The elements in the fileExclusionPatterns array can either be simple strings, for example:
"fileExclusionPatterns": [
"\\.tmp$",
"^ignore_me.*\\.txt$"
]
or they can be an array of two string values, allowing regular expression flags to be specified, for example "i" for case-insensitive matches:
"fileExclusionPatterns": [
[ "\\.tmp$", "i" ],
[ "tempfile.*\\.upload$", "i" ]
]
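The two element forms above map naturally onto the JavaScript RegExp constructor. This TypeScript sketch is hypothetical: compileExclusionPatterns and isExcluded are illustrative names. It assumes that patterns are tested against the uploaded object's name. The pattern strings appear as they look after JSON parsing, which is the same text as in the TypeScript source here.

```typescript
// A pattern is either a regex source string or a [source, flags] pair.
type ExclusionPattern = string | [string, string];

// Compiles configured patterns into RegExp objects (ECMAScript syntax).
function compileExclusionPatterns(patterns: ExclusionPattern[]): RegExp[] {
  return patterns.map((p) =>
    typeof p === "string" ? new RegExp(p) : new RegExp(p[0], p[1]),
  );
}

// True if the object name matches any compiled pattern.
function isExcluded(objectName: string, compiled: RegExp[]): boolean {
  return compiled.some((re) => re.test(objectName));
}

const compiled = compileExclusionPatterns([
  "\\.tmp$",
  ["tempfile.*\\.upload$", "i"],
]);

console.log(isExcluded("report.TMP", compiled)); // false (first pattern is case-sensitive)
console.log(isExcluded("TempFile-42.UPLOAD", compiled)); // true
```

In a sketch like this, avoid the "g" flag. RegExp.prototype.test on a global regular expression advances lastIndex, so reusing the same object across file names can give inconsistent results.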
Files matching these patterns will be ignored by the scanner and left in the unscanned bucket, and the ignored-files counter will be incremented.
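The ignore behaviour above can be sketched as a per-upload decision in TypeScript. This is a hypothetical illustration. decide and the ignoredFiles field are made-up names standing in for the scanner's logic and its ignored-files counter. Ignored objects are simply not moved, so they stay in the unscanned bucket.

```typescript
type Decision = "ignored" | "scan";

// Hypothetical stand-in for the scanner's ignored-files counter.
interface Counters {
  ignoredFiles: number;
}

// Matching names are skipped and counted; everything else goes on to scanning.
function decide(objectName: string, exclusions: RegExp[], counters: Counters): Decision {
  if (exclusions.some((re) => re.test(objectName))) {
    counters.ignoredFiles += 1; // the object is left where it is
    return "ignored";
  }
  return "scan";
}

const counters: Counters = { ignoredFiles: 0 };
const exclusions = [/\.tmp$/, /^ignore_me.*\.txt$/];

for (const name of ["upload.tmp", "ignore_me_1.txt", "invoice.pdf"]) {
  console.log(name, decide(name, exclusions, counters));
}
console.log(counters.ignoredFiles); // 2
```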
Helpful tools for regular expressions include the Regular Expression Cheatsheet and the Regex101 playground (ensure the ECMAScript flavor is selected).
Note that when adding regular expressions to the config file, care must be taken with \ and " characters: any of these characters in the regular expression must be escaped with another \.
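One way to get this escaping right is to let a JSON serializer do it. In this TypeScript sketch, the regular expressions are written with String.raw so that each backslash appears exactly once. JSON.stringify then produces the config-file text with every \ doubled and every " escaped. The second pattern is a made-up example.

```typescript
// Regular expressions exactly as they should be interpreted by the scanner.
// The second pattern is a hypothetical example containing double quotes.
const regexSources: string[] = [
  String.raw`\.tmp$`,
  String.raw`^report "draft".*\.docx$`,
];

// JSON.stringify escapes \ as \\ and " as \" for the config file.
const configSnippet = JSON.stringify({ fileExclusionPatterns: regexSources }, null, 2);
console.log(configSnippet);
// {
//   "fileExclusionPatterns": [
//     "\\.tmp$",
//     "^report \"draft\".*\\.docx$"
//   ]
// }
```

Parsing the snippet back with JSON.parse returns the original patterns unchanged, which is a quick check that nothing was over- or under-escaped.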
See CHANGELOG.md.
In version 3.x, metrics reporting was changed to OpenTelemetry, which uses a different naming convention for metrics, so the metric names have changed from:
custom.googleapis.com/opencensus/malware-scanning/METRIC-NAME
to
workload.googleapis.com/googlecloudplatform/gcs-malware-scanning/METRIC-NAME
Any dashboards or alerts using these metrics must be updated.
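When updating dashboards or alert definitions in bulk, the prefix change can be applied mechanically. This TypeScript sketch is a hypothetical helper, not part of the scanner. It assumes, as the METRIC-NAME placeholder above suggests, that only the prefix changes and the metric name suffix stays the same. Check CHANGELOG.md before relying on that.

```typescript
const OLD_PREFIX = "custom.googleapis.com/opencensus/malware-scanning/";
const NEW_PREFIX = "workload.googleapis.com/googlecloudplatform/gcs-malware-scanning/";

// Rewrites a single v2-style metric type; other strings are returned as-is.
function toV3MetricType(metricType: string): string {
  return metricType.startsWith(OLD_PREFIX)
    ? NEW_PREFIX + metricType.slice(OLD_PREFIX.length)
    : metricType;
}

// Rewrites every occurrence of the old prefix inside a larger text, such as
// an exported dashboard or alert policy definition.
function rewriteMetricPrefixes(text: string): string {
  return text.split(OLD_PREFIX).join(NEW_PREFIX);
}

const filter = 'metric.type="custom.googleapis.com/opencensus/malware-scanning/METRIC-NAME"';
console.log(rewriteMetricPrefixes(filter));
// metric.type="workload.googleapis.com/googlecloudplatform/gcs-malware-scanning/METRIC-NAME"
```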
Version 2 has a different way of handling ClamAV updates to avoid issues with the ClamAV content distribution network.
See upgrade_from_v1.md for upgrading instructions.
Copyright 2022 Google LLC
Licensed under the Apache License, Version 2.0 (the "License"); you may not use
this file except in compliance with the License. You may obtain a copy of the
License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed
under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.